TOPIC

Prompts and Prompting

9 synthesis notes · 72 source papers

Can optimal experimental design improve few-shot example selection?

Rather than picking examples by similarity, could actively selecting the most informative unlabeled examples—those that reduce the model's prediction uncertainty—lead to better in-context learning performance across different model sizes?

Does iterative prompt engineering undermine scientific validity?

When researchers repeatedly adjust prompts to get desired outputs, does this practice introduce hidden bias and produce unreplicable results? The question matters because LLM-based research is proliferating without clear methodological safeguards.

Does learning from mistakes improve in-context learning?

Explores whether inducing models to make errors on few-shot examples, then having them articulate principles from those mistakes, leads to better performance than learning from correct examples alone.

Why do some questions perform better without step-by-step reasoning?

Explores whether chain-of-thought prompting universally improves reasoning or if simpler prompts work better for certain questions. Understanding this matters because it challenges assumptions about how LLMs should be prompted to solve problems.

Does prompt politeness change how accurate language models are?

Earlier research suggested rude prompts hurt LLM accuracy, but newer models show the opposite pattern. This raises questions about whether tone effects are real and reliable enough to guide prompting strategies.

Can we measure prompt quality independent of model outputs?

This explores whether prompt quality has measurable, learnable dimensions beyond intuition. The research asks if prompts can be evaluated by their communicative, cognitive, and instructional properties rather than by their results.

Does model confidence predict robustness to prompt changes?

Explores whether a model's certainty about its answer determines how much it resists prompt rephrasing and semantic variation. This matters because it could explain why some tasks are harder to evaluate reliably.

Can a single transformer become universally programmable through prompts?

Explores whether prompts can function as genuine programs that unlock universal computation in fixed-size models, and whether this theoretical possibility translates to practical training outcomes.

Can reasoning steps be dynamically pruned without losing accuracy?

This explores whether chain-of-thought reasoning contains redundant steps that can be identified and removed during inference. Understanding which steps matter could improve efficiency while maintaining correctness.

Source papers 72

The arXiv papers behind this sub-topic. Links may take you off-site to arxiv.org.

A Survey on Large Language Models for Recommendation
Likang Wu, Zhi Zheng, Zhaopeng Qiu, Hao Wang, Hongchao Gu, Tingjia Shen, Chuan Qin, Chen Zhu, Hengshu Zhu, Qi Liu, Hui Xiong, Enhong Chen. University of Science and Technology of China, Career Scie…
A Survey on Prompt Tuning
Prompt tuning has emerged as a promising parameter-efficient fine-tuning (PEFT) approach that offers several advantages: (1) parameter efficiency through updating only a small group of continuous vect…
A comprehensive taxonomy of hallucinations in Large Language Models
This report provides a comprehensive taxonomy of LLM hallucinations, beginning with a formal definition and a theoretical framework that posits its inherent inevitability in computable LLMs, irrespect…
ALIGN: Prompt-based Attribute Alignment for Reliable, Responsible, and Personalized LLM-based Decision-Making
Large language models (LLMs) are increasingly being used as decision aids. However, users have diverse values and preferences that can affect their decision-making, which requires novel methods for LL…
Ask, and it shall be given: Turing completeness of prompting
Since the success of GPT, large language models (LLMs) have been revolutionizing machine learning and have initiated the so-called LLM prompting paradigm. In the era of LLMs, people train a single gen…
Attribute Controlled Dialogue Prompting
Prompt-tuning has become an increasingly popular parameter-efficient method for adapting large pretrained language models to downstream tasks. However, both discrete prompting and continuous promptin…
AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts
The remarkable success of pretrained language models has motivated the study of what kinds of knowledge these models learn during pretraining. Reformulating tasks as fill-in-the-blanks problems (e.g., …
Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data
The recent success in large language models (LLMs) has shown that properly prompted LLMs demonstrate emergent capabilities on complex understanding and question-answering tasks (Wei et al., 2022a). E…
Automatic Prompt Optimization with "Gradient Descent" and Beam Search
Large Language Models (LLMs) have shown impressive performance as general purpose agents, but their abilities remain highly dependent on prompts which are hand written with onerous trial-and-error eff…
Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts
Large Language Models (LLMs) have been widely deployed in reasoning, planning, and decision-making tasks, making their trustworthiness a critical concern. The potential for intentional deception, wher…
Boosted Prompt Ensembles for Large Language Models
We propose a prompt ensembling method for large language models, which uses a small dataset to construct a set of few-shot prompts that together comprise a “boosted prompt ensemble”. The few-shot exam…
Bridging the gulf of envisioning: Cognitive design challenges in LLM interfaces
Large language models (LLMs) exhibit dynamic capabilities and appear to comprehend complex and ambiguous natural language prompts. However, calibrating LLM interactions is challenging for interface de…
CDW-CoT: Clustered Distance-Weighted Chain-of-Thoughts Reasoning
Large Language Models (LLMs) have recently achieved impressive results in complex reasoning tasks through Chain of Thought (CoT) prompting. However, most existing CoT methods rely on using the same pr…
Can AI Have a Personality? Prompt Engineering for AI Personality Simulation: A Chatbot Case Study in Gender-Affirming Voice Therapy Training
This thesis investigates whether large language models (LLMs) can be guided to simulate a consistent personality through prompt engineering. The study explores this concept within the context…
CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks
We propose CoT-Self-Instruct, a synthetic data generation method that instructs LLMs to first reason and plan via Chain-of-Thought (CoT) based on the given seed tasks, and then to generate a new synth…
CogBench: a large language model walks into a psychology lab
Large language models (LLMs) have significantly advanced the field of artificial intelligence. Yet, evaluating them comprehensively remains challenging. We argue that this is partly due to the predomi…
Complexity-Based Prompting for Multi-Step Reasoning
We study the task of prompting large-scale language models to perform multistep reasoning. Existing work shows that when prompted with a chain of thoughts (CoT), sequences of short sentences describin…
Conversational Prompt Engineering
Conversational Prompt Engineering (CPE) is a user-friendly tool that helps users create personalized prompts for their specific tasks. CPE uses a chat model to briefly interact with users, helping them …
Decomposed Prompting: A Modular Approach for Solving Complex Tasks
Few-shot prompting is a surprisingly powerful way to use Large Language Models (LLMs) to solve various tasks. However, this approach struggles as the task complexity increases or when the individual r…
Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference
We view large language models (LLMs) as stochastic language layers in a network, where the learnable parameters are the natural language prompts at each layer. We stack two such layers, feeding the ou…
Dialogue State Tracking with a Language Model using Schema-Driven Prompting
Task-oriented conversational systems often use dialogue state tracking to represent the user’s intentions, which involves filling in values of pre-defined slots. Many approaches have been proposed, of…
Do Prompt-Based Models Really Understand the Meaning of Their Prompts?
Recently, a boom of papers has shown extraordinary progress in zero-shot and few-shot learning with various prompt-based models. It is commonly argued that prompts help models to learn faster in the s…
Dynamic Prompting: A Unified Framework for Prompt Tuning
The efficacy of employing fixed soft prompts with a predetermined position for concatenation with inputs for all instances, irrespective of their inherent disparities, remains uncertain. Variables suc…
EmotionPrompt: Leveraging Psychology for Large Language Models Enhancement via Emotional Stimulus
Large language models (LLMs) have achieved significant performance in many fields, such as reasoning, language understanding, and math problem-solving, and are regarded as an important step to artifi…
Empowering Psychotherapy with Large Language Models: Cognitive Distortion Detection through Diagnosis of Thought Prompting
In the era of Large Language Models, we believe it is the right time to develop AI assistance for computational psychotherapy. We study the task of cognitive distortion detection and propose the Diagn…
Experimental Design for Active Transductive Inference in Large Language Models
One emergent ability of large language models (LLMs) is that query-specific examples can be included in the prompt at inference time. In this work, we use active learning for adaptive prompt design an…
Extrapolation by Association: Length Generalization Transfer in Transformers
Transformer language models have demonstrated impressive generalization capabilities in natural language domains, yet we lack a fine-grained understanding of how such generalization arises. In this pa…
From Prompt Engineering to Prompt Science With Human in the Loop
In recent years, Large Language Models (LLMs) have become sophisticated and capable enough to be applicable in many situations and tasks. These tasks are not limited to information extract…
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
Selecting the “right” amount of information to include in a summary is a difficult task. A good summary should be detailed and entity-centric without being overly dense and hard to follow. To better u…
Generating Proto-Personas through Prompt Engineering: A Case Study on Efficiency, Effectiveness and Empathy
In this paper, we propose and empirically investigate a prompt engineering-based approach to generate proto-personas with the support of Generative AI (GenAI). Our goal is to evaluate the approach in …
Guiding Large Language Models via Directional Stimulus Prompting
Since directly optimizing LLMs for specific tasks is either inefficient or infeasible for most users and developers, researchers resort to optimizing prompts instead. Prompt engineering approaches, …
How Many Instructions Can LLMs Follow at Once?
Production-grade LLM systems require robust adherence to dozens or even hundreds of instructions simultaneously. However, the instruction-following capabilities of LLMs at high instruction densities h…
In-Context Principle Learning from Mistakes
In-context learning (ICL, also known as few-shot prompting) has been the standard method of adapting LLMs to downstream tasks, by learning from a few input-output examples. Nonetheless, all ICL-based a…
Inference-Aware Prompt Optimization for Aligning Black-Box Large Language Models
Prompt optimization methods have demonstrated significant effectiveness in aligning black-box large language models (LLMs). In parallel, inference scaling strategies such as BEST-OF-N Sampling and MAJ…
Instance-adaptive Zero-shot Chain-of-Thought Prompting
The efficacy of a singular, task-level prompt uniformly applied across all instances is inherently limited, since one prompt cannot be a good partner for all; a more appropriate approach shoul…
Instruction Induction: From Few Examples to Natural Language Task Descriptions
**The Instruction Paradigm** Efrat and Levy [2020] propose to learn new tasks from natural language instructions. Mishra et al. [2022] and Wang et al. [2022b] collect crowdsourcing instructions used t…
Investigating task-specific prompts and sparse autoencoders for activation monitoring
Language models can behave in unexpected and unsafe ways, and so it is valuable to monitor their outputs. Internal activations of language models encode additional information that could be useful for…
KiPT: Knowledge-injected Prompt Tuning for Event Detection
Event detection aims to detect events from the text by identifying and classifying event triggers (t…
LLMs as Method Actors: A Model for Prompt Engineering and Architecture
We introduce “Method Actors” as a mental model for guiding LLM prompt engineering and prompt architecture. Under this mental model, LLMs should be thought of as actors; prompts as scripts and cues; an…
Large Language Models Are Human-level Prompt Engineers
Prompt Engineering. Prompting offers a natural and intuitive interface for humans to interact with and use generalist models such as LLMs. Due to its flexibility, prompting has been widely used as a g…
Learning To Retrieve Prompts for In-Context Learning
In-context learning is a recent paradigm in natural language understanding, where a large pre-…
Learning, Fast and Slow: Towards LLMs That Adapt Continually
Large language models (LLMs) are trained for downstream tasks by updating their parameters (e.g., via RL). However, updating parameters forces them to absorb task-specific information, which can resul…
Leveraging Few-Shot Data Augmentation and Waterfall Prompting for Response Generation
This paper discusses our approaches for task-oriented conversational modelling using subjective knowledge, with a particular emphasis on response generation. Our methodology was shaped by an extensive …
Metacognitive Prompting Improves Understanding in Large Language Models
While previous research primarily focuses on refining the logical progression of responses, the concept of metacognition, often defined as “thinking about thinking”, offers a unique perspective. Origi…
Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy (short paper)
The wording of natural language prompts has been shown to influence the performance of large language models (LLMs), yet the role of politeness and tone …
PRewrite: Prompt Rewriting with Reinforcement Learning
Prompt engineering is critical for the development of LLM-based applications. However, it is usually…
Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts
As large language models (LLMs) have shown effectiveness with different prompting methods, such as Chain of Thought, Program of Thought, we find that these methods have formed a great complementarity …
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
This paper surveys and organizes research works in a new paradigm in natural language processing, which we dub “prompt-based learning”. Unlike traditional supervised learning, which trains a model to …
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Fine-tuning is the de facto way of leveraging large pretrained language models for downstream tasks. However, fine-tuning modifies all the language model parameters and therefore necessitates storing …
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
Large language models (LLMs) have demonstrated impressive capabilities across various tasks, but their performance is highly sensitive to the prompts utilized. This variability poses challenges for ac…
Progressive-Hint Prompting Improves Reasoning in Large Language Models
The performance of Large Language Models (LLMs) in reasoning tasks depends heavily on prompt design, with Chain-of-Thought (CoT) and self-consistency being critical methods that enhance this ability. …
Prompt Architecture Determines Reasoning Quality: A Variable Isolation Study on the Car Wash Problem
The car wash problem asks a simple question: “I want to wash my car. The car wash is 100 meters away. Should I walk or drive?” Every major LLM tested (Claude, GPT-4, Gemini) recommended walking. The co…
Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm
Prior to GPT-3, the standard approach to the evaluation and use of such models has involved fine-tuning on a portion of a task dataset [12]. GPT-3 achieved state-of-the-art performance on a wide var…
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
Popular prompt strategies like Chain-of-Thought Prompting can dramatically improve the reasoning abilities of Large Language Models (LLMs) in various domains. However, such hand-crafted prompt-strateg…
Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing?
It has become routine to report research results where Large Language Models (LLMs) outperform average humans in a wide range of language-related tasks, and creative text writing is no exception. It s…
RAG-Gym: Systematic Optimization of Language Agents for Retrieval-Augmented Generation
Retrieval-augmented generation (RAG) has shown great promise for knowledge-intensive tasks and recently advanced with agentic RAG, where language agents engage in multi-round interactions with externa…
Re3: Generating Longer Stories With Recursive Reprompting and Revision
Of course, recent years have also witnessed a dramatic rise in the capabilities of general-purpose (non-finetuned) large pretrained language models. Of particular note are their strong zero-shot capa…
Reasoning Strategies in Large Language Models: Can They Follow, Prefer, and Optimize?
Human reasoning involves different strategies, each suited to specific problems. Prior work shows that large language models (LLMs) tend to favor a single reasoning strategy, potentially limiting their…
Revisiting Prompt Engineering: A Comprehensive Evaluation for LLM-based Personalized Recommendation
Large language models (LLMs) can perform recommendation tasks by taking prompts written in natural language as input. Compared to traditional methods such as collaborative filtering, LLM-based recomme…
Role play with large language models
Here we advocate two basic metaphors for LLM-based dialogue agents. First, taking a simple and intuitive view, we can see a dialogue agent as role-playing a single character. Second, taking a more nua…
RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models
However, the closed-source nature of state-of-the-art LLMs and their general-purpose training limit role-playing optimization. In this paper, we introduce RoleLLM, a framework to benchmark, elicit, an…
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Table 2. All 39 reasoning modules consisting of high-level cognitive heuristics for problem-solving. We adopt them from Fernando et al. (2023). Reasoning Modules: 1. How could I devise an experim…
Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models
We investigate how to elicit compositional generalization capabilities in large language models (LLMs). Compositional generalization empowers LLMs to solve complex problems by combining foundational s…
Style Vectors for Steering Generative Large Language Models
This research explores strategies for steering the output of large language models (LLMs) towards specific styles, such as sentiment, emotion, or writing style, by adding style vectors to the activati…
Systematic synthesis of design prompts for large language models in conceptual design
Conceptual design can be modeled as a proposition making process, where designers make logical propositions to communicate and construct intangible concepts. Not only can LLMs interpret designers’ pro…
Test-time Prompt Intervention
Test-time compute has led to remarkable success in the large language model (LLM) community, particularly for complex tasks, where longer chains of thought (CoTs) are generated to enhance reasoning ca…
The Prompt Report: A Systematic Survey of Prompting Techniques
I have the v2, published Dec 30. We note that the transition dynamics between states depend primarily on the verb used in the action (e.g., take, put, cook, ...). Predicting action-driven transitions…
Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models
While large language models demonstrate impressive performance on static benchmarks, the true potential of large language models as self-learning and reasoning agents in dynamic environments…
Triggering Hallucinations in LLMs: A Quantitative Study of Prompt-Induced Hallucination in Large Language Models
In this study, we propose a class of compact yet effective prompts (~30 tokens in length) that synthetically fuse semantically distant concepts in ways that resist scientific integration—such as combi…
UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation
Large Language Models (LLMs) are popular for their impressive abilities, but …
What Makes a Good Natural Language Prompt?
Despite the importance of understanding natural language prompts, there remains limited consensus on how to quantify them. Current approaches rely predominantly on outcome-centric measurements, such a…
When Prompts Go Wrong: Evaluating Code Model Robustness to Ambiguous, Contradictory, and Incomplete Task Descriptions
Large Language Models (LLMs) have demonstrated impressive performance in code generation tasks under idealized conditions, where task descriptions are clear and precise. However, in practice task desc…