TOPIC

Prompts and Prompting

9 synthesis notes · 72 source papers
Can optimal experimental design improve few-shot example selection?

Rather than picking examples by similarity, could actively selecting the most informative unlabeled examples (those that most reduce the model's prediction uncertainty) lead to better in-context learning performance across different model sizes?

Does iterative prompt engineering undermine scientific validity?

When researchers repeatedly adjust prompts to get desired outputs, does this practice introduce hidden bias and produce unreplicable results? The question matters because LLM-based research is proliferating without clear methodological safeguards.

Does learning from mistakes improve in-context learning?

Explores whether inducing models to make errors on few-shot examples, then having them articulate principles from those mistakes, leads to better performance than learning from correct examples alone.

Why do some questions perform better without step-by-step reasoning?

Explores whether chain-of-thought prompting universally improves reasoning or whether simpler prompts work better for certain questions. This matters because it challenges assumptions about how LLMs should be prompted to solve problems.

Does prompt politeness change how accurate language models are?

Earlier research suggested rude prompts hurt LLM accuracy, but newer models show the opposite pattern. This raises questions about whether tone effects are real and reliable enough to guide prompting strategies.

Can we measure prompt quality independent of model outputs?

This explores whether prompt quality has measurable, learnable dimensions beyond intuition. The research asks if prompts can be evaluated by their communicative, cognitive, and instructional properties rather than by their results.

Does model confidence predict robustness to prompt changes?

Explores whether a model's certainty about its answer determines how much it resists prompt rephrasing and semantic variation. This matters because it could explain why some tasks are harder to evaluate reliably.

Can a single transformer become universally programmable through prompts?

Explores whether prompts can function as genuine programs that unlock universal computation in fixed-size models, and whether this theoretical possibility translates to practical training outcomes.

Can reasoning steps be dynamically pruned without losing accuracy?

This explores whether chain-of-thought reasoning contains redundant steps that can be identified and removed during inference. Understanding which steps matter could improve efficiency while maintaining correctness.

Source papers 72

The arXiv papers behind this sub-topic. Links may take you off-site to arxiv.org.