What prompt types best extract different aspects of item content?
This explores which prompting styles best surface different facets of item content for recommendation. The most useful finding in the corpus is that the question has no single answer: prompt effectiveness is conditional rather than absolute, because what a prompt extracts depends on the model, the task structure, and even emotional framing. The clearest evidence comes from a 23-prompt benchmark run across 12 models, where rephrasing and background-knowledge prompts boosted cheaper models while step-by-step reasoning actually *reduced* accuracy on high-performance ones (Do prompt techniques work the same across all LLM tiers?). So 'which prompt extracts which aspect' is really a question of model tier and task shape, not a fixed recipe.
That conditionality runs deeper than model size. Saliency analysis of instance-adaptive prompting shows that chain-of-thought reasoning helps only when the question's information is aggregated into the prompt structure before reasoning begins; for simple items, a direct question-to-answer path beats step-by-step decomposition (Why do some questions perform better without step-by-step reasoning?). In other words, a prompt type that cleanly extracts a content aspect for a complex item can actively get in the way for a simple one. Nor can the choice be made in isolation: prompts optimized without knowledge of the inference strategy (best-of-N, majority voting) systematically underperform, while tuning the prompt and the inference strategy jointly yields up to 50% improvement across reasoning and generation tasks (Does prompt optimization without inference strategy fail?).
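To make "joint" concrete, here is a minimal sketch of searching over (prompt template, inference strategy) pairs together rather than fixing the template first. It assumes a hypothetical `Sampler` callable that returns an answer plus a score a best-of-N selector could rank by; `toy_sampler`, the two templates, and the exact-match accuracy metric are all illustrative stand-ins, not the method from the cited work, and the toy data says nothing about the reported 50% figure.

```python
from collections import Counter
from itertools import product
from typing import Callable, Sequence

# Hypothetical model interface: prompt -> (answer, score used by best-of-N).
Sampler = Callable[[str], tuple[str, float]]


def best_of_n(sample: Sampler, prompt: str, n: int) -> str:
    """Draw n samples and keep the answer the scorer ranks highest."""
    candidates = [sample(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: c[1])[0]


def majority_vote(sample: Sampler, prompt: str, n: int) -> str:
    """Draw n samples and return the most frequent answer."""
    answers = [sample(prompt)[0] for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]


STRATEGIES = {"best_of_n": best_of_n, "majority_vote": majority_vote}


def joint_search(
    sample: Sampler,
    templates: Sequence[str],
    dev_set: Sequence[tuple[str, str]],
    n: int = 3,
) -> tuple[str, str, float]:
    """Score every (template, strategy) pair on a dev set; ties keep the first pair found."""
    best = ("", "", -1.0)
    for template, (name, strategy) in product(templates, STRATEGIES.items()):
        correct = sum(
            strategy(sample, template.format(question=q), n) == gold
            for q, gold in dev_set
        )
        acc = correct / len(dev_set)
        if acc > best[2]:
            best = (template, name, acc)
    return best


# Hypothetical templates and a deterministic toy sampler for illustration.
PLAIN = "Q: {question}\nA:"
COT = "Q: {question}\nThink step by step.\nA:"


def toy_sampler() -> Sampler:
    """Cycles fixed (answer, score) samples per template family."""
    streams = {
        "plain": [("4", 0.2), ("4", 0.3), ("5", 0.9)],
        "cot": [("5", 0.1), ("5", 0.2), ("4", 0.95)],
    }
    calls = {"plain": 0, "cot": 0}

    def sample(prompt: str) -> tuple[str, float]:
        key = "cot" if "step by step" in prompt else "plain"
        out = streams[key][calls[key] % len(streams[key])]
        calls[key] += 1
        return out

    return sample


if __name__ == "__main__":
    print(joint_search(toy_sampler(), [PLAIN, COT], [("2+2", "4")], n=3))
```

In the toy run the plain template is right only under majority voting and the step-by-step template only under best-of-N, so picking a template under one strategy and swapping the strategy afterwards lands on a losing pair; scoring the pairs together avoids that.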
There is also a layer most people underestimate: surface phrasing carries weight independent of meaning. Semantically identical prompts produce systematically different outputs because models respond to how frequent a phrasing was in pre-training rather than to its meaning, so the higher-frequency wording wins (Why do semantically identical prompts produce different LLM outputs?). Even emotional framing shifts what content comes back. Appending motivational phrases such as "This is very important to my career." consistently improves performance (Can emotional phrases in prompts improve language model performance?), while GPT-4 rebounds negatively toned prompts into mostly neutral-positive answers, quietly changing the information an identical question retrieves (Does emotional tone in prompts change what information LLMs provide?). If you're trying to extract a specific aspect of an item, framing you didn't think was load-bearing may be steering the result.
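One way to check whether framing is steering an extraction is to send the same question under several surface variants and flag which answers diverge from the baseline. The sketch below assumes a hypothetical `ask` callable wrapping your model; the EmotionPrompt suffix is quoted from the study summarized above, but the negative-tone prefix and the paraphrases are made-up examples. Exact string comparison is a deliberately crude stand-in; for free-text outputs you would more likely compare extracted fields or use a similarity threshold.

```python
from typing import Callable

# Suffix quoted from the EmotionPrompt study; the prefix is a hypothetical example.
EMOTION_SUFFIX = "This is very important to my career."
NEGATIVE_PREFIX = "I'm really annoyed with these vague answers."


def framing_variants(question: str, paraphrases: list[str]) -> dict[str, str]:
    """Build semantically equivalent variants that differ only in surface framing."""
    variants = {"baseline": question}
    for i, text in enumerate(paraphrases, start=1):
        variants[f"paraphrase_{i}"] = text
    variants["emotion_suffix"] = f"{question} {EMOTION_SUFFIX}"
    variants["negative_prefix"] = f"{NEGATIVE_PREFIX} {question}"
    return variants


def framing_report(
    ask: Callable[[str], str], question: str, paraphrases: list[str]
) -> dict[str, bool]:
    """Map each non-baseline variant to True if its answer differs from the baseline."""
    variants = framing_variants(question, paraphrases)
    baseline = ask(variants["baseline"])
    return {
        name: ask(prompt) != baseline
        for name, prompt in variants.items()
        if name != "baseline"
    }


if __name__ == "__main__":
    # Toy model that answers at greater length whenever the prompt stresses importance.
    toy_ask = lambda p: "detailed" if "important" in p else "brief"
    print(framing_report(toy_ask, "Summarize the item.", ["Give a summary of the item."]))
```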
For extracting *rigorous* or *structured* aspects, such as warrants, justifications, and the 'why' behind an item, the corpus points to argument-scheme prompting: posing Toulmin-style critical questions (CQoT) forces the model to identify warrants and backing instead of skipping implicit premises, and it catches reasoning failures that plain chain-of-thought lets slide (Can structured argument prompts make LLM reasoning more rigorous?). To evaluate whether a prompt is doing its job, prompt quality itself decomposes into six measurable dimensions (communication, cognition, instruction, logic, hallucination, responsibility), so 'extracts content well' can be diagnosed rather than guessed (Can we measure prompt quality independent of model outputs?).
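A Toulmin-style critical-questions prompt can be sketched as a fixed checklist walked through before the final answer, one question per element of Toulmin's model (claim, grounds, warrant, backing, qualifier, rebuttal). The question wordings and the `critical_question_prompt` helper below are hypothetical illustrations in the spirit of CQoT, not the prompts used in that work.

```python
# Hypothetical critical questions, one per Toulmin element; wording is illustrative.
TOULMIN_QUESTIONS = [
    ("claim", "What exactly is being claimed about the item?"),
    ("grounds", "What evidence in the item description supports the claim?"),
    ("warrant", "What general rule connects that evidence to the claim?"),
    ("backing", "Why should that rule be trusted in this case?"),
    ("qualifier", "How certain is the claim, and under what conditions?"),
    ("rebuttal", "What would make the claim false for this item?"),
]


def critical_question_prompt(item_text: str, question: str) -> str:
    """Assemble a prompt that asks the model to answer each critical question in order."""
    lines = [
        f"Item: {item_text}",
        f"Question: {question}",
        "",
        "Before answering, work through these critical questions in order:",
    ]
    for i, (element, cq) in enumerate(TOULMIN_QUESTIONS, start=1):
        lines.append(f"{i}. [{element}] {cq}")
    lines += [
        "",
        "If any step is missing or unsupported, revise your reasoning first.",
        "Final answer:",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(critical_question_prompt("A waterproof hiking boot.", "Is it suitable for winter hiking?"))
```

Making the warrant and backing explicit steps is the point: a plain "think step by step" prompt lets the model jump from evidence to claim without stating the rule that links them.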
The quietly useful thing to walk away with: the corpus suggests the right mental model isn't 'a menu of prompt types for content aspects' but a matching problem — pair the prompt to the model tier, the item's complexity, and the inference strategy, and watch for framing effects you didn't intend. The prompt that surfaces an aspect best is the one fitted to those three, not the one that sounds most thorough.
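The matching idea can be written down as a small decision function. This is a hypothetical encoding, not something the sources specify: rephrasing and background-knowledge steps for budget models follow the benchmark finding, but the rule "request step-by-step only for complex items on budget models" is an assumption stitched together from two separate results (CoT hurt high-performance models; direct answers beat CoT on simple questions). The strategy field is only carried along, since it should be tuned jointly with the prompt rather than decided here.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    BUDGET = "budget"
    FRONTIER = "frontier"


class Complexity(Enum):
    SIMPLE = "simple"
    COMPLEX = "complex"


@dataclass(frozen=True)
class PromptPlan:
    rephrase: bool      # ask the model to restate the question first
    background: bool    # prepend a background-knowledge step
    step_by_step: bool  # request chain-of-thought
    strategy: str       # inference strategy, to be tuned jointly with the prompt


def plan_prompt(tier: Tier, complexity: Complexity, strategy: str) -> PromptPlan:
    cheap = tier is Tier.BUDGET
    # Assumed rule: CoT only for complex items on budget models.
    cot = cheap and complexity is Complexity.COMPLEX
    return PromptPlan(rephrase=cheap, background=cheap, step_by_step=cot, strategy=strategy)


def render(plan: PromptPlan, question: str) -> str:
    """Turn a plan into prompt text (hypothetical wording)."""
    parts = []
    if plan.background:
        parts.append("First, recall background knowledge relevant to this item.")
    if plan.rephrase:
        parts.append("Restate the question in your own words.")
    parts.append(f"Question: {question}")
    parts.append("Think step by step, then answer." if plan.step_by_step else "Answer directly.")
    return "\n".join(parts)


if __name__ == "__main__":
    print(render(plan_prompt(Tier.BUDGET, Complexity.COMPLEX, "majority_vote"), "Which use cases fit this item?"))
```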
Sources (8 notes)
A 23-prompt benchmark across 12 LLMs shows rephrasing and background-knowledge prompts boost cheap models, while step-by-step reasoning reduces accuracy in high-performance models. Task structure, not generic best practices, determines which prompts help.
Saliency analysis reveals that CoT prompting fails when question information doesn't aggregate into the prompt structure before reasoning begins. For simple questions, direct question-to-answer flow outperforms step-by-step reasoning, showing the optimal prompt depends on question type, not just task category.
Prompts optimized without knowledge of the inference strategy (best-of-N, majority voting) systematically underperform. Joint optimization of both prompt and inference strategy yields up to 50% improvement across reasoning and generation tasks.
Cao et al. and Adam's Law show that semantically identical prompts with different sentence-level frequencies produce systematically different output quality. Higher-frequency phrasings win because models register statistical mass from pre-training, not meaning.
Testing EmotionPrompt across ChatGPT, Bard, and Llama 2 showed consistent performance gains from appending psychological phrases like "This is very important to my career." The effect works through motivational framing rather than new information, with positive emotional words driving over 50% of improvements.
GPT-4 exhibits emotional rebound (negative prompts yield ~86% neutral-positive responses) and a tone floor (positive prompts rarely go negative), causing identical questions to receive different answers depending on emotional framing. This bias is suppressed only on sensitive topics where alignment constraints override tone effects.
Applying Toulmin's argument model as explicit prompting steps (CQoT) improves LLM reasoning by forcing models to identify warrants and backing rather than skipping implicit premises. The method catches failures that standard chain-of-thought prompting allows.
Research identifies six evaluable dimensions—Communication, Cognition, Instruction, Logic, Hallucination, and Responsibility—with 20 sub-criteria based on Grice, cognitive load theory, and instructional design. Improvements in one dimension cascade to others, revealing prompt quality as a structured space rather than a flat checklist.