SYNTHESIS NOTE
Reasoning, Retrieval, and Evaluation

How much does the order of premises actually matter for reasoning?

When you rearrange the order of logical premises in a deduction task, does it change how well language models can solve it? This tests whether LLMs reason abstractly or process input sequentially.

Synthesis note · 2026-02-22 · sourced from Reasoning Logic Internal Rules
What makes chain-of-thought reasoning actually work? · How do LLMs fail to know what they seem to understand? · How should researchers navigate LLM reasoning research?

LLMs are surprisingly brittle to the ordering of premises in deductive reasoning tasks, despite the fact that premise order does not alter the underlying logical task. The "Premise Order Matters" paper shows that permuting premise order can cause a performance drop of over 30%.

The key finding is directional: LLMs perform best when premises appear in the order in which the intermediate reasoning steps use them, that is, when the prompt mirrors the ground-truth proof sequence. When the model must reorder the premises itself to construct the proof, accuracy drops sharply.
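A minimal sketch of how such a permutation test could be set up, assuming the premises are plain strings listed in proof order. The helper names (`kendall_tau`, `reorder`, `build_prompt`) and the toy premises are illustrative assumptions, not taken from the paper. Kendall's tau is used here only as one convenient way to score how far an ordering departs from the proof order.

```python
from itertools import combinations


def kendall_tau(order):
    """Score an ordering against the proof order 0..n-1.

    `order[k]` is the proof-step index of the premise shown at position k.
    Returns 1.0 for the proof order, -1.0 for the exact reverse.
    Requires at least two premises.
    """
    pairs = list(combinations(range(len(order)), 2))
    concordant = sum(1 for i, j in pairs if order[i] < order[j])
    discordant = len(pairs) - concordant
    return (concordant - discordant) / len(pairs)


def reorder(premises, order):
    """Return the premises rearranged according to `order`."""
    return [premises[i] for i in order]


def build_prompt(premises, order, question):
    """Assemble one prompt variant: permuted premises, then the question."""
    return "\n".join(reorder(premises, order) + [question])


# Toy task (hypothetical): premises listed in the order the proof uses them.
premises = ["If A then B.", "If B then C.", "If C then D."]
question = "A is true. Is D true?"

forward = [0, 1, 2]    # mirrors the proof sequence
backward = [2, 1, 0]   # proof sequence reversed

for order in (forward, backward):
    print(kendall_tau(order))
    print(build_prompt(premises, order, question))
```

Every variant contains the same logical content; only the presentation order changes. Comparing model accuracy across variants, grouped by their tau score, isolates the effect of ordering.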

This brittleness suggests that LLM deductive reasoning operates less on abstract logical relations than on sequential pattern matching over the input. The model processes premises in order and builds intermediate representations that depend on that order. When the order does not match the proof structure, the model must reorder implicitly, a capability it lacks or executes poorly.

The finding connects to multiple existing insights about surface-level processing:

Per the related note "Why do chain-of-thought examples fail across different conditions?", order sensitivity is not unique to premises; it extends across the entire prompt structure. Both findings suggest that LLMs process prompts as sequential narratives, not as unordered logical structures.

Per the related note "Does training data format shape reasoning strategy more than domain?", premise ordering is another format effect: the same logical content produces dramatically different performance depending on how it is presented. The drop of over 30% is a prompt-side counterpart to the 7.5x format effect documented for training data.

The practical implication is that anyone constructing prompts for deductive reasoning tasks should order premises to match the expected proof sequence. This is trivial for the prompt designer who knows the answer but impossible in production settings where the answer is unknown — creating a fundamental deployment challenge for LLM deductive reasoning.
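To make the designer's side of this concrete, here is a hedged sketch that orders premises by forward chaining. It assumes the premises are already available as structured rules of the form (antecedents, consequent), which is exactly the information a production prompt usually lacks. The function name `proof_order` and the toy rules are hypothetical.

```python
def proof_order(facts, rules, goal):
    """Return rule indices in the order forward chaining fires them.

    `rules` is a list of (antecedents, consequent) pairs.
    Returns None when the goal cannot be derived.
    Note: every rule that becomes applicable is fired, so the result may
    include rules the shortest proof does not strictly need.
    """
    known = set(facts)
    remaining = set(range(len(rules)))
    fired = []
    while goal not in known:
        for i in sorted(remaining):
            antecedents, consequent = rules[i]
            if all(a in known for a in antecedents):
                known.add(consequent)   # rule i fires
                fired.append(i)
                remaining.discard(i)
                break
        else:
            return None  # no applicable rule left; goal is unreachable
    return fired


# Toy rules (hypothetical), deliberately listed out of proof order.
rules = [
    (("C",), "D"),
    (("A",), "B"),
    (("B",), "C"),
]

print(proof_order({"A"}, rules, "D"))  # [1, 2, 0]
```

Computing this ordering already requires deriving the goal, which illustrates the deployment problem: once the proof order is known, the task is effectively solved.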

Original note title

premise ordering affects deductive reasoning performance by over 30 percent