Does planning direction affect how hard problems become?
Planning research typically goes forward only. But some problems get easier when you work backward from the goal. What makes direction matter, and can language models exploit this?
Most LLM planning research studies only the forward direction: generating steps from the initial state toward the goal. But many planning problems exhibit an inherent directional asymmetry: generating the correct final steps leading to the goal can be much easier than generating the correct first steps from the initial state. This asymmetry is driven by bottlenecks near the goal.
The canonical example: a robot navigating to a bedroom at the end of a narrow hallway. Planning backward from the bedroom, the first step is constrained by the hallway (one possible path). Planning forward from the start, possibilities fan out quickly before the hallway constraint appears. The backward direction is easier because the bottleneck prunes the search space at the beginning of the backward chain, whereas in the forward chain it only takes effect near the end.
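The hallway asymmetry can be made concrete by counting partial plans in each direction. The following is a minimal sketch on a made-up directed toy map; the room names, the `toy_map` layout, and the helper functions are illustrative assumptions, not taken from the paper.

```python
# Hypothetical toy map: the start fans out into several rooms, but the
# bedroom can only be reached through a single hallway (the bottleneck).
toy_map = {
    "start": ["a", "b", "c"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
    "c": ["c1", "hall1"],
    "hall1": ["hall2"],
    "hall2": ["bedroom"],
    "a1": [], "a2": [], "b1": [], "b2": [], "c1": [], "bedroom": [],
}


def reverse_edges(graph):
    """Return the same map with every edge pointing the other way."""
    reversed_graph = {node: [] for node in graph}
    for node, successors in graph.items():
        for nxt in successors:
            reversed_graph.setdefault(nxt, []).append(node)
    return reversed_graph


def count_partial_plans(graph, origin, depth):
    """Count distinct step sequences of exactly `depth` moves from `origin`."""
    frontier = [origin]
    for _ in range(depth):
        frontier = [nxt for node in frontier for nxt in graph.get(node, [])]
    return len(frontier)


# Candidate partial plans after 1 and 2 steps in each direction.
forward_counts = [count_partial_plans(toy_map, "start", k) for k in (1, 2)]
backward_counts = [
    count_partial_plans(reverse_edges(toy_map), "bedroom", k) for k in (1, 2)
]
print("forward:", forward_counts)    # forward: [3, 6]
print("backward:", backward_counts)  # backward: [1, 1]
```

In this toy map a forward planner must pick among three and then six candidate prefixes, while a backward planner has exactly one option at each of its first steps.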
The finding for LLMs: planning performance in a given direction correlates with the problem's planning complexity in that direction. Which direction is easier is therefore problem-specific, not universal. The paper demonstrates that this holds for LLM planning, not just in analytical planning theory.
However, backward planning in LLMs is systematically biased: models perform worse when asked to plan directly in the backward direction, mirroring the difficulty humans have with reasoning backward. The solution is to flip the problem: swap the goal and the start, then plan forward in the flipped problem. This avoids the backward bias while exploiting the backward direction's structural advantage.
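The flip can be sketched on a simple graph problem. This is an illustration under stated assumptions, not the paper's implementation: `plan_forward` is a plain breadth-first search standing in for the LLM planner, and `flip_problem`, `plan_via_flip`, and `toy_map` are hypothetical names introduced here.

```python
from collections import deque

# Hypothetical toy map with a hallway bottleneck in front of the goal.
toy_map = {
    "start": ["a", "b", "c"],
    "a": [], "b": [],
    "c": ["hall1"],
    "hall1": ["hall2"],
    "hall2": ["bedroom"],
    "bedroom": [],
}


def plan_forward(graph, start, goal):
    """Stand-in forward planner (breadth-first search, NOT an LLM)."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def flip_problem(graph, start, goal):
    """Swap start and goal and reverse every transition."""
    flipped = {node: [] for node in graph}
    for node, successors in graph.items():
        for nxt in successors:
            flipped.setdefault(nxt, []).append(node)
    return flipped, goal, start


def plan_via_flip(graph, start, goal):
    """Plan forward in the flipped problem, then reverse the plan."""
    flipped_graph, flipped_start, flipped_goal = flip_problem(graph, start, goal)
    flipped_plan = plan_forward(flipped_graph, flipped_start, flipped_goal)
    return None if flipped_plan is None else flipped_plan[::-1]


print(plan_via_flip(toy_map, "start", "bedroom"))
# ['start', 'c', 'hall1', 'hall2', 'bedroom']
```

The planner only ever runs in its forward mode; the backward structure of the original problem enters through the flipped problem definition.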
Results: Combining planning in both directions with self-verification improves overall planning success by 4–24% across three planning domains. The diversity of candidate plans (forward + backward together) exceeds either direction alone.
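A rough sketch of how the two directions and a verification step might fit together is shown below. Everything here is an assumption made for illustration: `greedy_walk` is a deliberately weak stand-in for a sampled plan, and `is_valid_plan` checks candidates against the known transition map, whereas the source describes self-verification by the model itself.

```python
# Hypothetical toy map with a hallway bottleneck in front of the goal.
toy_map = {
    "start": ["a", "b", "c"],
    "a": ["a1"], "a1": [], "b": [],
    "c": ["hall1"],
    "hall1": ["hall2"],
    "hall2": ["bedroom"],
    "bedroom": [],
}


def reverse_edges(graph):
    """Return the same map with every edge pointing the other way."""
    reversed_graph = {node: [] for node in graph}
    for node, successors in graph.items():
        for nxt in successors:
            reversed_graph.setdefault(nxt, []).append(node)
    return reversed_graph


def greedy_walk(graph, origin, max_steps=10):
    """Weak stand-in planner: always take the first available move."""
    path = [origin]
    for _ in range(max_steps):
        successors = graph.get(path[-1], [])
        if not successors:
            break
        path.append(successors[0])
    return path


def is_valid_plan(graph, start, goal, plan):
    """Verification stand-in: endpoints match and every step is a legal move."""
    if not plan or plan[0] != start or plan[-1] != goal:
        return False
    return all(b in graph.get(a, []) for a, b in zip(plan, plan[1:]))


def plan_both_directions(graph, start, goal):
    """Collect one candidate per direction and return the first verified one."""
    forward_candidate = greedy_walk(graph, start)
    flipped_candidate = greedy_walk(reverse_edges(graph), goal)[::-1]
    for candidate in (forward_candidate, flipped_candidate):
        if is_valid_plan(graph, start, goal, candidate):
            return candidate
    return None


print(plan_both_directions(toy_map, "start", "bedroom"))
# ['start', 'c', 'hall1', 'hall2', 'bedroom']
```

Here the forward candidate wanders into a dead end and fails verification, while the candidate from the flipped problem passes, which is the kind of directional diversity the combined approach relies on.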
This connects to the question "How should we balance parallel versus sequential compute at test time?", but here the dimension is directional rather than just parallel versus sequential. Generating diverse candidates by exploring different directions is a form of parallel planning.
Related concepts in this collection 5
- How should we balance parallel versus sequential compute at test time?
  Test-time compute can prioritize breadth (trying many approaches) or depth (refining one approach). Which strategy works better, and does the answer depend on the problem?
  backward+forward as directional parallelism; generates diverse candidates via direction diversity rather than independent sampling
- Can backward reasoning during training improve forward reasoning?
  Does training models to reason backward (generating inverse questions and solutions) build internal consistency checking that transfers to forward-only inference? This explores whether backward capacity internalized during training without test-time deployment can enhance reasoning quality.
  companion note: training-time backward reasoning; this note covers test-time backward planning
- Which sentences actually steer a reasoning trace?
  Can we identify which sentences in a reasoning trace have outsized influence on the final answer? Three independent methods converge on a surprising answer about planning and backtracking.
  backtracking in CoT may be acting as micro-backward-planning; anchors = local direction reversals
- Can embedding future information in training data improve planning?
  This explores whether inserting lookahead tokens containing future goals into training sequences helps models learn long-range planning without changing their architecture. The question matters because it tests whether data-level changes can produce architectural-level reasoning improvements.
  complementary approach to the same problem: TRELAWNEY provides goal information at training time via embedded future tokens, while backward planning provides it at inference time by reversing search direction; TRELAWNEY-trained models may internalize backward planning benefits, making explicit backward search less necessary
- Why does parallel reasoning outperform single chain thinking?
  Does dividing a fixed token budget across multiple independent reasoning paths beat spending it all on one long chain? This explores how breadth and diversity in reasoning compare to depth.
  backward+forward is an instance of the parallel diversity principle: directional diversity provides structurally different candidates that independent same-direction sampling cannot reach, extending the parallel advantage beyond random seed diversity to problem-structural diversity
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Thinking Forward and Backward: Effective Backward Planning with Large Language Models
- Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o1
- Chain of Thoughtlessness? An Analysis of CoT in Planning
- On the Limits of Innate Planning in Large Language Models
- Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
- Divide-or-Conquer? Which Part Should You Distill Your LLM?
- The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning
- Emergent Hierarchical Reasoning In LLMs Through Reinforcement Learning
Original note title
backward planning reduces difficulty when goal states have bottlenecks by constraining the early search space