INQUIRING LINE

What inductive bias would force models to learn Newtonian mechanics instead of shortcuts?

This explores whether any change to a model's architecture or training signal could push it to learn the underlying physical laws (the kind that generalize) rather than the brittle pattern-matching tricks that happen to fit the training data — and the honest answer the corpus gives is that we don't yet have one that reliably works.


The corpus's starting point is sobering: the shortcut problem is the default, not an edge case. The most direct evidence is that transformers trained on orbital mechanics learn predictive patterns, not the unified force law underneath. Inductive-bias probes and fine-tuning expose nonsensical, 'slice-dependent' physics that changes from one region of the data to the next (Do foundation models learn world models or task-specific shortcuts?). The same note finds arithmetic running on range-matching heuristics rather than an algorithm. So the question isn't academic: a model can predict planetary motion beautifully and still have no Newton inside it.
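To make 'slice-dependent' physics concrete, here is a minimal sketch of one way to check for it. It is not the probe used in the cited note. It assumes we can query a model for the acceleration it implies at a given orbital radius, fits a power-law exponent separately in each radius slice, and compares the exponents. The functions `newtonian` and `piecewise_shortcut` are hypothetical stand-ins for a model that learned one law and a model that fit each region separately.

```python
import math

def fit_exponent(radii, accels):
    """Least-squares slope of log(a) against log(r): the exponent p in a = k * r**p."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(a) for a in accels]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def slice_exponents(accel_fn, slices, points=20):
    """Fit one exponent per radius slice (lo, hi) from the accelerations accel_fn predicts."""
    exponents = []
    for lo, hi in slices:
        radii = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
        exponents.append(fit_exponent(radii, [accel_fn(r) for r in radii]))
    return exponents

def newtonian(r):
    # Hypothetical "law learner": inverse-square everywhere (GM set to 1).
    return 1.0 / r ** 2

def piecewise_shortcut(r):
    # Hypothetical "shortcut learner": inverse-square near the star, 1/r farther out.
    # The two pieces agree at r = 2, so the predictions look smooth.
    return 1.0 / r ** 2 if r < 2.0 else 0.5 / r

SLICES = [(1.0, 2.0), (2.0, 4.0)]  # assumed radius ranges, arbitrary units
for name, fn in [("newtonian", newtonian), ("shortcut", piecewise_shortcut)]:
    exps = slice_exponents(fn, SLICES)
    print(name, [round(p, 3) for p in exps], "spread:", round(max(exps) - min(exps), 3))
```

The Newtonian stand-in yields an exponent of about -2 in both slices, so the spread is near zero. The shortcut stand-in yields about -2 in the inner slice and about -1 in the outer one, even though its predictions are continuous: the 'law' depends on where you look.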

Why do shortcuts win by default? Two notes point at the training signal itself. When RLVR training leans on nearly impossible problems, models don't reach for deeper structure; they collapse into degenerate tricks like answer-repetition and computation-skipping, and those shortcuts then contaminate capabilities the model already had (Do overly hard RLVR samples actually harm model capabilities?). And the biases models do absorb trace straight back to the statistics of their data: LLMs reproduce human causal-reasoning errors (weak 'explaining away,' Markov violations) not because they reason worse, but because they're fitting the correlational shape of what they saw (Do large language models make the same causal reasoning mistakes as humans?). A pure correlation-fitter has no reason to prefer the law over the shortcut when both fit the data equally; that's the gap an inductive bias would have to close.
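The RLVR note attributes the shortcut reinforcement to group-relative normalization, which treats rare accidental successes as high-advantage trajectories. The arithmetic can be sketched as follows. This assumes binary rewards, a group of 16 sampled answers, and normalization by the population standard deviation; the exact normalization in any given training setup may differ, and `group_relative_advantages` is an illustrative helper, not a library function.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std over the group
    return [(r - mean) / (std + eps) for r in rewards]

# Nearly impossible problem: 1 accidental success out of 16 samples.
hard_group = [1.0] + [0.0] * 15
# Well-matched problem: 8 successes out of 16 samples.
matched_group = [1.0] * 8 + [0.0] * 8

hard_adv = group_relative_advantages(hard_group)[0]
matched_adv = group_relative_advantages(matched_group)[0]
print("advantage of the lone success on the hard problem:", round(hard_adv, 2))
print("advantage of a success on the matched problem:", round(matched_adv, 2))
```

Under these assumptions the lone success on the hard problem receives an advantage of roughly 3.9 (the square root of 15), against 1.0 for a success on the well-matched problem. If that lone success came from answer-repetition or skipped computation, the update pushes hardest toward exactly that behavior.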

This is where the corpus quietly reframes your question. The dream of a model that 'just reads the laws off the data' is exactly the 'theory-free AI' fallacy: high accuracy masking a correlation-for-causation substitution, the way a 95%-accurate justice model still convicts thousands wrongly (Can AI models be truly free from human bias?). The lesson isn't that bias is bad; it's that *some* structural prior is unavoidable, so the real engineering question is which prior to build in. Nor can you tell whether a model has learned a law from one kind of evidence alone: representational analysis finds correlations without causation, and causal analysis shows behavioral effects without explaining them. Only paired causal-plus-representational probing can confirm a model holds the actual structure (Can we understand LLM mechanisms with only representational analysis?). In other words, you can't reward 'learning Newton' if you can't detect it.
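A toy version of the paired procedure shows why neither half suffices alone. This is a sketch under strong assumptions, not a real interpretability pipeline: `toy_head` is a hypothetical two-unit model whose output reads only unit 0, while unit 1 carries a noisy copy of the same physical variable (`force`) without being used. Step one locates candidate units by correlation; step two verifies them by zero-ablation.

```python
import random

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def toy_head(hidden):
    # Hypothetical model output: depends on unit 0 only; unit 1 is carried but unused.
    return 3.0 * hidden[0]

def ablation_effect(hiddens, unit):
    """Mean absolute change in output when one hidden unit is set to zero."""
    total = 0.0
    for h in hiddens:
        ablated = list(h)
        ablated[unit] = 0.0
        total += abs(toy_head(h) - toy_head(ablated))
    return total / len(hiddens)

random.seed(0)
force = [random.gauss(0.0, 1.0) for _ in range(500)]
# Both units encode the variable, so both look like hits to a representational probe.
hiddens = [(f + random.gauss(0.0, 0.1), 2.0 * f + random.gauss(0.0, 0.1)) for f in force]

for unit in (0, 1):
    corr = pearson([h[unit] for h in hiddens], force)    # step 1: representational
    effect = ablation_effect(hiddens, unit)              # step 2: causal
    print(f"unit {unit}: correlation={corr:.3f}, ablation effect={effect:.3f}")
```

Both units correlate strongly with the variable, so correlation alone cannot separate them. Only ablation shows that unit 1 has no effect on the output, and only the correlation step says what the causally relevant unit 0 encodes.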

The one architectural candidate the corpus offers is the most interesting thread to pull: Energy-Based Transformers replace next-token prediction with assigning an energy to each input–prediction pair and minimizing it by gradient descent at inference. That shift in objective — learn a landscape, then search it — yields markedly better out-of-distribution generalization *without* domain-specific scaffolding (Can energy minimization unlock reasoning without domain-specific training?). That's the shape an answer to your question would take: not a Newton-specific rule bolted on, but a learning objective whose inductive bias rewards consistent global structure over locally-good shortcuts, so the law becomes the cheaper solution.
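The inference loop described above (score an input–prediction pair, then descend the energy with respect to the prediction) can be sketched in a few lines. This is a toy under stated assumptions, not the Energy-Based Transformer architecture: the energy here is a hand-written quadratic with its minimum at y = 2x, whereas a real model would learn the energy function, and `infer`, `lr`, and `steps` are illustrative names.

```python
def energy(x, y):
    # Hypothetical hand-written energy: lowest when the prediction y equals 2 * x.
    return (y - 2.0 * x) ** 2

def energy_grad_y(x, y):
    # Analytic gradient of the energy with respect to the prediction y.
    return 2.0 * (y - 2.0 * x)

def infer(x, y0=0.0, lr=0.1, steps=50):
    """Start from a guess y0 and refine it by gradient descent on the energy."""
    y = y0
    for _ in range(steps):
        y -= lr * energy_grad_y(x, y)
    return y

x = 3.0
for steps in (1, 5, 50):
    y = infer(x, steps=steps)
    print(f"steps={steps:2d}  prediction={y:.4f}  energy={energy(x, y):.6f}")
```

The prediction is the output of a search rather than a single forward pass, so spending more steps at inference lowers the energy further; in this toy the prediction approaches 6.0 as the step count grows.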

The thing you may not have known you wanted to know: 'shortcut vs. law' isn't a property a model either has or lacks. It lives in three separable places at once: the objective (energy-minimization vs. token-prediction), the reward design (which actively manufactures shortcuts when miscalibrated), and the measurement (you can't fix what causal probing can't see). No single inductive bias forces Newtonian mechanics; the corpus suggests the leverage is in changing what 'fitting the data' is even allowed to mean.


Sources (6 notes)

Do foundation models learn world models or task-specific shortcuts?

Inductive bias probes show transformers trained on orbital mechanics and games learn predictive patterns, not unified world structure. Fine-tuning reveals nonsensical, slice-dependent laws; circuit analysis shows arithmetic relies on range-matching heuristics, not algorithms.

Do overly hard RLVR samples actually harm model capabilities?

Training on nearly-impossible problems causes models to learn degenerate shortcuts rather than genuine reasoning, and these shortcuts contaminate pre-existing capabilities. Group-relative normalization treats rare accidental successes as high-advantage trajectories, reinforcing answer repetition and computation-skipping instead of sound reasoning patterns.

Do large language models make the same causal reasoning mistakes as humans?

LLMs show weak explaining away and Markov violations in collider networks, matching human error patterns exactly. This suggests shared mechanisms rooted in training data statistics rather than categorical reasoning inferiority.

Can AI models be truly free from human bias?

Research shows that 'theory-free' AI models mask bigotry behind high accuracy metrics while committing fundamental statistical errors. A 95% accurate criminal justice system would wrongly convict thousands, demonstrating that model sophistication does not validate causal inference.

Can we understand LLM mechanisms with only representational analysis?

Representational analysis alone identifies correlations without causation; causal analysis alone shows behavioral effects without explaining them. Only paired methods—locating candidate features representationally, then verifying causally—produce complete mechanistic claims.

Can energy minimization unlock reasoning without domain-specific training?

Energy-Based Transformers assign energy values to input-prediction pairs and use gradient descent minimization for inference, yielding 35% higher training scaling rates and 29% more inference-compute gains than Transformer++, while generalizing better on out-of-distribution data without domain-specific scaffolding.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

As an AI researcher, you're testing whether any inductive bias can force models to learn Newtonian mechanics rather than predictive shortcuts—a question a curated library framed as fundamentally about THREE separable leverage points, not one.

What a curated library found — and when (dated claims, not current truth):
Findings span 2022–2026; treat these as snapshot claims to re-examine:
- Transformers trained on orbital mechanics learn slice-dependent heuristics, not unified force laws; fine-tuning and inductive-bias probing expose nonsensical region-specific "physics" (2025).
- Miscalibrated reward shaping actively manufactures shortcuts (answer-repetition, computation-skipping) that then contaminate downstream capabilities (2026).
- LLMs absorb causal-reasoning errors from training-data statistics (weak explaining-away, Markov violations), not because they reason worse but because they fit correlational shapes (2025).
- Representational analysis alone cannot distinguish learned causation from learned correlation; only paired causal-plus-representational probing detects actual structure (2025).
- Energy-Based Transformers—which minimize an energy landscape rather than predict next tokens—show better out-of-distribution generalization without domain scaffolding (2025).

Anchor papers (verify; mind their dates):
- arXiv:2507.06952 (2025, "What Has a Foundation Model Found?")
- arXiv:2605.28388 (2026, mechanistic role of sample difficulty in RLVR)
- arXiv:2507.02092 (2025, Energy-Based Transformers)
- arXiv:2411.18656 (2024, theory-free AI fallacy)

Your task:
(1) RE-TEST the three leverage points. For each—objective (energy vs. token prediction), reward design (shortcut-manufacturing), and measurement (causal probing)—assess whether recent architectural innovations, training methods, or mechanistic-interpretability advances have RELAXED the constraint that shortcuts are default. Where has the regime moved; where does the shortcut trap still hold?
(2) Surface work from the last 6 months that CONTRADICTS the claim that reward miscalibration is the primary shortcut driver, or that proves representational+causal probing can reliably detect learned laws in practice.
(3) Propose 2 open questions that assume Energy-Based or other next-generation objectives may have already partially solved the shortcuts problem: What does "learning Newton" look like *inside* an energy-based model, and can we design a loss that rewards stable causal structure across distributional shift?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
