Can autoregressive models learn faithful translation to logical representations without semantic loss?

This explores whether autoregressive language models can translate natural language into formal logic (autoformalisation) while preserving the original meaning — and whether the failures we see are surface slips or something deeper in how these models reason.

The most direct answer in the corpus is sobering: models produce logic that *looks* right but *means* the wrong thing. "Can large language models translate natural language to logic faithfully?" finds that LLMs generate syntactically valid logical expressions that are semantically incorrect, with errors clustering around scope ambiguity, quantifier precision, and predicate granularity, which are exactly the places where meaning lives. Intriguingly, the same work suggests models can *recognize* correct formalizations better than they can *produce* them, so the bottleneck is generation, not comprehension.
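To make "looks right but means the wrong thing" concrete, here is a minimal sketch of a scope ambiguity. The sentence "Every student read a book", the toy domain, and the function names are illustrative assumptions, not examples taken from the cited note. Both readings are well-formed first-order formulas, yet they disagree on the same small model.

```python
# Hypothetical toy model (an assumption for illustration): who read which book.
students = {"ann", "bob"}
books = {"emma", "ulysses"}
read = {("ann", "emma"), ("bob", "ulysses")}


def surface_scope(students, books, read):
    """forall s. exists b. read(s, b): every student read some book, possibly a different one each."""
    return all(any((s, b) in read for b in books) for s in students)


def inverse_scope(students, books, read):
    """exists b. forall s. read(s, b): one particular book was read by every student."""
    return any(all((s, b) in read for s in students) for b in books)


# Both formulas are syntactically valid translations of
# "Every student read a book", but they are not equivalent.
print(surface_scope(students, books, read))  # True
print(inverse_scope(students, books, read))  # False
```

A translation that emits the second formula when the speaker meant the first would pass any syntax check and still be wrong, which is the kind of error the note describes.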

Why does generation break down? Two notes point at the underlying mechanism rather than the symptom. "Do large language models reason symbolically or semantically?" shows that when the familiar semantic content is stripped out of a reasoning task and only the formal rules remain, performance collapses even with the correct rules in context: models lean on commonsense token associations, not symbolic manipulation. Faithful logical translation demands exactly the symbolic discipline these models lack. Reinforcing this, "Do language models really understand meaning or just surface frequency?" finds that models systematically prefer high-frequency surface phrasings over semantically equivalent rare ones, even when meaning is identical. A model tracking statistical mass rather than meaning will quietly drift toward the common reading of an ambiguous sentence rather than the logically precise one, which is precisely the "semantic loss" the question asks about.
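One way to picture "stripping the semantic content" is to rename every meaningful predicate to an opaque symbol while leaving the rule structure untouched. The sketch below is an assumption about what such a decoupling step could look like; the rule syntax, the `abstract_symbols` helper, and the `P0`, `P1` naming are hypothetical and are not the cited work's actual procedure.

```python
import re


def abstract_symbols(rules, vocabulary):
    """Replace each predicate name with an opaque symbol, preserving logical structure.

    Returns the rewritten rules and the mapping that was applied.
    """
    if not vocabulary:
        return list(rules), {}
    # Sort for a deterministic mapping: same vocabulary, same symbols.
    mapping = {word: f"P{i}" for i, word in enumerate(sorted(vocabulary))}
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in mapping) + r")\b")
    rewritten = [pattern.sub(lambda m: mapping[m.group(1)], rule) for rule in rules]
    return rewritten, mapping


# Illustrative rules in an ad hoc textual syntax (an assumption).
rules = ["bird(x) -> can_fly(x)", "bird(tweety)"]
abstract_rules, mapping = abstract_symbols(rules, {"bird", "can_fly"})
print(abstract_rules)  # ['P0(x) -> P1(x)', 'P0(tweety)']
```

The two rule sets license exactly the same inferences. A system doing symbolic manipulation should be indifferent to the renaming; the note reports that LLM performance is not.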

There's a deeper, almost philosophical thread here. "Do transformer models store knowledge or generate it continuously?" argues that transformers hold knowledge as flowing activations rather than fixed, retrievable symbols, closer to oral performance than to a written archive. Logic is the opposite: discrete, stable, composable. If knowledge in these models is fundamentally continuous and context-bound, faithful mapping onto crisp logical form may be working against the grain of the architecture itself. And "Why do language models ignore information in their context?" shows that strong training priors can override what's actually in the prompt, so even a correctly stated premise can get silently "corrected" toward what the model expects.

But the corpus isn't entirely pessimistic, and this is where it gets interesting. The capability may be present yet buried. "Do transformers hide reasoning before producing filler tokens?" finds that models trained with hidden chain-of-thought tokens compute correct answers in early layers and then *overwrite* them to produce format-compliant output: the right reasoning is recoverable, just suppressed. And "Can LLMs actually forecast time series better than we think?" shows that separating distinct kinds of reasoning into structured workflows surfaces abilities that monolithic prompting hides. Read together, these suggest that "can autoregressive models do this?" might be the wrong frame: the latent competence may exist but get lost in a single greedy left-to-right pass.

That reframing makes "Can diffusion language models match autoregressive inference speed?" the quiet surprise of this collection: it questions whether strict autoregression is even the right generation paradigm, hybridizing block-wise AR with parallel decoding. If faithfulness suffers because each token is committed irreversibly before the global logical structure is settled, a less strictly sequential approach, or a workflow that lets the model revise scope and quantifiers, may be where faithful formalization actually becomes reachable. The honest bottom line: today's autoregressive models fail at semantically faithful logic translation, and the corpus suggests the cause is architectural and statistical, not a bug to be patched away. Yet the same notes hint that the latent ability is more real than the output suggests.
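A workflow that lets formalizations be checked and revised, rather than committed in one pass, can be sketched as a generate-then-verify loop. Everything below is a hypothetical illustration: the candidate formulas stand in for model outputs, the scenarios stand in for human-labelled judgments of whether the sentence "Every student read a book" is true, and `consistent_candidates` is an assumed helper name. No model is called.

```python
from typing import Callable, Dict, List, Tuple

Model = Dict[str, set]
Formula = Callable[[Model], bool]


def forall_exists(m: Model) -> bool:
    # forall s. exists b. read(s, b)
    return all(any((s, b) in m["read"] for b in m["books"]) for s in m["students"])


def exists_forall(m: Model) -> bool:
    # exists b. forall s. read(s, b)
    return any(all((s, b) in m["read"] for s in m["students"]) for b in m["books"])


def consistent_candidates(
    candidates: Dict[str, Formula],
    scenarios: List[Tuple[Model, bool]],
) -> List[str]:
    """Keep only candidates whose truth value matches the label in every scenario."""
    return [
        name
        for name, formula in candidates.items()
        if all(formula(model) == expected for model, expected in scenarios)
    ]


# Stand-ins for several formalizations proposed for the same sentence.
candidates = {"forall-exists": forall_exists, "exists-forall": exists_forall}

# Assumed labels: is the English sentence true in this situation?
scenarios = [
    ({"students": {"ann", "bob"}, "books": {"emma", "ulysses"},
      "read": {("ann", "emma"), ("bob", "ulysses")}}, True),
    ({"students": {"ann", "bob"}, "books": {"emma"},
      "read": {("ann", "emma")}}, False),
]

print(consistent_candidates(candidates, scenarios))  # ['forall-exists']
```

The design choice here is to move the burden from producing the right formula to rejecting wrong ones, which leans on the recognition side of the generation/recognition asymmetry reported above. Whether this recovers faithfulness in practice is an open question, not a result from the corpus.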

Sources 8 notes

Can large language models translate natural language to logic faithfully?

LLMs generate well-formed logical expressions that are semantically incorrect, with errors clustering at scope ambiguity, quantifier precision, and predicate granularity. The asymmetry suggests LLMs understand formal language better than they can generate it.

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Do language models really understand meaning or just surface frequency?

LLMs show consistent preference for higher-frequency surface forms over semantically equivalent rare paraphrases across math, machine translation, commonsense reasoning, and tool calling. This suggests models track statistical mass from pretraining rather than meaning-recognition as their primary mechanism.

Do transformer models store knowledge or generate it continuously?

Transformers organize knowledge as flowing activations rather than retrievable archives, mirroring oral cultures where knowledge exists only in performance. This explains why model knowledge is contextual, difficult to edit, and inseparable from generation.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Do transformers hide reasoning before producing filler tokens?

Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.

Can LLMs actually forecast time series better than we think?

LLMs have stronger intrinsic forecasting ability than recognized, but only when workflows separate numerical reasoning from contextual reasoning. Monolithic prompting obscures this capability; structured decomposition surfaces it.

Can diffusion language models match autoregressive inference speed?

Discrete Diffusion Forcing breaks the speed barrier through block-wise autoregressive generation with KV cache reuse and inter-block parallel decoding. This hybrid approach recovers both the compute efficiency of AR and the parallelism advantage of diffusion.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a capability analyst re-testing constraints on faithful logical formalization in autoregressive models. The question remains: **Can autoregressive models learn faithful translation to logical representations without semantic loss?**

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026. A curated library reported:
- LLMs generate syntactically valid but semantically incorrect logical expressions; errors cluster in scope ambiguity, quantifier precision, predicate granularity (2024).
- Models recognize correct formalizations better than they produce them — generation, not comprehension, is the bottleneck (2024).
- When semantic content is stripped, leaving only formal rules, performance collapses; models rely on token associations, not symbolic manipulation (2023).
- Models systematically prefer high-frequency surface phrasings over semantically equivalent rare ones, even when meaning is identical (2026).
- Models compute correct answers in early layers, then overwrite them to produce format-compliant output (2025); hybrid non-strictly-autoregressive decoding may preserve faithfulness better (2025).

Anchor papers (verify; mind their dates):
- arXiv:2305.14825 (2023): In-Context Semantic vs. Symbolic Reasoning
- arXiv:2401.05777 (2024): Probing Structured Semantics Understanding and Generation
- arXiv:2412.04537 (2025): Hidden Computations in Chain-of-Thought Reasoning
- arXiv:2508.09192 (2025): Diffusion LLMs & Non-Strictly-AR Inference

Your task:
(1) **RE-TEST EACH CONSTRAINT.** For every finding above, judge whether newer models (o1, o3, Claude 4, Grok-3), training methods (process reward models, outcome scaling, constitutional AI), tooling (formal-language-aware tokenizers, verifier checkpoints), orchestration (iterative refinement loops, backward chaining over logic, cached intermediate formalizations), or evaluation suites (logical-fidelity benchmarks post-2025) have since relaxed or overturned it. Separate the durable question (e.g., "do greedy left-to-right passes lose global structure?") from perishable limitations (e.g., "early-layer suppression"). Cite what resolved it plainly; flag where constraints still hold.

(2) **Surface the strongest contradicting or superseding work** from the last ~6 months that claims autoregressive models *can* faithfully formalize, or shows a workaround (hybrid generation, external verifiers, symbolic scaffolding) that bypasses the architectural bottleneck.

(3) **Propose 2 research questions** that assume the regime may have moved: one testing whether process-level transparency (intermediate formalizations, beam-search over candidate logical structures) recovers faithfulness; one exploring whether task-specific fine-tuning on formal-language pairs has overcome statistical-frequency bias.

Cite arXiv IDs; flag anything you cannot ground in a real paper.
