INQUIRING LINE

What makes certain bond distributions more learnable than others?


This reads 'bond distributions' as a question about learnable distributions generally. The corpus has no notes on chemical bonds (the bond-length and bond-angle sense), but it has a great deal on why some distributions are learnable and others aren't. The single loudest answer running through the collection: a distribution is learnable to the degree it sits close to what the model already represents. Chain-of-thought reasoning degrades in a predictable way the moment you push it outside the training distribution, staying fluent on the surface while breaking logically underneath (Does chain-of-thought reasoning actually generalize beyond training data?). Even the length of a model's reasoning trace turns out to be a proximity signal rather than a difficulty signal: traces stretch with hardness only in-distribution and decouple entirely once you leave it (Does longer reasoning actually mean harder problems?). So 'learnable' and 'near the existing distribution' keep collapsing into the same thing.

That reframes the question: not which distributions are learnable in the abstract, but how far a model can move from where it started without breaking. One note makes this almost mechanical: keeping KL drift from the base model low preserves *plasticity*, the ability to keep learning later tasks. Parameter-only methods that drift hard stall out when the domain shifts, while staying close keeps the model adaptable (Does staying close to the base model preserve learning ability?). Learnability here isn't a property of the target distribution alone; it's a budget you spend by moving away from your origin.
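To make "KL drift" concrete, here is a minimal Python sketch of the KL divergence between a base model's and an updated model's next-token distributions on a single context. The four-token vocabulary, every probability, and the names `kl_divergence`, `base`, `near`, and `far` are hypothetical. The note doesn't say which KL direction or estimator the FST experiments used; this sketch computes KL(updated || base), the direction commonly used as a penalty in RL fine-tuning.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same support."""
    if len(p) != len(q):
        raise ValueError("distributions must share a support")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # by convention 0 * log(0 / q) contributes nothing
        if qi == 0.0:
            return math.inf  # p puts mass where q has none: unbounded drift
        total += pi * math.log(pi / qi)
    return total

# Hypothetical next-token distributions over a 4-token vocabulary (illustrative only).
base = [0.40, 0.30, 0.20, 0.10]
near = [0.45, 0.28, 0.18, 0.09]  # small update: stays close to the base model
far = [0.90, 0.05, 0.03, 0.02]   # aggressive update: drifts hard

print(f"near drift: {kl_divergence(near, base):.4f} nats")
print(f"far drift:  {kl_divergence(far, base):.4f} nats")
```

In practice drift would be averaged over many contexts rather than read off one distribution; the single-context version only shows what quantity is being kept small.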

There's a sharp tension lurking in this, though. You can make a distribution *more* learnable in one place by paying for it elsewhere. Teachers that condition on the correct answer (and on verifier output) produce confident, compressed traces that students absorb easily, but that very confidence suppresses the uncertainty signals needed to generalize, so in-domain learnability is bought with out-of-distribution brittleness (Does richer teacher context hurt student generalization?). Sharpening a distribution makes it crisp and imitable while narrowing what it can transfer to. The easiest-to-learn version is often the least robust one.
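The cost of sharpening can be seen in miniature. The note's mechanism is answer-conditioned teachers, not temperature; this sketch swaps in temperature-scaled softmax as a generic stand-in for "sharpening" and tracks entropy as a rough proxy for how much uncertainty a distribution still expresses. The logits and the helper names `softmax` and `entropy` are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; lower means a more confident, 'crisper' distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical teacher scores over four candidate continuations (illustrative only).
logits = [2.0, 1.5, 0.5, 0.0]
temperatures = [2.0, 1.0, 0.5, 0.25]
entropies = []
for t in temperatures:
    probs = softmax(logits, t)
    entropies.append(entropy(probs))
    print(f"T={t:<4}  top-prob={max(probs):.3f}  entropy={entropies[-1]:.3f} nats")
```

As the temperature drops, the top choice becomes easier to imitate while the mass on alternatives, where any signal of doubt would live, shrinks toward zero.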

And some distributions resist learning no matter how you approach them. Across constrained-optimization tasks, models plateau around 55–60% constraint satisfaction regardless of scale, architecture, or training regime, and reasoning models do not systematically beat standard ones. That reads as a ceiling, not a gap you can close with more data (Do larger language models solve constrained optimization better?). It's the counterpoint to proximity: closeness explains a lot, but structure in the target itself can be the wall.

Two things the corpus quietly adds that you might not expect. First, when learning *does* take, it lands in a surprisingly structured place: RL reliably updates the same sparse, near-full-rank subnetwork of 5–30% of parameters across random seeds, suggesting learnability has a consistent shape rather than being arbitrary (Does reinforcement learning update only a small fraction of parameters?). Second, what looks like a single learned answer is still just one draw from a distribution: fixing the seed makes outputs consistent without making them reliable (Does setting temperature to zero actually make LLM outputs reliable?).
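The RL sparsity claim rests on two measurements: what fraction of parameters an update touches, and how much two runs' update sets overlap. A minimal sketch of both on toy flattened parameter vectors follows. The vector size, the "every 5th parameter" pattern, and the helper names are hypothetical, and a real analysis would likely need a nonzero `tol` to absorb numerical noise. The "near-full-rank" part of the claim is not covered here; checking it would need a rank or SVD analysis of each update matrix.

```python
def updated_indices(before, after, tol=0.0):
    """Indices of parameters whose value moved by more than tol between checkpoints."""
    return {i for i, (b, a) in enumerate(zip(before, after)) if abs(a - b) > tol}

def update_fraction(before, after, tol=0.0):
    """Share of parameters touched by fine-tuning (the complement of update sparsity)."""
    return len(updated_indices(before, after, tol)) / len(before)

def overlap(mask_a, mask_b):
    """Jaccard similarity of two update masks: 1.0 means identical subnetworks."""
    union = mask_a | mask_b
    return len(mask_a & mask_b) / len(union) if union else 1.0

# Toy flattened parameter vectors; sizes and the update pattern are hypothetical.
n = 1000
base = [0.0] * n
run_seed_0 = [0.01 if i % 5 == 0 else 0.0 for i in range(n)]
# A second seed touching almost the same subset: drops index 0, adds index 1.
run_seed_1 = [0.02 if (i % 5 == 0 and i != 0) or i == 1 else 0.0 for i in range(n)]

frac_0 = update_fraction(base, run_seed_0)
frac_1 = update_fraction(base, run_seed_1)
jaccard = overlap(updated_indices(base, run_seed_0), updated_indices(base, run_seed_1))
print(f"seed 0 updates {frac_0:.0%}, seed 1 updates {frac_1:.0%}, mask overlap {jaccard:.3f}")
```

A fraction inside the 5–30% band together with an overlap near 1.0 across seeds is the pattern the note describes; a fraction near 100% or a low overlap would point the other way.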

The thing worth taking away: learnability in this collection is relational, not intrinsic. A distribution is learnable mostly in proportion to how near it is to the model's existing one, how little plasticity you burn reaching it, and whether its internal structure imposes a hard ceiling. And the moves that make it easiest to learn are frequently the same moves that make it fail to generalize.


Sources (7 notes)

Does chain-of-thought reasoning actually generalize beyond training data?

DataAlchemy experiments show CoT fails systematically under distributional shifts in task, length, and format. Models produce fluent but logically inconsistent reasoning — imitating reasoning form without valid underlying logic.

Does longer reasoning actually mean harder problems?

Controlled A* maze experiments show trace length correlates with difficulty only in-distribution but decouples entirely out-of-distribution. Trace length primarily reflects recall of training schemas, not adaptive computation.

Does staying close to the base model preserve learning ability?

FST-trained models stay up to 70% closer to their base distribution than parameter-only RL, and this reduced drift preserves the model's ability to learn subsequent tasks effectively. Parameter-only approaches stall when task domains change, while low KL drift enables sustained adaptation.

Does richer teacher context hurt student generalization?

Teachers conditioned on correct answers and verifier output produce confident, concise traces that students inherit. This style suppresses uncertainty expression, optimizing in-domain performance while degrading generalization to out-of-distribution problems that require epistemic caution.

Do larger language models solve constrained optimization better?

Across constrained-optimization tasks, LLMs converge to ~55–60% constraint satisfaction independent of architecture, parameter count, or training regime. Reasoning models do not systematically outperform standard models, suggesting a fundamental ceiling rather than a scaling gap.

Does reinforcement learning update only a small fraction of parameters?

Across seven RL algorithms and ten LLM families, RL updates only 5–30% of parameters, a sparsity that emerges without explicit regularization. Critically, these sparse updates are nearly full-rank and nearly identical across random seeds, indicating structural rather than arbitrary parameter selection.

Does setting temperature to zero actually make LLM outputs reliable?

Fixed seeds and zero temperature replicate the same output repeatedly, but that output remains one draw from the model's probability distribution. McDonald's omega testing across 100 repetitions reveals that consistency does not equal reliability.
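The gap between consistency and reliability can be shown with nothing more than a seeded sampler. This sketch does not implement McDonald's omega; it only contrasts replaying one seed with sampling across many. The three-answer distribution and the helper name `sample_answer` are hypothetical. Greedy (temperature-zero) decoding behaves like the fixed-seed case: it returns the same pick every time regardless of how much probability mass sits elsewhere.

```python
import random

def sample_answer(probs, rng):
    """Draw one answer index from a discrete distribution using the given RNG."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical answer distribution: the model puts only half its mass on answer 0.
probs = [0.5, 0.3, 0.2]

# Re-seeding with the same value replays the same draw every time: perfectly consistent.
fixed = [sample_answer(probs, random.Random(42)) for _ in range(100)]

# Fresh seeds expose the spread that the fixed seed hides.
varied = [sample_answer(probs, random.Random(seed)) for seed in range(100)]

print("distinct answers with fixed seed:", len(set(fixed)))
print("distinct answers across seeds:   ", len(set(varied)))
print("share of draws agreeing with the fixed-seed answer:",
      varied.count(fixed[0]) / len(varied))
```

A hundred identical outputs from the fixed seed say nothing about whether that output is the one the model would give most of the time.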

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further; it asks the model to find newer work and re-test which earlier constraints still hold.

You are an LLM researcher re-testing claims about what makes distributions learnable. The question remains open: beyond proximity to training data, what structural and methodological factors govern learnability?

What a curated library found — and when (dated claims, not current truth):
Findings span Dec 2024–May 2026. A curated library reports:
• Chain-of-thought reasoning degrades predictably once pushed outside the training distribution: fluent surface, broken logic underneath (2025-08).
• CoT trace length reflects proximity to the training distribution, not problem difficulty; traces decouple from hardness at the distribution boundary (2025-09).
• Low KL drift from the base model preserves plasticity and continued learning; high-drift parameter-only methods stall under domain shift (2026-05).
• Self-distillation with confident, compressed teacher traces produces in-domain learnability but suppresses uncertainty signals needed for OOD generalization (2026-03).
• LLMs plateau at 55–60% constraint satisfaction regardless of scale, architecture, or training regime: a structural ceiling, not a data gap (2026-03).
• RL updates only 5–30% of parameters, in sparse, nearly full-rank subnetworks that are nearly identical across seeds (2025-05).

Anchor papers (verify; mind their dates):
• arXiv:2508.01191 (2025-08): distribution-bounded CoT effectiveness
• arXiv:2505.11711 (2025-05): sparse RL subnetwork discovery
• arXiv:2605.12484 (2026-05): KL drift and plasticity tradeoff
• arXiv:2603.24472 (2026-03): self-distillation brittleness

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding, judge whether newer models (larger scales, architectural changes), training methods (curriculum learning, mixture-of-experts tuning), orchestration (long-context memory, multi-agent decomposition), or evaluation frameworks have since relaxed or overturned it. Separate the durable question (learnability as relational property) from perishable limits (e.g., does the 55–60% plateau still hold for recent constraint solvers?). Cite what moved it.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months, especially work claiming learnability *is* intrinsic, or that scaling alone breaks the distribution-proximity tie.
(3) Propose 2 research questions that assume the regime may have shifted: e.g., "Does structured continual learning (e.g., memory-augmented agents) decouple learnability from base-distribution proximity?" and "Can explicit uncertainty quantification during training recover generalization lost in high-confidence distillation?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
