How much does pretraining quality affect the modularity of fine-tuned models?

This reads 'modularity' as the degree to which a model keeps its skills as separable, composable parts after fine-tuning, and asks whether a stronger pretraining base is what makes those parts hold together.

The corpus points to a fairly direct answer: pretraining quality is largely what makes modularity possible in the first place, and fine-tuning mostly edits the seams rather than building the parts. The cleanest evidence comes from pruning studies showing that networks implement compositional subroutines in isolated subnetworks and, crucially, that pretraining substantially increases how consistent and reliable that modular decomposition is across architectures and domains (Do neural networks naturally learn modular compositional structure?). Modularity isn't installed by the fine-tuning objective; it's inherited.

That inheritance has a layered architecture. One study decouples the two phases and finds that scaling pretraining enriches factual knowledge stored in the lower layers, while scaling fine-tuning adjusts behavioral helpfulness in the upper layers (Do pretraining and fine-tuning scale independently in language models?). So the 'modules', meaning the stored knowledge and latent capabilities, live in territory pretraining owns, and fine-tuning operates a layer up. This is why a strong base tolerates astonishingly light fine-tuning: LIMA shows that 1000 curated examples on a strong pretrained model are competitive with models trained on orders of magnitude more data, because post-training activates capabilities that already exist rather than building them (Can careful curation replace massive alignment datasets?). The same theme runs through reasoning: RL post-training teaches a model *when* to deploy reasoning, not *how*, because the strategies pre-exist in the base as latent activation patterns (Does RL post-training create reasoning or just deploy it?).

The sharper, less obvious lesson is what happens when fine-tuning reaches *down* into the pretrained layers: that is where modularity gets damaged. Direct weight fine-tuning corrupts knowledge storage in the lower layers, while decoding-time proxy-tuning preserves pretrained knowledge far better, precisely because it leaves base weights untouched and only shifts reasoning and style (Can decoding-time tuning preserve knowledge better than weight fine-tuning?). RL training can also collapse the format diversity a model inherited, amplifying a single format distribution from pretraining and suppressing the alternatives, which is a literal reduction in the base's compositional repertoire (Does RL training collapse format diversity in pretrained models?). And fine-tuning can hollow out the *connection* between modules: after fine-tuning, reasoning chains less reliably influence final answers, becoming performative rather than functional (Does fine-tuning disconnect reasoning steps from final answers?).
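The decoding-time idea behind proxy-tuning can be illustrated with a small sketch. It assumes the commonly described formulation in which the large base model's next-token logits are shifted by the difference between a small tuned "expert" and its small untuned "anti-expert"; the function names and the toy logits are hypothetical, and nothing here is taken from the cited note beyond the point that base weights stay untouched.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def proxy_tuned_distribution(base_logits, expert_logits, antiexpert_logits):
    """Shift base logits by (expert - anti-expert), then normalize.

    Assumption: all three models share one vocabulary, so the logit
    lists line up index by index. No base-model weight is modified.
    """
    if not (len(base_logits) == len(expert_logits) == len(antiexpert_logits)):
        raise ValueError("all logit vectors must cover the same vocabulary")
    shifted = [
        b + (e - a)
        for b, e, a in zip(base_logits, expert_logits, antiexpert_logits)
    ]
    return softmax(shifted)

# Toy three-token vocabulary (made-up numbers, for illustration only).
base = [2.0, 1.0, 0.0]     # large pretrained model
expert = [0.0, 1.5, 0.0]   # small tuned model
anti = [0.0, 0.5, 0.0]     # small untuned model

probs = proxy_tuned_distribution(base, expert, anti)
unchanged = proxy_tuned_distribution(base, anti, anti)
```

When the expert and anti-expert agree, the shift is zero and the base distribution comes back unchanged, which is the sense in which the pretrained knowledge is left alone and only the behavioral difference is applied.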

The most modularity-friendly methods all work by *not* overwriting the base. Transformer² tunes only the singular values of weight matrices to produce composable expert vectors that mix at inference without interfering with each other (Can models dynamically activate expert skills at inference time?). The implication for your question: pretraining quality sets the ceiling on modularity, and fine-tuning's job is to preserve and route those modules, not rebuild them. The fine-tuning approaches that fail at modularity are the ones that try to teach genuinely new procedures by force. They tend to sharpen template-matching instead, with performance dropping sharply on out-of-distribution variants because no real modular procedure was installed (Do fine-tuned language models actually learn optimization procedures?).
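A rough sketch of singular-value tuning, under stated assumptions: each expert is represented as a vector `z` that rescales the singular values of a frozen weight matrix, and experts are combined by a weighted sum of their vectors. The helper names, the example vectors `z_math` and `z_code`, and the mixing coefficients are hypothetical; in practice the vectors would be learned, and this is not presented as the Transformer² implementation.

```python
import numpy as np

def decompose(weight):
    """Return the thin SVD factors (U, s, Vt) of a weight matrix."""
    return np.linalg.svd(weight, full_matrices=False)

def apply_expert(u, s, vt, z):
    """Rebuild the weight as U @ diag(s * z) @ Vt; U and Vt stay frozen."""
    return (u * (s * z)) @ vt

def mix_experts(expert_vectors, coefficients):
    """Linearly combine expert vectors into a single scaling vector."""
    return sum(c * z for c, z in zip(coefficients, expert_vectors))

rng = np.random.default_rng(0)
weight = rng.normal(size=(4, 3))   # stand-in for a pretrained weight matrix
u, s, vt = decompose(weight)

# Hypothetical expert vectors, one scale factor per singular value.
z_math = np.array([1.2, 1.0, 0.8])
z_code = np.array([0.9, 1.1, 1.0])

z_mixed = mix_experts([z_math, z_code], [0.5, 0.5])
adapted = apply_expert(u, s, vt, z_mixed)
identity = apply_expert(u, s, vt, np.ones_like(s))  # recovers the base weight
```

Because every expert only rescales the same fixed singular directions, a vector of ones returns the original matrix exactly, and different experts can be blended without touching the pretrained basis.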

Sources (9 notes)

Do neural networks naturally learn modular compositional structure?

Pruning experiments reveal that neural networks implement compositional subroutines in isolated subnetworks, with ablations affecting only their corresponding function. Pretraining substantially increases the consistency and reliability of this modular structure across architectures and domains.

Do pretraining and fine-tuning scale independently in language models?

Emulated Fine-Tuning reveals that scaling pretraining improves factual knowledge while scaling fine-tuning improves behavioral helpfulness. This decoupling has architectural roots: pretraining enriches lower-layer knowledge storage, while fine-tuning modifies upper-layer behavior expression.

Can careful curation replace massive alignment datasets?

LIMA demonstrates that 1000 carefully curated examples fine-tuned on a strong pretrained model achieve competitive alignment performance with models trained on orders of magnitude more data, showing that post-training activates existing capabilities rather than building new ones.

Does RL post-training create reasoning or just deploy it?

Evidence shows base models already contain reasoning capability in latent form; RL training optimizes deployment timing rather than capability creation. Hybrid models recover 91% of performance gains by routing tokens only, and activation vectors for reasoning strategies pre-exist before any RL.

Can decoding-time tuning preserve knowledge better than weight fine-tuning?

Proxy-tuning closes 88-91% of the alignment gap while surpassing direct fine-tuning on knowledge tasks by leaving base model weights untouched. Direct fine-tuning corrupts knowledge storage in lower layers, whereas proxy-tuning applies distributional shifts that primarily affect reasoning and style.

Does RL training collapse format diversity in pretrained models?

Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.

Does fine-tuning disconnect reasoning steps from final answers?

Three faithfulness tests show fine-tuned models generate reasoning chains that less reliably influence final outputs. Early termination, paraphrasing, and filler substitution all produce invariant answers more often after fine-tuning, suggesting reasoning becomes performative rather than functional.

Can models dynamically activate expert skills at inference time?

Transformer² demonstrates that tuning only singular values within weight matrices produces composable expert vectors that dynamically mix at inference without interference, outperforming LoRA with fewer parameters and enabling continual specialization.

Do fine-tuned language models actually learn optimization procedures?

Even GRPO-trained models show sharp performance drops on out-of-distribution variants (N-1 test sets) compared to in-distribution problems, indicating RL optimizes template-matching rather than genuine problem-solving procedures.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a modularity researcher re-testing claims about pretraining's role in fine-tuned model compositionality. The question remains open: does pretraining quality truly SET the ceiling on modularity, or have newer methods, architectures, or training regimes since relaxed that constraint?

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2025 and converge on a layered story:
• Pretraining installs modular subnetworks reliably; fine-tuning mostly routes and activates them rather than building new procedures (2023–2024).
• Fine-tuning that overwrites base weights corrupts knowledge storage and collapses format diversity; decoding-time and singular-value tuning preserve modularity better (2024–2025).
• Direct weight fine-tuning degrades chain-of-thought faithfulness independently of accuracy; reasoning becomes performative rather than functional (2024–2025).
• RL post-training converges on dominant pretraining distributions, suppressing compositional repertoire (2025).
• Transformer² achieves composable experts via singular-value tuning without interfering with base structure (2025).

Anchor papers (verify; mind their dates):
• arXiv:2301.10884 (2023-01): Evidence for structural compositionality in neural networks
• arXiv:2411.15382 (2024-11): Fine-tuning's impact on chain-of-thought reasoning
• arXiv:2501.06252 (2025-01): Transformer² self-adaptive approach
• arXiv:2504.07912 (2025-04): RL post-training and format convergence

Your task:
(1) RE-TEST EACH CONSTRAINT: For each finding above, probe whether post-2025 models, multi-task curriculum learning, test-time adaptation, or novel orchestration (memory, routing, ensemble composition) have since RELAXED or OVERTURNED it. Separate the durable claim (pretraining sets capacity for modularity) from the perishable limitation (overwriting weights destroys it). If newer methods preserve modularity even under aggressive fine-tuning, name them and cite the work.
(2) SURFACE CONTRADICTING WORK: Search the last ~6 months for papers showing modular behavior EMERGING during fine-tuning without strong pretraining, or showing pretraining quality as WEAK predictor of downstream modularity. Flag disagreements with the layered architecture picture.
(3) PROPOSE 2 RESEARCH QUESTIONS that assume the regime may have moved: (a) Can adaptive or sparse fine-tuning methods now break the pretraining ceiling on modularity? (b) Does multimodal or long-context pretraining change the layer-wise decomposition of knowledge vs. behavior?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
