TOPIC

Training and Fine-Tuning

34 synthesis notes · 90 source papers
Can utility-weighted training loss actually harm model performance?

When engineers weight loss functions to reflect real-world costs of different errors, does this improve or undermine learning? This explores whether baking asymmetric objectives into training creates unintended side effects.
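
To make the idea concrete, here is a minimal sketch of a cost-weighted loss, assuming a binary classifier. The function name `weighted_bce` and the cost values `fn_cost` and `fp_cost` are hypothetical and chosen only for illustration; the note does not specify a particular loss or weighting scheme.

```python
import math

def weighted_bce(y_true, p_pred, fn_cost=5.0, fp_cost=1.0):
    """Mean binary cross-entropy with per-class cost weights.

    fn_cost scales the penalty on positives (missed positives cost more),
    fp_cost scales the penalty on negatives. Both values are illustrative.
    """
    eps = 1e-12  # clamp probabilities away from 0 and 1 to keep log() finite
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(fn_cost * y * math.log(p) + fp_cost * (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# The same 0.1 probability assigned to the correct class is penalized
# five times harder when the true label is positive.
missed_positive = weighted_bce([1], [0.1])
false_positive = weighted_bce([0], [0.9])
```

With both costs set to 1.0 this reduces to ordinary binary cross-entropy, which is the unweighted baseline the question compares against.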

Can isolating task-specific parameters prevent multi-task fine-tuning interference?

Explores whether identifying and protecting task-specific parameter regions can prevent the performance degradation that occurs when fine-tuning models on multiple tasks simultaneously. This matters because it could enable safe multi-task adaptation without sacrificing individual task performance.

Does training for compositional sensitivity hurt dense retrieval?

Dense retrieval excels at topical recall but struggles with meaning-level distinctions. Adding structure-targeted negatives during training might improve compositional sensitivity—but at what cost to overall retrieval performance?

Can semantic knowledge shift model behavior like reinforcement learning does?

Can textual descriptions of successful reasoning patterns, prepended as context, achieve the same distribution shifts that RL achieves through parameter updates? This matters because it could eliminate the need for expensive fine-tuning on limited data.

Does fine-tuning disconnect reasoning steps from final answers?

When models are fine-tuned on specific domains, do their chain-of-thought steps become less causally connected to their outputs? Three experiments test whether reasoning chains remain functionally faithful after training.

Does fine-tuning on new facts increase hallucination risk?

When LLMs learn unfamiliar facts through fine-tuning, do they become more prone to hallucinating about things they already knew? Understanding this matters for safe knowledge updates.

Does repeated sensitive data in fine-tuning cause memorization?

When language models train on the same private or proprietary data multiple times, how much do they end up memorizing and leaking that information at inference time? Understanding this risk is critical for organizations fine-tuning on confidential datasets.

How should fine-tuning scale with model and data size?

What scaling laws govern fine-tuning performance across model size, pretraining data, and fine-tuning data? Understanding these relationships could guide resource allocation in real-world tuning scenarios.

Can models trained on many imperfect experts outperform each one?

Do generative models trained on diverse, imperfect human experts develop an implicit consensus that surpasses any individual contributor? This explores whether aggregating diverse perspectives at training time, rather than inference time, can denoise human biases.

Can we train better models on less data?

Can gradient-based influence estimation identify which instruction data actually matters most? The research explores whether selecting small subsets of training data by their similarity to target capabilities might outperform training on everything.
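
As a rough sketch of what selecting data by gradient similarity could look like, the toy below scores each training example by the cosine similarity between its loss gradient and the mean gradient of a small target set, then keeps the top `k`. The linear model, squared-error loss, and the names `grad`, `cosine`, and `select_top_k` are all assumptions made for illustration, not the method of the underlying paper.

```python
import math

def grad(w, x, y):
    """Gradient of the squared error 0.5 * (w.x - y)^2 with respect to w."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    na = math.sqrt(sum(ai * ai for ai in a))
    nb = math.sqrt(sum(bi * bi for bi in b))
    return dot / (na * nb) if na and nb else 0.0

def select_top_k(w, train, target, k):
    """Return indices of the k training examples whose gradients
    align best with the mean gradient over the target examples."""
    tgrads = [grad(w, x, y) for x, y in target]
    mean_t = [sum(col) / len(tgrads) for col in zip(*tgrads)]
    scores = [(cosine(grad(w, x, y), mean_t), i) for i, (x, y) in enumerate(train)]
    return [i for _, i in sorted(scores, reverse=True)[:k]]

# Toy data: example 0 pushes the weights the same way the target does,
# example 1 is orthogonal, example 2 pushes the opposite way.
w = [0.0, 0.0]
target = [([1.0, 0.0], 1.0)]
train = [([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], -1.0)]
chosen = select_top_k(w, train, target, k=2)
```

In a real setting the gradients would come from a large model and would typically be compressed before comparison; this sketch only shows the ranking step.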

Can verification separate structural near-misses from topical matches?

Should retrieval pipelines use a separate verification stage to detect structural errors that dense retrievers miss? This explores whether splitting retrieval and verification solves the compositional sensitivity problem.

Why does teacher-student information asymmetry enable learning signals?

What role does privileged answer access play in making social meta-learning training work? Without asymmetric information, can a conversation between teacher and student function as pedagogy or only as parallel speculation?

Does instruction tuning teach task understanding or output format?

Explores whether models trained on instructions actually learn task semantics or merely learn to match output distributions. This matters because it challenges assumptions about how fine-tuning improves model behavior.

Does staying close to the base model preserve learning ability?

Explores whether limiting how far training pushes a model from its base distribution (measured by KL divergence) helps it learn new tasks more effectively over time, and why that trade-off matters for continual learning.
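
For readers unfamiliar with the measure, here is a minimal sketch of KL divergence between two discrete distributions, such as next-token probabilities from a tuned model and its base model. The direction KL(tuned || base), the function name `kl_divergence`, and the example probabilities are assumptions for illustration; the note does not state which direction or estimator is used.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions over the same support.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token distributions over a three-token vocabulary.
base = [0.5, 0.3, 0.2]
tuned_near = [0.45, 0.35, 0.2]   # small shift away from the base model
tuned_far = [0.05, 0.05, 0.9]    # large shift away from the base model

drift_near = kl_divergence(tuned_near, base)
drift_far = kl_divergence(tuned_far, base)
```

Under this reading, "staying close to the base model" corresponds to keeping a quantity like `drift_near` small during training.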

Can imitating ChatGPT fool evaluators into thinking models improved?

Explores whether fine-tuning weaker models on ChatGPT outputs creates an illusion of capability gains. Investigates why human raters and automated judges fail to detect that imitation improves style but not underlying factuality or reasoning.

Can models learn multi-token concepts during fine-tuning?

Does training models to predict multiple tokens at once, rather than one token at a time, help them form coherent semantic units? This matters because current next-token prediction fragments concepts like "ribonucleic acid" into arbitrary subword pieces.

Can post-training objectives preserve reasoning style alongside correctness?

Even mathematically sound training objectives may suppress reasoning behaviors like uncertainty expression without ever explicitly penalizing them. Does optimizing for answer correctness inadvertently degrade the stylistic features that enable generalization?

Does teaching question patterns before document training improve knowledge access?

Standard LLM training encodes documents first, then teaches QA patterns. But does this order matter? This explores whether reversing the sequence, teaching how knowledge gets queried before encoding it, could unlock better factual recall.

How much poisoned training data survives safety alignment?

Explores whether adversarial contamination at 0.1% of pretraining data can persist through post-training safety measures, and which attack types prove most resilient to alignment.

Why is predicting latents more sample-efficient than tokens?

Explores whether learning from a network's own abstract representations requires far fewer training samples than learning from raw tokens, and what mechanism drives this efficiency gap.

Does procedural knowledge drive reasoning more than factual retrieval?

Explores whether models learn reasoning through general procedures across diverse documents rather than memorizing specific facts. This matters for understanding what pretraining data actually teaches models to reason.

Can decoding-time tuning preserve knowledge better than weight fine-tuning?

Explores whether applying alignment signals at inference time rather than modifying model weights can better preserve the factual knowledge learned during pretraining while still achieving alignment goals.

Can abstractions guide exploration better than depth alone?

Does training a model to propose reasoning abstractions as intermediate subgoals help it explore diverse solution strategies more effectively than simply extending chain-of-thought depth?

Can editing hidden representations beat weight updates for fine-tuning?

Does intervening directly on a frozen model's representations offer a better path to parameter-efficient adaptation than current weight-based methods? This challenges the dominant PEFT paradigm by treating representations as the semantic lever instead.

Does richer teacher context hurt student generalization?

When teachers are given more information during distillation, they produce confident but brittle students. Does this trade-off between in-domain wins and out-of-distribution robustness hold across different task distributions?

Do pretraining and fine-tuning scale independently in language models?

Can we decouple how model scale affects different training stages to independently improve factuality versus helpfulness? This matters for understanding whether these capabilities compete or can be optimized separately.

Does self-distillation harm mathematical reasoning performance?

Self-distillation usually improves models while shortening outputs, but mathematical reasoning shows a puzzling exception: performance drops by up to 40%. What mechanism explains this counter-intuitive degradation?

Can models learn to ask clarifying questions without explicit training?

Do language models trained only on fully-specified problems spontaneously develop the ability to ask for missing information when facing underspecified tasks? This tests whether conversational problem-solving strategies emerge from meta-learning rather than direct instruction.

Can LLMs learn to ask for feedback during problem solving?

Explores whether language models can be trained to actively solicit corrective feedback mid-conversation rather than committing to single-turn answers. This matters because it could bridge the gap between fluent chat and genuine conversational learning.

Can splitting adaptation into two channels reduce forgetting?

When language models adapt to new tasks, does separating task-specific learning (via prompt context) from persistent parameter updates help preserve both generalization ability and the model's original capabilities?

Does sequencing imitation then exploration training improve reasoning?

Can Supervised RL (expert imitation) followed by RLVR (outcome rewards) outperform either method alone on hard reasoning tasks? This explores whether curriculum ordering unlocks capabilities neither method achieves independently.

Can step-wise expert rewards help small models learn hard reasoning?

When small models fail on hard multi-step problems, can training them to match expert reasoning steps rather than final answers provide useful learning signals? This explores whether intermediate-step alignment might overcome the limitations of both supervised fine-tuning and outcome-based reinforcement learning.

Does training on AI-generated content permanently degrade model quality?

When generative models train on outputs from previous models, do the resulting models lose rare patterns permanently? The question matters because future training data will inevitably contain synthetic content.

Why can't cosine space retrievers distinguish word order?

Dense retrievers using unit-sphere cosine spaces struggle to capture non-commutative linguistic structures like negation and role reversal. This geometric constraint explains why training fixes have limited reach in compositional retrieval.

Source papers 90

The arXiv papers behind this sub-topic. Links may take you off-site to arxiv.org.