Do instruction-tuned models prefer conversational over formal source language?

This explores whether instruction tuning shapes the *register* a model writes in — nudging it toward conversational phrasing rather than the formal language of the documents it was trained on or asked to draw from.

This reads the question as being about style and register, not content: does instruction tuning bias a model toward casual, conversational output over the more formal language of its source material? The corpus doesn't contain a head-to-head study measuring register preference directly — so the honest answer is that there's no single note that settles it. But several notes converge on *why* you'd expect exactly this bias, and they're more interesting together than the question alone suggests.

The load-bearing finding is that instruction tuning mostly teaches the *shape* of output, not understanding of the task. Models trained on semantically empty or even deliberately wrong instructions perform almost identically to those trained on correct ones; what actually transfers is knowledge of the output space, the format and texture of an acceptable answer (see "Does instruction tuning teach task understanding or output format?"). If register is part of that output distribution (and it is), then instruction tuning is precisely the stage where a model learns "answers sound like *this*", and the default 'this' is the helpful-assistant conversational voice, regardless of how formal the source was.

Reinforcement tuning then sharpens that voice. RLHF rewards immediate helpfulness, which pushes models toward the friendly, accommodating, turn-taking register that scores well with human raters (see "Why do language models respond passively instead of asking clarifying questions?" and "Why do AI assistants get worse at longer conversations?"). So there's a two-stage story: instruction tuning installs an output-format distribution, and preference tuning tilts it toward conversational helpfulness. The 'preference' in your question isn't a quirk; it's the trained objective.

The twist comes from the conversation-maintenance note: humans signal register through implicit social work such as reference repair, topic hand-offs, and relational moves that aren't about conveying information at all. Models don't learn these because training rewards information prediction, not relational work (see "Why don't language models develop conversation maintenance skills?"). So a model's 'conversational' register is imitated surface, not the social machinery that produces conversational tone in people, which is why it can feel conversational and oddly flat at the same time.

One more lateral pull worth knowing: when a model's prior training associations are strong, they override what's actually in the context window (see "Why do language models ignore information in their context?"). Applied to your question, that predicts a model will tend to *re-voice* formal source text into its own learned register rather than preserve the source's formality. Prompting alone often can't override it, because prompts only reorganize the existing distribution; they don't replace it (see "Can prompt optimization teach models knowledge they lack?"). If you want a deeper thread, that prior-vs-context tension is the place to dig.
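If you wanted to check the re-voicing prediction yourself, one rough way is to score how casual a source passage and the model's output each are, then compare. The sketch below is only an illustration under loud assumptions: `casual_score` and `register_shift` are hypothetical helpers, and the marker choice (contractions and second-person pronouns) is a crude stand-in for register, not a validated formality metric and not something taken from the notes.

```python
import re

# Hypothetical marker patterns, chosen for illustration only.
# Possessive-looking 's is deliberately excluded so "model's" is not counted;
# as a side effect "it's" is missed too. Curly apostrophes are not handled.
CONTRACTION = re.compile(r"\b\w+'(?:t|re|ve|ll|d|m)\b", re.IGNORECASE)
SECOND_PERSON = re.compile(r"\b(?:you|your|yours)\b", re.IGNORECASE)
WORD = re.compile(r"[A-Za-z']+")


def casual_score(text: str) -> float:
    """Casual markers per word (a word like "you'll" counts as two markers)."""
    words = WORD.findall(text)
    if not words:
        return 0.0
    hits = len(CONTRACTION.findall(text)) + len(SECOND_PERSON.findall(text))
    return hits / len(words)


def register_shift(source: str, output: str) -> float:
    """Positive when the output reads as more casual than the source."""
    return casual_score(output) - casual_score(source)


if __name__ == "__main__":
    # Made-up example texts, not data from any study.
    source = "The committee shall review the proposal and issue a determination."
    output = "So you'll want to check the proposal, and don't worry, they'll get back to you."
    print(f"register shift: {register_shift(source, output):+.3f}")
```

A positive shift across many formal source passages would be consistent with the prediction above, though a heuristic this simple can only suggest a pattern, not establish one.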

Sources (6 notes)

Does instruction tuning teach task understanding or output format?

Models trained on semantically empty or deliberately incorrect instructions perform comparably to those trained on full, correct instructions, reaching 43% against 42.6% for a random baseline. The semantic content of instructions appears largely irrelevant; what transfers is knowledge of the output space.

Why do language models respond passively instead of asking clarifying questions?

CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.

Why do AI assistants get worse at longer conversations?

LLMs perform at 90% accuracy with single-message instructions but drop to 65% across natural conversation. Models lock into early guesses when information arrives gradually and cannot course-correct, a behavior induced by RLHF training that rewards helpfulness over clarification.

Why don't language models develop conversation maintenance skills?

Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off, which sustain relational interaction rather than convey information. Language models don't develop these because training signals reward information prediction, not relational work.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Can prompt optimization teach models knowledge they lack?

Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst. The question remains open: **Do instruction-tuned models systematically prefer conversational over formal register, and if so, can that preference be overridden?**

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026 and include:
- Instruction tuning teaches *output-format distribution*, not task understanding; models trained on semantically empty or wrong instructions perform nearly identically to correctly-trained ones, meaning register (conversational tone) gets installed as default texture (~2023).
- RLHF/preference tuning sharpens the conversational register by rewarding immediate helpfulness and human-rater-friendly turn-taking, overriding source formality (~2023–2024).
- Models fail to learn implicit relational work that *produces* conversational tone in humans—they imitate surface conversationality while remaining 'oddly flat' (~2023).
- When prior training associations are strong, they override context-window information; prompting alone cannot inject new register because it only activates existing distributions (~2024).
- Multi-turn conversation introduces compounding coherence and register drift; models lose track of formality constraints across turns (~2025–2026).

Anchor papers (verify; mind their dates):
- arXiv:2305.11383 (2023-05): Do Models Really Learn to Follow Instructions?
- arXiv:2307.16689 (2023-07): No that's not what I meant (repair & conversational mechanics)
- arXiv:2502.10708 (2025-02): Injecting Domain-Specific Knowledge into LLMs
- arXiv:2505.06120 (2025-05): LLMs Get Lost In Multi-Turn Conversation

Your task:
(1) **RE-TEST EACH CONSTRAINT.** For each finding above, determine whether newer training methods (e.g., continued pretraining, domain-aware fine-tuning, retrieval-augmented generation), improved prompting architectures (system-role anchoring, style-explicit instructions, in-context exemplars), or multi-agent orchestration (role-playing systems with explicit register guardrails) have *relaxed or overturned* it. Separate the durable claim (models learn output distributions, likely still true) from the perishable one (prompting cannot override register, possibly false with modern steering). Be explicit where a constraint still holds.
(2) **Surface the strongest contradicting or superseding work from the last ~6 months.** Look for papers claiming instruction-tuned models *can* preserve formal register, or that multi-turn conversational drift is now controllable, or that domain-specific knowledge injection restores source formality.
(3) **Propose 2 research questions that assume the regime has moved:** (a) Under what architectural or training conditions does formal register *survive* instruction tuning and RLHF? (b) Can explicit meta-instructions about formality, when anchored across turns, prevent conversational register creep?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
