Can language models learn to form ad-hoc conventions through training?

This explores whether a language model can develop ad-hoc conventions: the improvised shared agreements (a coined term, a private shorthand, a register two parties settle into) that communicators normally negotiate between themselves in real interaction. The corpus splits the question cleanly along one fault line. Models pick up conventions of *form* readily through training, but the negotiated, meaning-bearing kind runs into structural walls.

The pessimistic thread is the stronger one. Convention-forming is fundamentally a pragmatic act: you and your interlocutor coordinate on a meaning neither of you held alone. Bender & Koller argue that meaning lives in the relation between expressions and communicative intent (Can language models learn meaning from text patterns alone?). A model trained only on form-to-form prediction therefore has no access to the shared attention that grounds an agreement. Worse, even when a convention is established *in the conversation*, the model tends to revert. Parametric knowledge from training dominates what is present in the context window (Why do language models ignore information in their context?), so a freshly negotiated usage gets steamrolled by the statistically dominant one. The very capacity that would let a model adapt its register to a partner, pragmatic register-switching, is also largely trained *out*. System prompts and RLHF lock the model into one static communicative identity that users cannot reshape through dialogue (Can language models adapt communication style to different contexts?).

There's a subtler problem underneath: convention-forming assumes a stable party to do the agreeing. Shanahan's 20-questions test (Do large language models actually commit to a single character?) shows that the model does not commit to one character. Instead, it samples from a superposition of characters consistent with the conversation so far, so "the model" you struck a convention with may not be the same one that answers next. A convention needs a counterparty who *holds* the agreement; a sampler doesn't.
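To make the superposition point concrete, here is a toy sketch. It is not Shanahan's actual setup: the candidate objects, their attributes, and the `consistent`/`answer`/`play` helpers are all hypothetical. The "model" never picks an object up front. At each turn it samples one object that is still consistent with the answers given so far and reports that object's attribute. Rerunning with a different seed yields a different but equally consistent game, which mirrors the regeneration behavior the test describes.

```python
import random

# Hypothetical candidate pool: each "character" the sampler might be playing
# is an object with yes/no attributes. Nothing here comes from Shanahan's setup.
CANDIDATES = {
    "apple": {"alive": False, "edible": True,  "bigger_than_breadbox": False},
    "cat":   {"alive": True,  "edible": False, "bigger_than_breadbox": False},
    "horse": {"alive": True,  "edible": False, "bigger_than_breadbox": True},
    "car":   {"alive": False, "edible": False, "bigger_than_breadbox": True},
}


def consistent(history):
    """Names of objects whose attributes agree with every (question, answer) so far."""
    return [name for name, attrs in CANDIDATES.items()
            if all(attrs[q] == a for q, a in history)]


def answer(history, question, rng):
    """Answer by sampling a consistent object *now*, rather than having committed
    to one at the start. Only the answer is kept; the sampled object is discarded."""
    pool = consistent(history)  # never empty: every past answer came from some object in it
    chosen = rng.choice(pool)
    return CANDIDATES[chosen][question]


def play(questions, seed):
    """Run one game; the transcript is the only state carried between turns."""
    rng = random.Random(seed)
    history = []
    for q in questions:
        history.append((q, answer(history, q, rng)))
    return history


if __name__ == "__main__":
    qs = ["alive", "bigger_than_breadbox", "edible"]
    for seed in range(3):
        game = play(qs, seed)
        print(game, "->", consistent(game))
```

No object is stored between turns, so a convention has nothing to attach to. The only persistent state is the transcript itself.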

Flip to conventions of pure form, though, and the picture brightens. DPO training on paired correct and incorrect examples drills in rigid output conventions, such as exact function-calling formats, in cases where ordinary supervised fine-tuning underperforms (Can small models match large models on function calling?). That is essentially learning an arbitrary convention by being shown what violates it. Transformer² also composes task-specific expert behaviors on the fly at inference (Can models dynamically activate expert skills at inference time?). This hints that adaptive, situation-specific reconfiguration is mechanically possible, even if it is not the same as negotiating meaning with a partner.
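The "shown what violates it" point is visible in the objective itself. Below is a minimal sketch of the standard DPO loss for a single preference pair. It assumes the summed sequence log-probabilities under the trained policy and a frozen reference model have already been computed elsewhere. The function names and the `beta=0.1` default are illustrative choices, not details from the cited note. The contrast with `sft_loss` is the point: plain fine-tuning only raises the likelihood of the correct call, while the DPO loss also responds to the likelihood of the incorrect one.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one pair, from summed sequence log-probs.

    "chosen" is the correct function call and "rejected" the incorrect one.
    The loss falls as the policy raises the chosen call relative to the
    reference AND as it lowers the rejected call relative to the reference.
    """
    chosen_margin = policy_chosen - ref_chosen
    rejected_margin = policy_rejected - ref_rejected
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


def sft_loss(policy_chosen):
    """Plain fine-tuning: negative log-likelihood of the correct output only."""
    return -policy_chosen


if __name__ == "__main__":
    print(dpo_loss(-5.0, -5.0, -5.0, -5.0))  # policy == reference
    print(dpo_loss(-5.0, -9.0, -5.0, -5.0))  # rejected call pushed down
    print(sft_loss(-5.0))                    # unaffected by the rejected call
```

When the policy and reference are identical, the DPO loss is log 2 (about 0.693). Pushing the incorrect call's policy log-probability from -5 to -9 lowers it to about 0.513. `sft_loss` stays the same because it never sees the rejected example.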

The thing worth carrying away: "learn a convention" quietly bundles two very different feats. Internalizing an arbitrary regularity from training data is something models do well; that is what training *is*. Coordinating a *new* shared meaning live, with an intent-bearing partner, is the part the corpus keeps flagging as out of reach. The obstacle is not a lack of flexibility. It is that the model reproduces familiar training-distribution patterns rather than inventing new coordinations. This is the same imitation signature that makes chain-of-thought degrade under distribution shift (Does chain-of-thought reasoning reveal genuine inference or pattern matching?). An ad-hoc convention is, by construction, exactly the kind of distribution shift it handles worst.

Sources (7 notes)

Can language models learn meaning from text patterns alone?

Bender & Koller argue that meaning requires the relation between expressions and communicative intents. Since LLMs are trained only on form-to-form prediction with no access to shared attention or intent, they cannot reconstruct the meaning that grounds language.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Can language models adapt communication style to different contexts?

System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.

Do large language models actually commit to a single character?

Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, indicating that no fixed commitment exists.

Can small models match large models on function calling?

Small models fine-tuned via DPO on correct and incorrect function-calling examples from a large teacher model achieve high accuracy on logical and mathematical tasks. DPO's explicit negative examples directly target the rigid output format failures where SFT alone underperforms.

Can models dynamically activate expert skills at inference time?

Transformer² demonstrates that tuning only singular values within weight matrices produces composable expert vectors that dynamically mix at inference without interference, outperforming LoRA with fewer parameters and enabling continual specialization.

Does chain-of-thought reasoning reveal genuine inference or pattern matching?

CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shifts—the signature of imitation rather than capability emergence.
