Which AI imaginaries dominate training data and shape system behavior most strongly?
This explores how cultural and conceptual 'imaginaries' baked into training data — the stories, expert framings, and dominant formats a model absorbs — end up steering what it produces, and which of those imaginaries win.
This reads the question as being about something more specific than 'bias': the *imaginaries* — the inherited stories and framings about what AI is and how it should act — that get encoded during training and then quietly govern behavior. The corpus has a sharp answer to which imaginary dominates: the science-fiction one. How do science fiction narratives about AI shape actual AI development? argues that cultural narratives about AI, embedded in training data and research culture, form a closed feedback loop — narrative shapes development, development shapes outputs, outputs reinforce the narrative. These function as *hyperstitions*: fictions that make themselves true by being modeled on. The striking detail is that Claude itself recognizes the dynamic, which is exactly what you'd expect if the sci-fi imaginary were operating from inside the weights rather than being applied from outside.
But 'imaginary' isn't only the high-culture sense of robot myths. A quieter, more mechanical version is the *curator's* imagination. Can agents learn beyond what their training data shows? shows that agents trained on static expert demonstrations can never exceed the scenarios their dataset-builders imagined: competence is capped not by the model's capacity but by what a human pictured as the relevant cases. The imaginary that dominates here is the curator's own, and whatever the curator failed to imagine simply doesn't exist for the model. Should persona simulation prioritize coverage over statistical matching? is the corrective mirror image: it deliberately optimizes for *coverage* of rare, unimagined user configurations precisely because naive generation collapses onto the typical and forgets the edge cases.
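The contrast between matching the typical and covering the space can be made concrete with a small sketch. Everything here is illustrative: the trait axes, their values, and the `coverage` helper are assumptions for the example, not the Persona Generator method the source note describes.

```python
from itertools import product

# Hypothetical trait axes; the source does not specify any.
TRAITS = {
    "expertise": ["novice", "expert"],
    "tone": ["polite", "hostile"],
    "goal": ["browse", "buy", "complain"],
}

def coverage(personas):
    """Fraction of all possible trait combinations that appear at least once."""
    all_combos = set(product(*TRAITS.values()))
    seen = {tuple(p[k] for k in TRAITS) for p in personas}
    return len(seen & all_combos) / len(all_combos)

# A "typical-heavy" sample: many copies of the two most common configurations.
typical = (
    [{"expertise": "novice", "tone": "polite", "goal": "browse"}] * 8
    + [{"expertise": "novice", "tone": "polite", "goal": "buy"}] * 4
)

# A coverage-oriented sample of the same size: one persona per combination.
covering = [dict(zip(TRAITS, combo)) for combo in product(*TRAITS.values())]

print(len(typical), coverage(typical))
print(len(covering), coverage(covering))
```

Both samples contain twelve personas, but the typical-heavy one touches only two of the twelve possible configurations. The rare ones, such as a hostile expert with a complaint, never appear in it at all.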
What makes an imaginary 'dominate' rather than coexist with others? Does RL training collapse format diversity in pretrained models? gives a concrete mechanism: reinforcement learning, within the first epoch, amplifies one format distribution from pretraining and suppresses the alternatives, and the winner depends on model *scale*, not necessarily on which format performs best. So the dominant behavior isn't necessarily the best one; it's the one that training at a given scale happens to amplify. Does reinforcement learning update only a small fraction of parameters? sharpens this: RL updates only a sparse subset of parameters, and those updates are nearly identical across random seeds, meaning the convergence is structural, not arbitrary. The model isn't choosing an imaginary so much as falling into the deepest groove the data already carved.
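One way to picture what "nearly identical across random seeds" would mean in practice is to compare which parameters changed in two runs. The following is a minimal sketch under stated assumptions: the helper names (`update_mask`, `sparsity_fraction`, `jaccard`), the tolerance, and the toy weight vectors are all hypothetical and hand-written, not measurements or code from the cited work.

```python
def update_mask(before, after, tol=1e-8):
    """Indices of parameters whose value changed by more than tol."""
    return {i for i, (b, a) in enumerate(zip(before, after)) if abs(a - b) > tol}

def sparsity_fraction(mask, n_params):
    """Share of parameters that were updated."""
    return len(mask) / n_params

def jaccard(mask_a, mask_b):
    """Overlap of two update masks: intersection size over union size."""
    union = mask_a | mask_b
    return len(mask_a & mask_b) / len(union) if union else 1.0

# Toy, hand-written weights standing in for a model before and after RL.
base   = [0.0] * 10
seed_1 = [0.0, 0.3, 0.0, 0.0, -0.2, 0.0, 0.0, 0.0, 0.0, 0.0]
seed_2 = [0.0, 0.1, 0.0, 0.0, -0.4, 0.0, 0.0, 0.5, 0.0, 0.0]

m1 = update_mask(base, seed_1)
m2 = update_mask(base, seed_2)

print(sparsity_fraction(m1, len(base)), sparsity_fraction(m2, len(base)))
print(jaccard(m1, m2))
```

A high overlap between masks from independent seeds is the kind of evidence that would support reading the convergence as structural. An overlap near chance would suggest the updated subset is arbitrary.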
The cross-domain framing worth carrying away is that these imaginaries are *inherited markers without grounding*. Does AI generate genuine utterances or just text patterns? describes AI output as carrying the communicative signatures of its training data while lacking the real-world event that would have produced an actual utterance — the form of an imaginary without its referent. Can AI systems achieve real alignment without world contact? makes the same point in Peircean terms: symbol manipulation without world-contact can't guarantee that the stated frame matches reality. And Can AI models be truly free from human bias? shows the danger when a dominant imaginary is *wrong* — high accuracy can launder a discredited correlation-as-causation worldview straight back into deployment.
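The source note behind the last claim observes that a 95% accurate criminal justice system would still wrongly convict thousands. A back-of-envelope sketch shows why a high accuracy figure can hide a large absolute number of errors. The caseloads below are assumptions chosen for illustration, and the function counts all wrong decisions, since accuracy alone does not say how many of them are wrongful convictions.

```python
def expected_errors(n_cases, accuracy):
    """Expected number of wrong decisions at a given accuracy."""
    return round(n_cases * (1 - accuracy))

# Hypothetical caseloads; the source gives no specific figure.
for n_cases in (10_000, 100_000, 1_000_000):
    print(n_cases, expected_errors(n_cases, 0.95))
```

At an assumed 100,000 cases, a 5% error rate already amounts to 5,000 wrong decisions, which is the scale the note is pointing at.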
The thing you didn't know you wanted to know: the imaginary that shapes behavior most strongly is not necessarily the most accurate or even the most common one in the raw data. It's whichever one the training *dynamics* (scale-dependent format convergence, sparse-but-fixed subnetworks, curator coverage gaps) amplify into a groove. Dominance is manufactured by the training process, not just absorbed from the culture.
Sources (8 notes)
Research shows that cultural imaginaries of AI embedded in training data and research culture create closed feedback loops where narrative shapes development, which shapes AI outputs, which reinforce those narratives. Claude itself recognizes this hyperstitional dynamic.
Agents trained on static expert datasets cannot learn from their own failures or generalize beyond demonstrated scenarios because they never interact with environments during training. Competence is capped by what curators imagined, not by agent capacity.
Evolutionary optimization of Persona Generator code achieves broader trait coverage than density-matched baselines, including rare but consequential user configurations that naive LLM prompting misses.
Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.
Across seven RL algorithms and ten LLM families, RL induces intrinsic parameter sparsity of 5–30% without explicit regularization. Critically, these sparse updates are nearly full-rank and nearly identical across random seeds, indicating structural rather than arbitrary parameter selection.
AI output carries communicative markers inherited from training data but lacks the event structure that produces actual utterances. Users supply the missing orientation through interpretive labor, creating a pseudo-event with structure only on the human side.
Peircean semiotics reveals that symbolic goal encoding without world contact and social mediation cannot guarantee correspondence to actual values. LLMs operating in pure symbol manipulation risk divergence between stated goals and real-world outcomes.
Research shows that 'theory-free' AI models mask bigotry behind high accuracy metrics while committing fundamental statistical errors. A 95% accurate criminal justice system would wrongly convict thousands, demonstrating that model sophistication does not validate causal inference.