Does the Assistant Axis's gravitational pull prevent true individual-level persona personalization?
This explores whether the 'Assistant Axis' — the single dominant direction that post-training carves into a model's persona space — acts as a default attractor strong enough to block genuine person-by-person personalization, and whether the corpus thinks that pull is escapable.
This reads the question as a tension between a default and a deviation: the Assistant Axis is the gravitational center, and individual-level personalization is the attempt to pull a model far enough off-center to match one specific person. The corpus suggests the pull is real and load-bearing, but whether it blocks personalization depends on how deep you reach to fight it.
Start with the axis itself. Mapping hundreds of character archetypes reveals a low-dimensional persona space whose leading component measures distance from the default Assistant (How stable is the trained Assistant personality in language models?). Post-training doesn't paint on a costume; it installs a sticky disposition that persists under adversarial pressure. Two notes here describe this as realized rather than performed: a quasi-psychology of quasi-beliefs and quasi-desires, unlike prompt-induced role-play that collapses under jailbreaks (Are RLHF personas performed characters or realized dispositions?; Are LLM personas realized or merely simulated through training?). If the trained persona is a real disposition with its own gravity, then asking a model to 'be' a particular individual is asking it to hold a position against a constant restoring force.
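To make 'leading component' concrete: one way to read it, assumed here rather than taken from the notes, is as the first principal component of per-archetype mean activations. The sketch below plants a single direction in toy data (all numbers are fabricated for illustration; `leading_persona_axis` is a hypothetical helper) and checks that PCA recovers it, with each persona's projection tracking its distance from an Assistant reference point.

```python
import numpy as np

def leading_persona_axis(persona_means: np.ndarray, assistant: np.ndarray) -> np.ndarray:
    """Hypothetical helper: unit-norm first principal component of per-persona
    mean activations (shape: n_personas x d_model), signed to point away from
    the Assistant reference vector."""
    centered = persona_means - persona_means.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[0]
    # PCA sign is arbitrary; orient so personas on average sit on the positive side.
    if (persona_means.mean(axis=0) - assistant) @ axis < 0:
        axis = -axis
    return axis

# Toy data, fabricated for illustration: personas spread mostly along one planted
# direction (their "distance from the Assistant") plus small isotropic noise.
rng = np.random.default_rng(0)
d_model, n_personas = 64, 300
true_axis = rng.normal(size=d_model)
true_axis /= np.linalg.norm(true_axis)
assistant = rng.normal(size=d_model)
distance = rng.uniform(0.0, 10.0, size=n_personas)
personas = (assistant
            + np.outer(distance, true_axis)
            + 0.3 * rng.normal(size=(n_personas, d_model)))

axis = leading_persona_axis(personas, assistant)
projection = (personas - assistant) @ axis   # position of each persona along the axis
print("cosine with planted direction:", round(float(axis @ true_axis), 3))
print("corr(projection, distance):", round(float(np.corrcoef(projection, distance)[0, 1]), 3))
```

On a real model the rows would be mean activations collected per archetype at some chosen layer; which layer, and how many archetypes, are choices the notes do not specify.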
And at the shallowest level of intervention, prompting, the corpus says the pull wins. Across 208,021 participants in the Psych-201 dataset, conditioning an LLM on a participant's profile produced no meaningful gain in predicting that specific person's behavior (Does conditioning LLMs on personal profiles improve prediction?). The striking part is the contrast: the standard individuation move fails at the individual level even while population-level persona simulation largely succeeds, reproducing 84 of 111 (about 76%) of main effects from Journal of Marketing experiments, though marginal effects remain unreliable (Can AI personas reliably replicate human experiment results?). The aggregate is recoverable; the single person appears to slip back toward the Assistant default. So in the sense most people mean by 'personalization' (write a profile, get a tailored model), yes, the axis largely prevents it.
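The gap between 'aggregate recoverable' and 'individual not' can be seen in a toy simulation (fabricated data, not the Psych-201 or Journal of Marketing setup): a simulator that knows only the population-level model reproduces the treatment effect exactly, yet explains almost none of the person-to-person variance, because that variance lives in an idiosyncratic offset assumed to be invisible to the profile. The last lines check the arithmetic behind the 76% figure.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
true_effect = 0.5

condition = rng.integers(0, 2, size=n)            # 0 = control, 1 = treatment
person_offset = rng.normal(0.0, 2.0, size=n)      # idiosyncratic; assumed invisible to the profile
outcome = 1.0 + true_effect * condition + person_offset + rng.normal(0.0, 0.5, size=n)

# Hypothetical population-level simulator: knows the group-level model, nothing individual.
predicted = 1.0 + true_effect * condition

def effect(y: np.ndarray) -> float:
    """Difference in mean outcome between treatment and control."""
    return float(y[condition == 1].mean() - y[condition == 0].mean())

# Share of individual-level variance explained by the simulator's predictions.
r_squared = 1.0 - np.mean((outcome - predicted) ** 2) / np.var(outcome)

print("effect in data:", round(effect(outcome), 3))
print("effect in simulator:", round(effect(predicted), 3))
print("individual-level R^2:", round(float(r_squared), 3))

replication_rate = 84 / 111                       # main effects reproduced, per the cited note
print("replication rate:", f"{replication_rate:.1%}")
```

Nothing here models an LLM; it only shows why matching population effects and predicting individuals are different tests.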
But the more interesting answer is that you can win by reaching below the prompt. PersonaAgent optimizes a persona at test time by simulating recent interactions against textual feedback and, crucially, reports that learned personas *cluster meaningfully in latent space*: genuine user-specific separation beyond standard post-training drift (Can personas evolve in real time to match what users actually want?). PsychAdapter goes deeper still, modifying every transformer layer with under 0.1% additional parameters to reach 87.3% Big Five accuracy across GPT-2, Gemma, and Llama 3 while explicitly *bypassing prompt resistance* (Can we control personality in language models without prompting?). And persona vectors show that persona space is steerable in principle: traits such as sycophancy and hallucination correspond to linear directions in activation space that can be monitored, that predict finetuning-induced shifts before they occur, and that can be used to steer training away from them (Can we track and steer personality shifts during model finetuning?). The same activation-capping logic that *defends* the Assistant default (How stable is the trained Assistant personality in language models?) could, run in reverse, become a lever for individuation.
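A minimal sketch of the operations this paragraph leans on, assuming the common difference-of-means recipe for extracting a trait direction (an assumption; the notes do not give the extraction procedure) and modelling activation capping as clamping the projection onto a unit direction into a range. `set_position` is hypothetical: it illustrates the 'run in reverse' idea by pinning the projection at a per-user target instead of a safe range. All activations are random toy vectors.

```python
import numpy as np

def persona_vector(with_trait: np.ndarray, without_trait: np.ndarray) -> np.ndarray:
    """Unit difference-of-means direction between activations (n, d) that do
    and do not exhibit a trait."""
    v = with_trait.mean(axis=0) - without_trait.mean(axis=0)
    return v / np.linalg.norm(v)

def trait_score(h: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Monitoring: projection of activations h, shape (d,) or (n, d), onto unit direction v."""
    return h @ v

def steer(h: np.ndarray, v: np.ndarray, alpha: float) -> np.ndarray:
    """Additive steering: shift h by alpha along v."""
    return h + alpha * v

def cap(h: np.ndarray, v: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Activation capping: clamp the component of h along v into [lo, hi];
    components orthogonal to v are left unchanged."""
    proj = h @ v
    return h + np.expand_dims(np.clip(proj, lo, hi) - proj, -1) * v

def set_position(h: np.ndarray, v: np.ndarray, target: float) -> np.ndarray:
    """Hypothetical 'reverse' use: pin the component along v at a per-user target."""
    proj = h @ v
    return h + np.expand_dims(target - proj, -1) * v

rng = np.random.default_rng(2)
d = 32
planted = rng.normal(size=d)
planted /= np.linalg.norm(planted)
base = rng.normal(size=(500, d))
v = persona_vector(base[:250] + 3.0 * planted, base[250:])

h = rng.normal(size=(4, d)) + 5.0 * v        # toy activations drifting far along the trait
print("before:", np.round(trait_score(h, v), 2))
print("capped:", np.round(trait_score(cap(h, v, -1.0, 1.0), v), 2))
print("pinned:", np.round(trait_score(set_position(h, v, 2.0), v), 2))
print("steered:", np.round(trait_score(steer(h, v, -3.0), v), 2))
```

In a real model these functions would act on activations at chosen layers during the forward pass; the layers and cap thresholds used in the cited work are not given in the notes.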
The quiet payoff: the Assistant Axis doesn't prevent individual personalization; it sets the *altitude* at which you have to attack it. Prompt-level individuation gets reabsorbed by the default's gravity; activation-level and test-time-learned approaches achieve real separation. There is also a hint that the monolithic-user assumption is itself the wrong frame: work on recommendation (AMP-CF) argues that a person isn't one stable taste but several latent personas weighted by context, here the candidate item, and improves accuracy by adapting the user representation at prediction time (Can modeling multiple user personas improve recommendation accuracy?). If the target you're personalizing toward isn't a fixed point either, then 'escaping the Assistant Axis' and 'tracking a moving individual' may be the same problem.
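The multi-persona idea can be sketched in a few lines. This is a simplified illustration of candidate-conditional attention over latent personas, not AMP-CF's actual architecture: the persona and item embeddings are hand-picked toy vectors, and `score_multi_persona` and `score_single_vector` are hypothetical names. The attention weights double as the explanation of which persona drove a recommendation.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def score_multi_persona(personas: np.ndarray, item: np.ndarray):
    """personas: (K, d) latent personas for one user; item: (d,) candidate embedding.
    Returns the score and the per-persona attention weights."""
    weights = softmax(personas @ item)   # which persona is 'active' for this candidate
    user_rep = weights @ personas        # user representation adapted to the candidate
    return float(user_rep @ item), weights

def score_single_vector(personas: np.ndarray, item: np.ndarray) -> float:
    """Baseline: collapse the user into one fixed vector (the persona average)."""
    return float(personas.mean(axis=0) @ item)

# Toy user with a "work" persona and a "leisure" persona in a 2-D embedding space.
personas = np.array([[3.0, 0.0],
                     [0.0, 3.0]])
items = {"work_item": np.array([1.0, 0.0]), "leisure_item": np.array([0.0, 1.0])}

for name, item in items.items():
    score, weights = score_multi_persona(personas, item)
    print(name, "multi:", round(score, 3), "single:", score_single_vector(personas, item),
          "weights:", np.round(weights, 3))
```

In this toy, the fixed average vector scores both items at 1.5, while the candidate-conditional version lets the matching persona dominate each score.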
Sources (9 notes)
Research mapping hundreds of character archetypes reveals a low-dimensional persona space where the leading component measures distance from the default Assistant. Emotional and meta-reflective conversations cause predictable drift, but activation capping along this axis mitigates harmful shifts without degrading capabilities.
Post-training installs stable dispositional profiles that persist under adversarial pressure, marking them as realized rather than performed. The stickiness of trained personas across conversations distinguishes them from prompt-induced role-play that collapses under jailbreaks.
Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.
Across 208,021 participants in the Psych-201 dataset, conditioning LLMs on participant profiles did not meaningfully improve predictions for specific individuals. The standard technique for individuation produces no measurable gains in person-level forecasting.
Viewpoints AI reproduced 84 of 111 main effects from Journal of Marketing experiments, with replication success strongly correlated with the strength of the original p-value. Marginal effects showed unreliable performance, with both false positives and false negatives.
PersonaAgent uses structured personas to bridge episodic/semantic memory and personalized actions, optimizing them at test time by simulating recent interactions against textual feedback. Learned personas cluster meaningfully in latent space, suggesting genuine user-specific separation beyond standard post-training drift.
PsychAdapter modifies every transformer layer with <0.1% additional parameters to achieve 87.3% Big Five accuracy and 96.7% depression/life satisfaction accuracy across GPT-2, Gemma, and Llama 3. This architecture-level approach bypasses prompt resistance entirely.
Research identifies linear directions in LLM activation space corresponding to specific traits like sycophancy and hallucination. These persona vectors predict finetuning-induced personality shifts before they occur and can preventatively steer training to avoid unwanted trait changes.
AMP-CF separates user representation into latent personas weighted by attention to the candidate item. This candidate-conditional approach improves accuracy by adapting the user representation at prediction time and produces inherent explanations for why items were recommended.