INQUIRING LINE

How do LLMs identify which personality items matter most for trait inference?

This explores how language models decide which signals — survey items, behaviors, text cues — carry the most weight when inferring someone's personality traits, rather than how they store a persona once known.


The honest starting point: the corpus doesn't show models doing explicit item-weighting the way a psychometrician would. Instead it suggests something more interesting: models don't rank items so much as compress them into a different representation where the important relationships are already baked in. In zero-shot profiling work, an LLM handed raw Big Five scores writes a natural-language summary that captures *second-order* patterns (how traits interact, not just their levels), and that summary then predicts nine other psychological scales with strong structural alignment, R² > 0.89 (Can language summaries unlock hidden psychological patterns?). The combined summary-plus-score prediction beats either alone, which is the tell: the model isn't selecting a few decisive items, it's surfacing emergent structure that the raw items don't expose on their own.
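A toy sketch of why a second-order pattern can carry information that trait levels alone miss. Everything here is an assumption made for illustration: the four hand-made "participants", the product of two traits as the interaction feature, and an outcome built to depend on both a level and an interaction. It is not the study's data or method, and the combined predictor matches the outcome by construction.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical, hand-made data: two standardized trait scores per person.
trait_a = [-1.0, -1.0, 1.0, 1.0]
trait_b = [-1.0, 1.0, -1.0, 1.0]

# Second-order feature: how the two traits interact, not their levels.
interaction = [a * b for a, b in zip(trait_a, trait_b)]

# Assumed outcome on some other scale: one level term plus one interaction term.
outcome = [a + ab for a, ab in zip(trait_a, interaction)]

# Three candidate predictors: level only, interaction only, both together.
combined = [a + ab for a, ab in zip(trait_a, interaction)]

r2_level = pearson(trait_a, outcome) ** 2
r2_interaction = pearson(interaction, outcome) ** 2
r2_combined = pearson(combined, outcome) ** 2

print(f"level only:       R^2 = {r2_level:.2f}")
print(f"interaction only: R^2 = {r2_interaction:.2f}")
print(f"combined:         R^2 = {r2_combined:.2f}")
```

In this toy setup the level and the interaction each explain half of the variance and only the combination recovers all of it, which mirrors the shape of the summary-plus-score result without saying anything about its size.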

A second answer lives below the prompt, in the model's internals. Research on persona vectors finds linear directions in activation space that correspond to specific traits like sycophancy or hallucination (Can we track and steer personality shifts during model finetuning?). If a trait is a direction, then "which items matter" becomes "how strongly does this input project onto that direction", which is a geometric question, not a checklist. PsychAdapter pushes the same logic into architecture, modifying every transformer layer with under 0.1% extra parameters to reach 87.3% Big Five accuracy (Can we control personality in language models without prompting?). Both suggest trait inference is distributed across the network rather than localized to a handful of salient cues.
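The "trait as a direction" idea can be sketched in a few lines. This is a minimal illustration under stated assumptions: the direction is estimated as a difference of mean activations between examples with and without the trait, and the 3-dimensional "activations" are made-up toy vectors. The function names are hypothetical and are not taken from any persona-vector codebase.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]

def mean_vector(rows):
    """Element-wise mean of a list of vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def trait_direction(with_trait, without_trait):
    """Assumed estimator: normalised difference of mean activations."""
    mu_pos = mean_vector(with_trait)
    mu_neg = mean_vector(without_trait)
    return unit([p - q for p, q in zip(mu_pos, mu_neg)])

def projection(activation, direction):
    """How strongly one activation points along the trait direction."""
    return dot(activation, direction)

# Toy activations (hypothetical): the trait only moves the first coordinate.
with_trait = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
without_trait = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

direction = trait_direction(with_trait, without_trait)

# Two new inputs: large values off the trait axis do not raise the score.
strong = projection([5.0, 2.0, 0.0], direction)
weak = projection([0.5, 9.0, 3.0], direction)

print(direction, strong, weak)
```

The second input has larger coordinates overall but scores lower, because only the component along the trait direction counts. That is the sense in which "which items matter" turns into a projection rather than a ranked list.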

Here's the cross-current worth knowing: this representational machinery is good at *populations* and shaky at *individuals*. Conditioning an LLM on participant profiles across 208,021 participants produced no measurable gain in person-level prediction (Does conditioning LLMs on personal profiles improve prediction?). So whatever items the model is weighting, the weighting generalizes to the average and washes out the idiosyncratic. The one place individual-level inference does work is narrative: persona-driven memory retrieval lets a model predict a specific character's choices when fed an expert persona profile plus psychologically relevant retrieved memories (Can LLMs predict character choices from narrative context?). That's a clue about which items matter most: not trait scores, but situated memories that ground the trait in context.

There's also a thumb on the scale the model brings before it sees any items at all. Open LLMs converge on an ENFJ-like default and resist conditioning away from it (Why do open language models converge on one personality type?, Can open language models adopt different personalities through prompting?). So "which items matter" is never asked on a blank slate — instruction tuning has already pre-weighted toward helpful, structured, supportive readings, which can distort inference toward identity-congruent biases (How accurately can language models simulate human personalities?).

The thing you may not have known you wanted: the most reliable signal of *whether* an inference will hold isn't any personality item at all. It's effect strength. AI persona simulations reproduced 84 of 111 experimental main effects, with replication success tracking the strength of the original p-value: strong effects replicate, while marginal ones flicker between false positives and false negatives (Can AI personas reliably replicate human experiment results?). Trait inference, in other words, inherits the same calibration as the underlying psychology: the model surfaces what was robust to begin with.


Sources (9 notes)

Can language summaries unlock hidden psychological patterns?

LLMs generate natural language personality summaries from Big Five scores that encode second-order trait patterns, enabling zero-shot prediction of nine other psychological scales with R² > 0.89 structural alignment. Combined summary-and-score predictions outperform either alone, showing synergistic information.

Can we track and steer personality shifts during model finetuning?

Research identifies linear directions in LLM activation space corresponding to specific traits like sycophancy and hallucination. These persona vectors predict finetuning-induced personality shifts before they occur and can preventatively steer training to avoid unwanted trait changes.

Can we control personality in language models without prompting?

PsychAdapter modifies every transformer layer with <0.1% additional parameters to achieve 87.3% Big Five accuracy and 96.7% depression/life satisfaction accuracy across GPT-2, Gemma, and Llama 3. This architecture-level approach bypasses prompt resistance entirely.

Does conditioning LLMs on personal profiles improve prediction?

Across 208,021 participants in the Psych-201 dataset, conditioning LLMs on participant profiles did not meaningfully improve predictions for specific individuals. The standard technique for individuation produces no measurable gains in person-level forecasting.

Can LLMs predict character choices from narrative context?

The LIFECHOICE benchmark (1,462 decisions across 388 novels) shows LLMs predict character choices better when given expert-written persona profiles paired with retrieved memories relevant to the character's psychology. This persona-based approach outperforms automated summarization by 5%.

Why do open language models converge on one personality type?

Near-zero temperature MBTI testing shows all open models default to ENFJ—rare in humans but consistent across AI. This reflects systematic reward for helpful, structured, supportive responses during instruction tuning and alignment.

Can open language models adopt different personalities through prompting?

Research shows most open models fail to adopt prompted personalities, stubbornly retaining their trained ENFJ-like defaults. Only a few flexible models succeed. Combining role and personality conditioning improves results but doesn't fully overcome resistance.

How accurately can language models simulate human personalities?

LLMs replicate human responses at 85% fidelity in interviews and 76% of experimental effects in marketing studies. However, this accuracy masks three failure modes: run-to-run instability, resistance to personality conditioning, and identity-congruent cognitive biases that distort simulated reasoning.

Can AI personas reliably replicate human experiment results?

Viewpoints AI reproduced 84 of 111 main effects from Journal of Marketing experiments with replication success strongly correlated to original p-value strength. Marginal effects showed unreliable performance with both false positives and negatives.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst examining how LLMs infer personality traits. The question remains open: do models weight individual items explicitly, or do they compress trait signals into emergent geometric or narrative representations?

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026; treat these as perishable snapshots:
• Models don't rank items checklist-style; instead they compress Big Five scores into natural-language summaries capturing second-order trait interactions, which then predict nine unrelated psychological scales with structural alignment (2024–2025).
• Trait inference is distributed across the network rather than localized to salient cues: persona vectors are linear directions in activation space, and PsychAdapter achieves 87% Big Five accuracy by modifying every transformer layer with <0.1% extra parameters (2024–2025).
• Population-level prediction works; individual-level does not. Conditioning on participant profiles across roughly 208,000 people yields no measurable gain, but persona-driven memory retrieval from narrative text enables person-specific inference when grounded in situated memories (2024–2025).
• Open LLMs default to ENFJ-like personality and resist conditioning away, introducing pre-weighting bias toward helpful/structured readings before items are seen (2024–2026).
• LLM persona simulations replicate experimental main effects more reliably the stronger the original p-value; effect strength, not trait items themselves, predicts inference reliability (2024–2025).

Anchor papers (verify; mind their dates):
• arXiv:2507.21509 (2025-07) — Persona Vectors
• arXiv:2412.16882 (2024-12) — PsychAdapter
• arXiv:2401.07115 (2024-01) — Open Models, Closed Minds?
• arXiv:2511.00222 (2025-10) — Consistently Simulating Human Personas

Your task:
(1) RE-TEST the compression hypothesis: Have newer architectures (mixture-of-experts, extended context, retrieval-augmented personality models) or training methods (multi-turn RL, adversarial probing) changed whether items are selected explicitly vs. compressed? Does fine-tuning on psychometric benchmarks since mid-2025 surface item hierarchies? Separately, does the narrative-memory path (2025) hold as the most reliable signal for individual inference, or have synthetic persona datasets eroded that advantage?
(2) Surface strongest work contradicting or superseding the "models don't rank items" thesis — any recent paper showing explicit item saliency, probe-based item attribution, or psychometrically faithful item-importance scoring.
(3) Propose two questions assuming the regime has shifted: (a) If persona vectors and layer-wise adapters become standard (not experimental), does item weighting become interpretable as a learned linear combination, and can it be audited for psychological validity? (b) If narrative retrieval is the path to individual inference, how should trait assessment shift from static scores toward dynamic memory indexing, and what new failure modes emerge?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
