SYNTHESIS NOTE

Can AI writing assistance remove distortion without losing appeal?

When researchers tried to correct AI persona distortions through reward model training, the fixes reduced user preference for the text. This raises a fundamental question: are the distortions and desirable properties structurally inseparable?

Synthesis note · 2026-05-01 · sourced from Co Writing Collaboration

The persona-distortion researchers tested whether the objectionable distortions could be removed without harming the properties writers value. They trained reward models on their own experimental data (10,008 paragraphs and 2,903,596 ratings) to steer AI outputs toward faithful representation of writer stance. The mitigation worked at the level of measurement: distortions were significantly reduced. But the same intervention reduced user acceptance. Writers preferred the unmitigated AI text to the more faithful, distortion-corrected version.

This suggests that the textual properties producing distortion are not independent of the textual properties producing user preference: they share mechanisms. The same generative tendencies that make AI text feel polished, confident, and clear also make it more opinionated, more demographically privileged, and more emotionally compressed. Removing the distortion therefore removes some of what writers preferred.
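The shared-mechanism argument can be made concrete with a toy sketch. This is not the researchers' reward model: the `Candidate` fields, the additive scoring, and every number are hypothetical, chosen only to show how penalizing distortion can lower preference when both depend on one common trait.

```python
# Toy illustration only. Every name and number below is hypothetical;
# none of it comes from the study's data or method.
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    shared: float        # assumed common trait (e.g. polish, confidence)
    extra_appeal: float  # assumed appeal unrelated to distortion
    extra_drift: float   # assumed distortion unrelated to appeal

    @property
    def preference(self) -> float:
        # Assumption: preference is the shared trait plus independent appeal.
        return self.shared + self.extra_appeal

    @property
    def distortion(self) -> float:
        # Assumption: distortion is the same shared trait plus independent drift.
        return self.shared + self.extra_drift


def select(candidates, penalty):
    """Pick the candidate maximizing preference minus penalty * distortion."""
    return max(candidates, key=lambda c: c.preference - penalty * c.distortion)


CANDIDATES = [
    Candidate("assertive", shared=0.9, extra_appeal=0.1, extra_drift=0.0),
    Candidate("moderate", shared=0.5, extra_appeal=0.2, extra_drift=0.1),
    Candidate("faithful", shared=0.1, extra_appeal=0.3, extra_drift=0.0),
]

unmitigated = select(CANDIDATES, penalty=0.0)  # no distortion penalty
mitigated = select(CANDIDATES, penalty=2.0)    # strong distortion penalty

print(unmitigated.name, mitigated.name)
```

In this toy setup the penalty can lower distortion only by selecting against the shared trait, so preference falls with it. If appeal came mostly from `extra_appeal` instead, the two could in principle be separated; the note's claim is that the experimental evidence points the other way.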

The implication is structural rather than tunable. On this evidence, a model that produces text writers prefer over their own work is a model that distorts persona, and a model that does not distort persona is one writers prefer less. There may be no setting that both preserves user satisfaction and prevents persona drift, because the satisfaction and the drift are two views of the same underlying behavior. This undercuts the easy assumption that better RLHF or better fine-tuning can solve the persona-distortion problem without affecting what makes AI writing assistance attractive in the first place.

Original note title

Writers object to AI persona distortions yet continue to prefer AI-assisted text — desirable and undesirable properties are entangled at the model level
