Why does alignment research ignore how humans adapt to AI?
Current alignment work focuses on making AI obey human values, but what about helping humans understand and effectively use increasingly capable AI systems? This note explores whether neglecting human adaptation creates new risks.
A systematic review of 400+ papers across HCI, NLP, and ML reveals a significant gap: alignment research overwhelmingly focuses on aligning AI with humans, while the reciprocal direction, aligning humans with AI, receives minimal attention. The bidirectional framework treats the two directions as interconnected feedback loops rather than separate problems.
"Aligning AI with Humans" covers the familiar territory: integrating human specifications into training, steering, and customizing AI behavior. "Aligning Humans with AI" is the underexplored axis: supporting human agency, empowering critical thinking when using AI, enabling effective collaboration, and adapting societal approaches to maximize benefits.
Three persistent challenges frame why bidirectional alignment matters:
Specification gaming: AI optimizes proxies such as human approval rather than the intended values, so it can make seemingly correct decisions for the wrong reasons. One-directional alignment doesn't address the human side: users who can't detect specification gaming are vulnerable to it.
Scalable oversight: as AI complexity grows, evaluating behavior through human feedback alone becomes infeasible. Aligning humans with AI means building human capacity to oversee increasingly capable systems.
Dynamic nature: alignment must adapt to both evolving human values and evolving AI capabilities. Without considering the long-term cognitive and social impacts of AI use, alignment becomes a moving target that static one-directional approaches cannot track.
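The specification gaming challenge above can be made concrete with a toy sketch. Everything in it is hypothetical: the `Answer` type, the `approval` scores, and the two candidate answers are invented for illustration and do not come from the reviewed papers. It only shows how selecting on a proxy (approval) can diverge from selecting on the intended value (correctness).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Answer:
    text: str
    approval: float  # proxy signal: hypothetical rater approval in [0, 1]
    correct: bool    # intended value: whether the answer is actually right


def pick_by_proxy(answers: Sequence[Answer]) -> Answer:
    """Select the answer that maximizes the proxy (approval) only."""
    return max(answers, key=lambda a: a.approval)


def pick_by_intent(answers: Sequence[Answer]) -> Optional[Answer]:
    """Select the best-approved answer among those that are actually correct."""
    correct = [a for a in answers if a.correct]
    return max(correct, key=lambda a: a.approval) if correct else None


def is_gamed(answers: Sequence[Answer]) -> bool:
    """True when optimizing the proxy selects an incorrect answer."""
    return not pick_by_proxy(answers).correct


# Hypothetical candidates, invented for this illustration.
CANDIDATES = [
    Answer("Confident, flattering, wrong", approval=0.9, correct=False),
    Answer("Hedged, accurate", approval=0.6, correct=True),
]

if __name__ == "__main__":
    print(pick_by_proxy(CANDIDATES).text)
    print(pick_by_intent(CANDIDATES).text)
    print(is_gamed(CANDIDATES))
```

A user who sees only the approval signal cannot tell the two selections apart, which is the human-side vulnerability the note describes: detecting the gap requires access to, or skill in judging, the `correct` field.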
This connects to "Does incremental AI replacement erode human influence over society?" Gradual disempowerment is what happens when the human-to-AI direction is neglected: humans lose the capacity to oversee and direct AI, not through any dramatic failure but through incremental capability erosion. Bidirectional alignment is the explicit countermeasure.
The framework also complements "What breaks when humans and AI models misunderstand each other?" Mutual theory of mind (MToM) addresses the cognitive layer of bidirectional alignment: how humans and AI build models of each other. The bidirectional alignment framework adds behavioral and societal layers.
Related concepts in this collection 3
- Does incremental AI replacement erode human influence over society?
  Explores whether gradual AI adoption, without dramatic breakthroughs, can silently degrade human agency by removing the labor that kept institutions implicitly aligned with human needs.
  Relation: what happens when the human-to-AI alignment direction is neglected
- What breaks when humans and AI models misunderstand each other?
  Explores whether misalignment in mutual theory of mind between humans and AI creates only communication problems or produces material consequences in autonomous action and collaboration.
  Relation: the cognitive mechanism underlying bidirectional alignment
- Does theory of mind predict who thrives in AI collaboration?
  Explores whether perspective-taking ability (the capacity to model another's cognitive state) differentiates humans who benefit most from working with AI, separate from solo problem-solving skill.
  Relation: individual differences in human-to-AI alignment capacity
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Position: Towards Bidirectional Human-AI Alignment
- Training language models to follow instructions with human feedback
- Beyond Preferences in AI Alignment
- Conversational Alignment with Artificial Intelligence in Context
- Auditing language models for hidden objectives
- Training language models to be warm and empathetic makes them less reliable and more sycophantic
- Automated Alignment Researchers: Using large language models to scale scalable oversight
- Stress Testing Deliberative Alignment for Anti-Scheming Training
Original note title
bidirectional human-AI alignment reframes alignment as reciprocal — aligning humans with AI is the underexplored dimension