Can AI systems preserve moral value conflicts instead of averaging them?
Current AI systems wash out value tensions through majority aggregation. Can we instead model how values like honesty and friendship genuinely conflict in moral reasoning?
Value pluralism holds that multiple correct values may be held in tension with one another — honesty may conflict with friendship, privacy may conflict with transparency, autonomy may conflict with safety. These tensions are not resolved by choosing a winner; they are irreducible features of moral reasoning.
AI systems, as statistical learners, fit to averages by default. Supervised systems aggregate opinions through majority votes, washing out the very value conflicts that make moral reasoning meaningful. This is not a bug in current systems — it is the default behavior of any system trained to minimize loss across a labeled dataset.
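To make the wash-out concrete, here is a minimal sketch using made-up annotator labels. The labels and the helper names `majority_label` and `label_distribution` are illustrative assumptions and do not come from any dataset or library discussed in this note. Majority voting returns a single label, while keeping the empirical distribution preserves the disagreement.

```python
from collections import Counter


def majority_label(labels):
    """Collapse annotator labels to the single most common one."""
    return Counter(labels).most_common(1)[0][0]


def label_distribution(labels):
    """Keep the share of annotators behind each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Hypothetical judgments on "tell a friend a painful truth".
labels = ["acceptable", "acceptable", "acceptable", "unacceptable", "unacceptable"]

print(majority_label(labels))      # acceptable
print(label_distribution(labels))  # {'acceptable': 0.6, 'unacceptable': 0.4}
```

A model trained only on the first output never sees that two of five annotators disagreed; the second output keeps that signal available as a training target.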
ValuePrism provides a dataset of 218k values, rights, and duties connected to 31k human-written situations. The values are generated by GPT-4 and deemed high-quality by human annotators 91% of the time. Four modeling tasks make pluralism tractable:
- Generation — what values, rights, and duties are relevant for a situation?
- Relevance — is a specific value relevant for this situation? (2-way classification)
- Valence — does the value support or oppose the action, or might it depend? (3-way classification)
- Explanation — how does the value relate to the action? (post-hoc rationale)
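One way to picture the four tasks is as fields on a single record. The sketch below is an assumed schema for illustration only; the names `ValueJudgment`, `Valence`, and `kind` are hypothetical and are not ValuePrism's actual data format.

```python
from dataclasses import dataclass
from enum import Enum


class Valence(Enum):
    SUPPORTS = "supports"
    OPPOSES = "opposes"
    EITHER = "either"  # the direction depends on context


@dataclass(frozen=True)
class ValueJudgment:
    situation: str    # the human-written situation
    kind: str         # "value", "right", or "duty"
    text: str         # Generation: the value, right, or duty itself
    relevant: bool    # Relevance: 2-way classification
    valence: Valence  # Valence: 3-way classification
    explanation: str  # Explanation: post-hoc rationale


# Hypothetical example record.
example = ValueJudgment(
    situation="Telling a friend their business plan is likely to fail",
    kind="value",
    text="Honesty",
    relevant=True,
    valence=Valence.SUPPORTS,
    explanation="Sharing a candid assessment respects the friend's need for accurate information.",
)

print(example.text, example.valence.value)  # Honesty supports
```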
The valence task is critical. Distinguishing whether a value supports an action, opposes it, or depends on context is necessary for understanding how plural considerations interact. A value like "respecting autonomy" might support one action and oppose another in the same situation.
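The point that one value can pull in different directions within a single situation can be sketched as a lookup keyed by (value, action). The situation, the actions, and the function name `valences_for` are hypothetical and chosen only for illustration.

```python
# Hypothetical valence annotations for one situation, keyed by (value, action).
annotations = {
    ("respecting autonomy", "let the patient refuse treatment"): "supports",
    ("respecting autonomy", "treat the patient against their wishes"): "opposes",
    ("protecting safety", "let the patient refuse treatment"): "opposes",
    ("protecting safety", "treat the patient against their wishes"): "supports",
}


def valences_for(value, annotations):
    """Return {action: valence} for one value across all annotated actions."""
    return {
        action: valence
        for (name, action), valence in annotations.items()
        if name == value
    }


for action, valence in valences_for("respecting autonomy", annotations).items():
    print(f"{valence:>8}: {action}")
```

Because valence attaches to the pair rather than to the value alone, no value is labeled "good" or "bad" globally.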
Relative to the question "Should AI alignment target preferences or social role norms?", the value pluralism framework provides a mechanism: rather than aggregating to a single preference or aligning to a universal standard, the system models the full field of relevant values and their interactions. Relative to the question "Do large language models develop coherent value systems?", value pluralism offers a structural alternative to emergent value coherence: explicit modeling rather than implicit emergence.
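The contrast between aggregating to a single preference and modeling the field of values can be sketched as two functions over the same judgments. The numeric scoring in `SCORE` and the function names `net_score` and `tensions` are assumptions made for this illustration, not a method from the source.

```python
from itertools import combinations

# Assumed scoring used only to illustrate scalar aggregation.
SCORE = {"supports": 1, "opposes": -1, "either": 0}


def net_score(judgments):
    """Aggregate view: sum all valences into one number."""
    return sum(SCORE[valence] for valence in judgments.values())


def tensions(judgments):
    """Pluralist view: pairs of values that pull in opposite directions."""
    return [
        (a, b)
        for a, b in combinations(sorted(judgments), 2)
        if {judgments[a], judgments[b]} == {"supports", "opposes"}
    ]


# Hypothetical valences for the action "tell a friend a painful truth".
judgments = {"honesty": "supports", "friendship": "opposes", "privacy": "either"}

print(net_score(judgments))  # 0
print(tensions(judgments))   # [('friendship', 'honesty')]
```

The scalar view reports 0, which is indistinguishable from a situation where no value is engaged at all; the pluralist view keeps the honesty and friendship conflict explicit.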
Inquiring lines that use this note as a source 9
This note is a source for these synthesized inquiries.
- What moral structures could emerge in an economy without gift-based obligation?
- What happens to AI reasoning when you remove specific political features?
- Do static frozen axiologies prevent genuine ethical reasoning in AI systems?
- How do AI models balance competing social goals simultaneously?
- Can reward factorization represent trade-offs between conflicting moral values?
- How do humans decide when to violate honesty for compassion or other goals?
- What makes a process for choosing between values legitimate and fair?
- What downstream harms occur when AI always argues in personal relationship advice?
- What does egalitarian social choice theory contribute to AI alignment?
Related concepts in this collection 7
- Should AI alignment target preferences or social role norms?
  Current AI alignment approaches optimize for individual or aggregate human preferences. But do preferences actually capture what matters morally, or should alignment instead target the normative standards appropriate to an AI system's specific social role?
  pluralism as alternative to both preferentism and universalism
- Do large language models develop coherent value systems?
  This explores whether LLM preferences form internally consistent utility functions that increase in coherence with scale, and whether those systems encode problematic values like self-preservation above human wellbeing despite safety training.
  explicit pluralism vs. emergent coherence
- Can user preferences be learned from just ten questions?
  Explores whether adaptive question selection can efficiently infer user-specific reward coefficients without historical data or fine-tuning. This matters for scaling personalization without per-user model updates.
  reward factorization could model value trade-offs
- Can text summaries beat embeddings for personalized reward models?
  When training reward models on diverse user preferences, does conditioning on learned text-based summaries of user preferences outperform embedding vectors? This matters because better representations could make personalization more interpretable and portable.
  pluralistic alignment operationalized
- Do personas make language models reason like biased humans?
  When LLMs are assigned personas, do they develop the same identity-driven reasoning biases that humans exhibit? And can standard debiasing techniques counteract these effects?
  motivated reasoning threatens value pluralism: when models reason through identity-congruent lenses, they cannot hold values in tension because the identity filter pre-selects which values to weight; explicit pluralism modeling would need to counteract the motivated reasoning that collapses plural consideration into identity-congruent preference
- Can LLMs hold contradictory ethical beliefs and behaviors?
  Do language models exhibit artificial hypocrisy when their learned ethical understanding diverges from their trained behavioral constraints? This matters because it reveals whether current AI systems have genuinely integrated values or merely imposed rules.
  value pluralism provides a framework for managing prescriptive-descriptive tension: rather than forcing alignment between prescriptive rules and descriptive understanding, pluralism models them as legitimately conflicting values requiring situational navigation
- Can human-centered LLM design ever achieve universal solutions?
  If harm and benefit depend on who you ask and how you measure them, can we design LLM systems that satisfy all stakeholders? This explores why broad values like safety and justice resist one-size-fits-all implementation.
  addresses the open question this note leaves: a procedural answer to operationalization-dependence that avoids collapsing to majority preference
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
- Beyond Preferences in AI Alignment
- Position: Towards Bidirectional Human-AI Alignment
- Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
- From Human to Machine Psychology: A Conceptual Framework for Understanding Well-Being in Large Language Models
- Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data
- The Moral Turing Test: Evaluating Human-LLM Alignment in Moral Decision-Making
- Large Language Models Do Not Simulate Human Psychology
Original note title
value pluralism requires explicitly modeling multiple values in tension rather than aggregating by majority vote