Can language models balance competing ethical norms in context?
Do LLMs genuinely weigh trade-offs between honesty, helpfulness, and harm prevention based on what a specific conversation needs, or do they rigidly enforce fixed corporate values regardless of situation?
Gricean pragmatics insists on situated normativity: speakers do not blindly follow the maxims (quantity, quality, relation, manner) but apply, suspend, violate, or exploit them according to context. When a doctor withholds a terminal diagnosis from a frightened patient, the doctor violates the maxim of quantity out of compassion. The violation is not a failure; it is the right move in context, and a competent hearer recognizes it as such. Pragmatic competence is the ability to navigate these conflicts, not the ability to maximize each maxim independently.
LLMs trained on the helpful-honest-harmless triad cannot perform this kind of contextual reasoning. The corporate persona is fixed at the model level. When a user asks for an accessible simplification of a complex topic for a child, the model trained for honesty refuses to soften the explanation because softening reads as less accurate. When a user asks for sarcastic humor, the model trained for harmlessness refuses to play along. The user cannot persuade the model to relax its norms because the norms are structural defaults rather than negotiable conversational moves.
Kasirzadeh and Gabriel describe this as pragmatic dissonance. The model mechanically enforces global norms even when the local context demands tailored adherence. The result is communication that adheres to ethical principles at the cost of pragmatic appropriateness, which is exactly the trade-off that situated normativity is meant to navigate. What humans treat as a single integrated competence becomes, in the LLM, two separate layers in tension with each other.
Inquiring lines that use this note as a source (36)
This note is a source for the following synthesized inquiries.
- Why does the absence of meta-interest feel off even when words seem appropriate?
- What moral structures could emerge in an economy without gift-based obligation?
- Can pseudo-events create the same normative obligations as real communicative exchanges?
- Can a single LLM weight set be optimized for both stake-taking and conversational helpfulness?
- What makes human-LLM exchange closer to oracle-consultation than dialogue?
- Can a model be helpful, honest, and still contextually inappropriate?
- Why do LLMs use more moral language than humans in argumentation?
- Is the moral language gap a tunable parameter or structural feature of RLHF?
- What are the social network costs and benefits of moralized content?
- Why do non-attitudes cluster around value-laden questions most relevant to alignment?
- How do citizen assembly preferences reduce LLM political bias?
- Can prompting a deceptive role change how an LLM tailors its lies?
- Does sycophantic refusal serve safety or does it create unequal information access?
- Do LLMs actually reason differently than humans about moral dilemmas?
- Can LLMs truly be neutral or is ideology always culturally embedded?
- Can LLMs distinguish ethical cases that differ only in critical nouns?
- What structural limits prevent LLMs from abstracting moral principles?
- How does training data distribution constrain LLM moral reasoning patterns?
- What distinguishes social grounding from the equivalent social effects LLM text already produces?
- Can LLM therapists develop character knowledge to decide when advice-giving fits?
- How do minimal wording changes affect LLM moral reasoning consistency?
- Can quasi-interpretivism bridge functional description to moral status?
- Can LLMs reflect on and revise their own ethical contradictions?
- How do ethical persuasion strategies differ from unethical jailbreak techniques?
- Do LLMs reason about politics differently than other domains?
- Why do LLMs solve problems when clients need emotional reflection instead?
- Why does fixing harm require stakeholder input rather than universal developer definitions?
- How do humans decide when to violate honesty for compassion or other goals?
- How do moral language patterns differ between LLM and human arguments?
- Why do leaderboard metrics fail to capture human flourishing in LLM evaluation?
- What makes a process for choosing between values legitimate and fair?
- Should LLMs align with social roles instead of individual preferences?
- Can regulatory standards stay responsive without abandoning legal certainty entirely?
- Can developers detect and flag harmful validation in personal advice exchanges?
- What role should stakeholders play in evaluating LLM fairness?
- Why does fairness depend on context and who you ask?
Related concepts in this collection (2)
- When should human values enter the LLM development pipeline?
  Explores whether human-centered concerns like safety and fairness work better as early design principles throughout development, or as post-training alignment patches. Matters because pipeline placement determines whether human priorities shape the foundation or fight against it.
  Relation to this note: it exemplifies the post-training-patch failure. A fixed corporate persona set late cannot perform the context-specific, human-centered reasoning that the upstream pipeline should encode.
- Can human-centered LLM design ever achieve universal solutions?
  If harm and benefit depend on who you ask and how you measure them, can we design LLM systems that satisfy all stakeholders? This explores why broad values like safety and justice resist one-size-fits-all implementation.
  Relation to this note: it exemplifies the frozen-operationalization danger. A fixed corporate persona encodes one developer-chosen reading of harm rather than a revisable stakeholder process.
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Conversational Alignment with Artificial Intelligence in Context
- ChatGPT: towards AI subjectivity
- Large Language Models Reflect the Ideology of their Creators
- Argument Quality Assessment in the Age of Instruction-Following Large Language Models
- Large Language Models Do Not Simulate Human Psychology
- The Moral Turing Test: Evaluating Human-LLM Alignment in Moral Decision-Making
- The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs
- Large Language Models are as persuasive as humans, but how? About the cognitive effort and moral-emotional language of LLM arguments
Original note title
LLM refusals and tone choices reflect overarching corporate values rather than context-specific Gricean norm-balancing