INQUIRING LINE

What cognitive structures do realistic belief models need to include?

This explores what a belief model needs inside it to feel realistic — not just predicting what people say or do, but representing the machinery underneath: the kinds of links, uncertainty, and structure that produce belief in the first place.


This question reads as: if you wanted to model a person's beliefs faithfully — for social simulation, therapy training, or persuasion research — what cognitive ingredients can't you leave out? The corpus converges on a sharp answer: behavior alone isn't enough, and neither is causality alone. A realistic belief model needs internal reasoning structure, multiple kinds of links between ideas, and a way to hold uncertainty.

The first move is to reject behaviorism. Current LLM agents produce plausible outputs without any internal model of why a person believes what they believe, so their simulated belief changes can be neither traced nor tested counterfactually: you can't ask "what if she'd seen different evidence?" Can language models simulate belief change in people?. The same gap shows up when LLMs are tested on perspective-taking. In open-ended scenarios they fall back on surface strategies rather than tracking what someone else actually believes, and hybrid Bayesian architectures that force explicit belief tracking beat LLM-alone approaches. That suggests the missing piece is architectural, not just more training Do large language models genuinely simulate mental states?.
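To make "traceable and counterfactual" concrete, here is a minimal sketch, assuming a single yes/no belief updated with plain Bayes' rule. The function name, evidence labels, and likelihood numbers are hypothetical, and this is not the method of any work cited here. Because the update records a per-step trace and depends only on its inputs, the counterfactual question becomes a second call with different evidence.

```python
# Hypothetical sketch: one binary belief, updated step by step with Bayes' rule.
# Evidence items are (label, P(evidence | belief true), P(evidence | belief false)).

def update_belief(prior, evidence):
    """Return the final probability and a trace of (label, probability) after each step."""
    p = prior
    trace = []
    for label, p_if_true, p_if_false in evidence:
        weight_true = p_if_true * p
        weight_false = p_if_false * (1 - p)
        p = weight_true / (weight_true + weight_false)  # Bayes' rule
        trace.append((label, p))
    return p, trace


# What the person actually saw (illustrative numbers only).
seen = [("news report", 0.8, 0.3), ("friend's anecdote", 0.6, 0.5)]
# Counterfactual: same first item, but a correction notice instead of the anecdote.
what_if = [("news report", 0.8, 0.3), ("correction notice", 0.2, 0.7)]

p_seen, trace_seen = update_belief(0.5, seen)
p_what_if, _ = update_belief(0.5, what_if)

for label, p in trace_seen:
    print(f"after {label}: {p:.2f}")
print(f"counterfactual final belief: {p_what_if:.2f}")
```

With these made-up numbers, the factual run ends near 0.76 while the counterfactual run ends near 0.43, and the trace shows exactly which item moved the belief and by how much.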

But what should that structure contain? Causal belief networks are a tempting answer, and they're a good start — yet they capture only one slice of how people reason. Real beliefs also shift through associative links (this reminds me of that), analogical mappings (this is like that other situation), and raw emotion, none of which a pure causal graph represents Can causal models alone capture how humans actually reason?. So a realistic model needs heterogeneous link types, not a single clean logic. It also needs to tolerate ambiguity: people hold competing hypotheses at once, and modeling that requires representing distributions over beliefs rather than one fixed answer — the kind of stochastic, multiple-possibility reasoning that deterministic designs can't express Can stochastic latent reasoning help models explore multiple solutions?.
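A belief graph with heterogeneous link types can be sketched as follows. It is a toy under stated assumptions: the four link types mirror the ones named above, but the per-type gains, the multiplicative update rule, and the flight-cancellation example are hypothetical and are not taken from GenMinds or GRAM. Competing hypotheses are held as a probability distribution that is renormalized after each cue, so the model never has to commit to a single answer.

```python
from dataclasses import dataclass
from enum import Enum


class LinkType(Enum):
    CAUSAL = "causal"            # "this brings about that"
    ASSOCIATIVE = "associative"  # "this reminds me of that"
    ANALOGICAL = "analogical"    # "this is like that other situation"
    EMOTIONAL = "emotional"      # "this makes me feel a certain way about that"


@dataclass(frozen=True)
class Link:
    source: str      # a cue or idea
    target: str      # a hypothesis the cue bears on
    kind: LinkType
    strength: float  # 0..1, how strongly the source activates the target


def shift_beliefs(hypotheses, links, cue, gain_by_kind):
    """Boost hypotheses linked from `cue`, scaling each boost by its link type's gain,
    then renormalize so the result is still a probability distribution."""
    scores = dict(hypotheses)  # copy so the caller's distribution is left unchanged
    for link in links:
        if link.source == cue and link.target in scores:
            scores[link.target] *= 1 + gain_by_kind[link.kind] * link.strength
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}


hypotheses = {"bad weather": 0.5, "staff shortage": 0.5}
links = [
    Link("storm forecast", "bad weather", LinkType.CAUSAL, 0.9),
    Link("last year's strike", "staff shortage", LinkType.ANALOGICAL, 1.0),
]
gains = {LinkType.CAUSAL: 1.0, LinkType.ASSOCIATIVE: 0.5,
         LinkType.ANALOGICAL: 0.5, LinkType.EMOTIONAL: 0.8}

result = shift_beliefs(hypotheses, links, "last year's strike", gains)
print(result)
```

Here an analogy ("last year's strike") moves the distribution toward "staff shortage" (0.6 vs. 0.4) even though no causal link was involved. Giving each link type its own gain is one simple way to let associative, analogical, and emotional pulls differ in force from causal ones.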

The strongest practical evidence that explicit cognitive scaffolding pays off comes from therapy simulation. PATIENT-Ψ wires 106 structured cognitive models, built on Beck's cognitive conceptualization diagram (CCD), into an LLM, and expert evaluators rate the result as more faithful than GPT-4 alone, especially for maladaptive cognitions and conversational authenticity Can structured cognitive models improve LLM patient simulations for therapy training?. A parallel result in visual-social reasoning shows that staging cognition explicitly (perception, then situation, then norms) beats flat chain-of-thought by +8% on social benchmarks, which suggests the structure itself, not the volume of generated text, is what helps Can breaking down visual reasoning into three stages improve model performance?.
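One way to picture a "grounded cognitive template" is a structured record that is rendered into an LLM's instructions. The sketch below assumes fields loosely modeled on the components a Beck-style cognitive conceptualization diagram records (core belief, intermediate beliefs, coping strategies, situation, automatic thoughts, emotions, behaviors). The class name, field names, prompt wording, and example content are hypothetical and are not PATIENT-Ψ's actual schema or prompts.

```python
from dataclasses import dataclass


@dataclass
class CognitiveModel:
    """Hypothetical, simplified record of one simulated client's cognitive structure."""
    core_belief: str
    intermediate_beliefs: list[str]
    coping_strategies: list[str]
    situation: str
    automatic_thoughts: list[str]
    emotions: list[str]
    behaviors: list[str]

    def to_instructions(self) -> str:
        """Render the record as role-play instructions for an LLM."""
        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items)

        return (
            "Role-play a therapy client. Keep every reply consistent with this model.\n"
            f"Core belief: {self.core_belief}\n"
            f"Intermediate beliefs:\n{bullets(self.intermediate_beliefs)}\n"
            f"Coping strategies:\n{bullets(self.coping_strategies)}\n"
            f"Current situation: {self.situation}\n"
            f"Automatic thoughts:\n{bullets(self.automatic_thoughts)}\n"
            f"Emotions: {', '.join(self.emotions)}\n"
            f"Behaviors:\n{bullets(self.behaviors)}"
        )


client = CognitiveModel(
    core_belief="I am a failure.",
    intermediate_beliefs=["If I make a mistake, people will see I'm incompetent."],
    coping_strategies=["Overprepares", "Avoids asking for help"],
    situation="Received critical feedback on a project at work.",
    automatic_thoughts=["I'm going to get fired."],
    emotions=["anxiety", "shame"],
    behaviors=["Stays late re-checking work"],
)
text = client.to_instructions()
print(text)
```

The design point is that the maladaptive pattern lives in explicit, inspectable fields that a trainer can edit or audit, rather than being left for the model to improvise turn by turn.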

Two cautionary notes sharpen the picture. First, surface form can masquerade as reasoning: on BIG-Bench Hard, logically invalid chain-of-thought exemplars match the performance of valid ones, so a model can learn the look of inference without the substance. A belief model therefore has to encode genuine inferential links, not just plausible-sounding traces Does logical validity actually drive chain-of-thought gains?. Second, beliefs include what people wrongly take for granted: LLMs routinely accept false presuppositions even when they demonstrably know better, so a faithful model must represent not only what someone believes but also the unexamined assumptions they accommodate Why do language models accept false assumptions they know are wrong?.

Put together, the corpus sketches a realistic belief model as one with traceable internal reasoning, multiple link types beyond causality, explicit uncertainty, grounded cognitive templates, and a place for the assumptions people never question.
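Both cautions are about provenance: where a belief came from and what it rests on. The minimal sketch below assumes a hypothetical three-way label (observed, inferred, presupposed). Inferred beliefs must name premises already in the store, so every inference stays traceable, and a query walks back to the unexamined presuppositions a conclusion depends on. The class names and example beliefs are illustrative only and are not drawn from FLEX or any other cited work.

```python
from dataclasses import dataclass

SOURCES = {"observed", "inferred", "presupposed"}  # hypothetical provenance labels


@dataclass(frozen=True)
class Belief:
    claim: str
    source: str
    premises: tuple[str, ...] = ()  # claims this belief was inferred from


class BeliefStore:
    def __init__(self):
        self.beliefs: dict[str, Belief] = {}

    def add(self, belief: Belief) -> None:
        if belief.source not in SOURCES:
            raise ValueError(f"unknown source: {belief.source!r}")
        if belief.claim in self.beliefs:
            # Claims are never redefined, which also rules out circular premises.
            raise ValueError(f"already recorded: {belief.claim!r}")
        missing = [p for p in belief.premises if p not in self.beliefs]
        if missing:
            raise ValueError(f"unknown premises: {missing}")
        if belief.source == "inferred" and not belief.premises:
            # An inference with nothing behind it is only the look of reasoning.
            raise ValueError(f"untraceable inference: {belief.claim!r}")
        self.beliefs[belief.claim] = belief

    def unexamined_roots(self, claim: str) -> set[str]:
        """Return every presupposed claim that `claim` ultimately depends on."""
        belief = self.beliefs[claim]
        if belief.source == "presupposed":
            return {claim}
        roots: set[str] = set()
        for premise in belief.premises:
            roots |= self.unexamined_roots(premise)
        return roots


store = BeliefStore()
store.add(Belief("The new policy caused the price rise", "presupposed"))
store.add(Belief("Prices rose this year", "observed"))
store.add(Belief(
    "Repealing the policy will lower prices",
    "inferred",
    ("The new policy caused the price rise", "Prices rose this year"),
))
print(store.unexamined_roots("Repealing the policy will lower prices"))
```

Here the conclusion traces back to a causal claim the person never examined, the kind of accommodated assumption the presupposition results above point to, while an attempt to add an inference with no premises is rejected outright.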


Sources (8 notes)

Can language models simulate belief change in people?

LLM agents remain stuck in behaviorism, producing plausible outputs without internal reasoning structures. Modeling belief networks and reasoning traces enables traceability, counterfactual adaptation, and meaningful policy simulation.

Do large language models genuinely simulate mental states?

ChangeMyView and FANTOM benchmarks show LLMs fail at authentic perspective-taking in open-ended scenarios, despite succeeding on structured tasks. Hybrid Bayesian architectures that force explicit belief tracking outperform LLM-alone approaches, suggesting the gap is architectural rather than merely training-based.

Can causal models alone capture how humans actually reason?

Causal belief networks excel at modeling causal reasoning but cannot represent associative links, analogical mappings, or emotion-driven belief shifts. The GenMinds framework itself acknowledges this as a tractable starting point rather than a complete theory.

Can stochastic latent reasoning help models explore multiple solutions?

GRAM replaces deterministic latent updates with stochastic sampling, enabling models to represent distributions over solutions rather than single predictions. This allows handling of ambiguous problems and multiple valid strategies that deterministic designs cannot represent.

Can structured cognitive models improve LLM patient simulations for therapy training?

PATIENT-Ψ integrates 106 cognitive models based on Beck's cognitive conceptualization diagram (CCD) with LLMs to simulate patients with specific maladaptive patterns. Expert evaluators rated its fidelity higher than GPT-4's, particularly for maladaptive cognitions and conversational authenticity.

Can breaking down visual reasoning into three stages improve model performance?

CoCoT structures VLM reasoning through embodied perception, embedded situation analysis, and norm-grounded interpretation, achieving a +8% improvement over flat CoT on social benchmarks. The gains suggest cognitive structure matters more than reasoning volume for social tasks.

Does logical validity actually drive chain-of-thought gains?

Illogical chain-of-thought exemplars matched valid CoT performance on BIG-Bench Hard, showing that structural properties—not logical validity—drive the gains. The model learns the form of reasoning, not genuine inference.

Why do language models accept false assumptions they know are wrong?

The FLEX Benchmark shows that models reject false presuppositions at rates below acceptable levels (rejection rates: GPT-4 84%, Mistral 2.44%), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.
