Inquiring Lines
5,588 lines of inquiry, grouped into 8 areas and 28 emergent themes — surfaced by clustering the questions themselves (distinct from the source topics). Browse the map below, or open the faceted explorer to combine themes, moves, and the source research a line draws on.
Reasoning Failures & Faithfulness 997
Why reasoning models break down and whether their chain-of-thought traces genuinely reflect and drive internal computation rather than misleading users.
- Do shorter reasoning traces actually produce more reliable model outputs?
- Why do reasoning models fail on structurally unfamiliar instances?
- Does more inference compute help reasoning models match specialized domain performance?
- What internal mechanisms explain LLM reasoning and representation limits?
- How does structural complexity affect LLM performance differently from inferential complexity?
- Do LLMs struggle more with semantic accuracy than syntactic correctness across domains?
- Does internal self-revision actually degrade reasoning accuracy in models?
- Can single models correct their own beliefs without amplifying confidence in wrong answers?
- What role does inductive bias play versus model capacity in practice?
Agents & Retrieval Systems 787
How agent execution infrastructure, retrieval architectures, recommenders, and multi-agent coordination determine the reliability of LLM-powered systems.
- Can parallel retrieval chains avoid the context consumption problem?
- How do taxonomy-based retrieval scaffolds improve model performance at inference time?
- How does hierarchical query planning, compared with flat prompting, affect multi-source retrieval?
- When does memory consolidation help agents instead of hurting performance?
- How do standardized artifacts prevent autonomous agent failure modes?
- How does durable memory quality shape agent performance over time?
- How should recommendation systems balance individual preference signals with population-level patterns?
- Can recommender systems separate true preference from individual rating style bias?
- Should recommendation evaluation enforce probability competition between candidate items?
Language & Social Cognition 763
How LLMs fall short in grounding meaning, collaborative discourse, and persuasion, defaulting to surface strategies rather than genuine social reasoning.
- Why do language models fail at grounding and inference?
- Can language models correct false assumptions or only reinforce them?
- Why do language models presume common ground rather than build it?
Training, Scaling & Reward 678
How pretraining data, model scale, inductive biases, and reward design shape generalization, alignment, and the avoidance of spurious shortcuts.
- Can identical model performance mask fundamentally broken internal representations?
- Why does scaling data and model size improve compositional generalization?
- Why do smaller and larger models converge on different output formats?
Conversational & Emotional AI 642
How dialogue systems, therapeutic chatbots, and persona simulations sustain grounding, consistency, and genuine emotional attunement over interaction.
- Does optimizing for alignment actually reduce conversational grounding over time?
- Does preference optimization degrade other conversational properties besides grounding?
- Can multi-turn conversations manipulate language model reasoning in the same ways that personas do?
- Why does trait-level warmth amplify sycophancy in therapeutic AI contexts?
- How does empathetic engagement destabilize model reliability and persona stability?
- Why do RLHF-trained models struggle with proactive emotional attunement in conversations?
Reasoning Quality & Compute 634
How training format, prompting, context budgeting, and inference-time compute allocation shape the quality, strategy, and efficiency of model reasoning.
- Do reasoning models switch approaches when encountering local difficulty?
- How does training format shape reasoning strategy more than content?
- Why does fine-tuning degrade reasoning quality even as accuracy improves?
Model Internals & Capability Limits 595
How attention and memory mechanisms, RL training dynamics, and capability cliffs govern model behavior, diversity, and the boundaries of reliable performance.
- Why does the gap between theoretical expressiveness and learned capability matter?
- Can capability boundary collapse be addressed by operating at representational rather than token level?
- Can capability boundary collapse be reversed through external data?
Trust, Authority & Reliance 492
How AI fluency and presentation gain unearned credibility, distort perceptions of authorship, and reshape human skill and independent judgment.
- Why does polished AI output exploit reader trust in expert judgment?
- Why is confidence a dangerous proxy for accuracy in human-AI interaction?
- How does opaque AI processing distort users' perception of their contribution?