All Papers
White paper excerpts.
A (204)
- A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy2025
- A Comment On "The Illusion of Thinking": Reframing the Reasoning Cliff as an Agentic Gap2025
- A Comparative Study on Reasoning Patterns of OpenAI's o1 Model2024
- A comprehensive analysis of concept drift locality in data streams2023
- A Comprehensive Evaluation of Inductive Reasoning Capabilities and Problem Solving in Large Language Models
- A Comprehensive Review of AI-based Intelligent Tutoring Systems: Applications and Challenges2025
- A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications2025
- A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models2024
- A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems2025
- A comprehensive taxonomy of hallucinations in Large Language Models2025
- A Computational Framework for Behavioral Assessment of LLM Therapists2024
- A Contextual-Bandit Approach to Personalized News Article Recommendation2010
- A Conversation is Worth A Thousand Recommendations: A Survey of Holistic Conversational Recommender Systems2023
- A Decomposition Perspective to Long-context Reasoning for LLMs2026
- A Domain Specific Modeling Language for Multiagent Systems
- A Few Words Can Distort Graphs: Knowledge Poisoning Attacks on Graph-based Retrieval-Augmented Generation of Large Language Models2025
- A Framework for Collaborating a Large Language Model Tool in Brainstorming for Triggering Creative Thoughts2024
- A framework for the use of generative modelling in non-equilibrium statistical mechanics2024
- A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts2024
- A Hybrid Human-AI Approach for Argument Map Creation From Transcripts
- A Hybrid Intelligence Method for Argument Mining2024
- A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning2024
- A Little Human Data Goes A Long Way2024
- A Looming Replication Crisis in Evaluating Behavior in Language Models? Evidence and Solutions2024
- A Mechanistic Analysis of Looped Reasoning Language Models2026
- A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis2023
- A meta-analysis of the persuasive power of large language models
- A Multi-facet Paradigm to Bridge Large Language Model and Recommendation2023
- A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity2023
- A natural language processing approach reveals first-person pronoun usage and non-fluency as markers of therapeutic alliance in psychotherapy
- A Non-Factoid Question-Answering Taxonomy
- A Personalized Recommender System based-on Knowledge Graph Embeddings2023
- A polar coordinate system represents syntax in large language models2024
- A Practical Guide for Designing, Developing, and Deploying Production-Grade Agentic AI Workflows2025
- A Primer on the Inner Workings of Transformer-based Language Models2024
- A Probabilistic Model for Using Social Networks in Personalized Item Recommendation
- A recipe for annotating grounded clarifications2021
- A ripple in time: a discontinuity in American history2023
- A Robustness Evaluation Framework for Argument Mining
- A Socially-Aware Conversational Recommender System for Personalized Recipe Recommendations
- A sociotechnical perspective for the future of AI: narratives, inequalities, and human control
- A Survey of Calibration Process for Black-Box LLMs2024
- A Survey of Context Engineering for Large Language Models2025
- A Survey of Continual Reinforcement Learning2025
- A Survey of Meta-Reinforcement Learning2023
- A Survey of Reinforcement Learning from Human Feedback2023
- A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence2025
- A Survey on Concept Drift Adaptation
- A Survey on Context-Aware Multi-Agent Systems: Techniques, Challenges and Future Directions2024
- A Survey on Diffusion Language Models2025
- A Survey on Knowledge Distillation of Large Language Models2024
- A Survey on Large Language Models for Recommendation2023
- A Survey on Large Language Models with some Insights on their Capabilities and Limitations2025
- A Survey on Lexical Ambiguity Detection and Word Sense Disambiguation2024
- A Survey on LLM Inference-Time Self-Improvement2024
- A Survey on Post-training of Large Language Models2025
- A Survey on Proactive Dialogue Systems: Problems, Methods, and Prospects2023
- A Survey on Prompt Tuning2025
- A Survey on Recent Advances in LLM-Based Multi-turn Dialogue Systems2024
- A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well?2025
- A Systematic Review on the Evaluation of Large Language Models in Theory of Mind Tasks2025
- A Taxonomy of Empathetic Questions in Social Dialogs
- A Tutorial on LLM Reasoning: Relevant Methods behind ChatGPT o12025
- A Unified Multi-task Learning Framework for Multi-goal Conversational Recommender Systems2022
- Abductive Reasoning with the GPT-4 Language Model: Case studies from criminal investigation, medical practice, scientific research2023
- Abg-CoQA: Clarifying Ambiguity in Conversational Question Answering
- Absolute Zero: Reinforced Self-play Reasoning with Zero Data2025
- AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions2025
- ACE: Abstractions for Communicating Efficiently2024
- Action-Based Conversations Dataset: A Corpus for Building More In-Depth Task-Oriented Dialogue Systems2021
- Activation Steering for Chain-of-Thought Compression2025
- Active Listening: Personalized Question Generation in Open-Domain Social Conversation with User Model Based Prompting
- Active Retrieval Augmented Generation2023
- Adam's Law: Textual Frequency Law on Large Language Models2026
- Adaptation of Agentic AI
- Adapter-based Selective Knowledge Distillation for Federated Multi-domain Meeting Summarization2023
- Adapting LLM Agents with Universal Feedback in Communication2023
- Adaptive Learning Systems: Personalized Curriculum Design Using LLM-Powered Analytics2025
- Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home2025
- Adding Chit-Chat to Enhance Task-Oriented Dialogues2020
- Addressing Social Misattributions of Large Language Models: An HCXAI-based Approach2024
- Advances and Challenges in Conversational Recommender Systems: A Survey2021
- Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems2025
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling2025
- Advancing LLM Reasoning Generalists with Preference Trees2024
- Aether Weaver: Multimodal Affective Narrative Co-Generation with Dynamic Scene Graphs2025
- Affordable AI Assistants with Knowledge Graph of Thoughts2025
- Agent A/B: Automated and Scalable A/B Testing on Live Websites with Interactive LLM Agents2025
- Agent Development Kit
- Agent Laboratory: Using LLM Agents as Research Assistants2025
- Agent Learning via Early Experience2025
- Agent S: An Open Agentic Framework that Uses Computers Like a Human2024
- Agent Workflow Memory2024
- Agent-as-a-Judge: Evaluate Agents with Agents2024
- Agent-Centric Projection of Prompting Techniques and Implications for Synthetic Training Data for Large Language Models2025
- Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training2025
- AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs2025
- AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation2024
- Agentic AI and the next intelligence explosion2026
- Agentic Code Reasoning2026
- Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models2025
- Agentic Misalignment: How LLMs Could Be Insider Threats2025
- Agentic Reasoning for Large Language Models2026
- Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research2025
- Agentic Systems as Boosting Weak Reasoning Models2026
- Agentic Web: Weaving the Next Web with AI Agents2025
- AgentRxiv: Towards Collaborative Autonomous Research2025
- Agents Are Not Enough2024
- Agents of Chaos2026
- AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs2025
- AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents
- Agreement Tracking for Multi-Issue Negotiation Dialogues2023
- AI & Human Co-Improvement for Safer Co-Superintelligence2025
- AI Agent Traps
- AI Agents Need Memory Control Over More Context2026
- AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges2025
- AI Argues Differently: Distinct Argumentative and Linguistic Patterns of LLMs in Persuasive Contexts
- AI Assistance Reduces Persistence and Hurts Independent Performance2026
- AI Can Learn Scientific Taste2026
- AI Companions Reduce Loneliness2024
- AI Compute Architecture and Evolution Trends2025
- AI Enters Public Discourse: A Habermasian Assessment Of The Moral Status Of Large Language Models
- AI for Auto-Research: Roadmap & User Guide2026
- AI Meets the Classroom: When Does ChatGPT Harm Learning?2024
- AI Models Exceed Individual Human Accuracy in Predicting Everyday Social Norms2025
- AI tutoring outperforms in-class active learning: an RCT introducing a novel research-based design in an authentic educational setting
- AI-Powered (Finance) Scholarship
- AI-Researcher: Autonomous Scientific Innovation2025
- AInsight: Augmenting Expert Decision-Making with On-the-Fly Insights Grounded in Historical Data2025
- aiXiv: A Next-Generation Open Access Ecosystem for Scientific Discovery Generated by AI Scientists2025
- Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models2023
- Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models2024
- ALIGN: Prompt-based Attribute Alignment for Reliable, Responsible, and Personalized LLM-based Decision-Making2025
- Aligning Language Models to Explicitly Handle Ambiguity2024
- Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning2025
- All AI Models are Wrong, but Some are Optimal2025
- AlphaEvolve: A coding agent for scientific and algorithmic discovery2025
- AlphaGo Moment for Model Architecture Discovery2025
- Alternating Recurrent Dialog Model with Large-scale Pre-trained Language Models2019
- An Automatic Graph Construction Framework based on Large Language Models for Recommendation2024
- An Empirical Study of GPT-4o Image Generation Capabilities2025
- An Emulator for Fine-Tuning Large Language Models using Small Language Models2023
- An extended framework for characterizing social robots2019
- An Investigation of Robustness of LLMs in Mathematical Reasoning: Benchmarking with Mathematically-Equivalent Transformation of Advanced Mathematical Problems2025
- An Overview Of Temporal Commonsense Reasoning and Acquisition2023
- ANAPHORA RESOLUTION: THE STATE OF THE ART
- Answer is All You Need: Instruction-following Text Embedding via Answering the Question2024
- Answering Questions by Meta-Reasoning over Multiple Chains of Thought2023
- Apollo's Oracle: Retrieval-Augmented Reasoning in Multi-Agent Debates2023
- Approaching Human-Level Forecasting with Language Models2024
- Are Customers Lying to Your Chatbot?
- Are Emergent Abilities in Large Language Models just In-Context Learning?2023
- Are Emergent Abilities of Large Language Models a Mirage?2023
- Are LLMs All You Need for Task-Oriented Dialogue?2023
- Are you in a Masquerade? Exploring the Behavior and Impact of Large Language Model Driven Social Bots in Online Social Networks2023
- AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning2025
- ARGS: Alignment as Reward-Guided Search2024
- Argument Quality Assessment in the Age of Instruction-Following Large Language Models
- Argument Summarization and its Evaluation in the Era of Large Language Models2025
- Argumentative Large Language Models for Explainable and Contestable Decision-Making2024
- Argunauts: Open LLMs that Master Argument Analysis with Argdown2021
- Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics2024
- Artifacts as Memory Beyond the Agent Boundary2026
- Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)2025
- Artificial Intelligence and the Labor Market
- Artificial intelligence is ineffective and potentially harmful for fact checking2023
- ASI-Evolve: AI Accelerates AI2026
- Ask an Expert: Leveraging Language Models to Improve Strategic Reasoning in Goal-Oriented Dialogue Models2023
- Ask, and it shall be given: Turing completeness of prompting2024
- Ask-AC: An Initiative Advisor-in-the-Loop Actor-Critic Framework2022
- Asking Clarifying Questions Based on Negative Feedback in Conversational Search2021
- Aspect-oriented Opinion Alignment Network for Aspect-Based Sentiment Classification2023
- Assessing adaptive world models in machines with novel games2025
- Assessing and Mitigating Data Memorization Risks in Fine-Tuned Large Language Models2025
- Assessing the Ability of ChatGPT to Screen Articles for Systematic Reviews2023
- Assessment of Personality Dimensions Across Situations Using Conversational Speech2025
- Atesa-bært: A Heterogeneous Ensemble Learning Model For Aspect-based Sentiment Analysis2023
- Atom of Thoughts for Markov LLM Test-Time Scaling2025
- Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward2025
- Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models2025
- Attention Mechanisms Perspective: Exploring LLM Processing of Graph-Structured Data2025
- Attention on the brain
- Attention, Intentions, And The Structure Of Discourse
- Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models2025
- Attribute Controlled Dialogue Prompting2023
- Auditing language models for hidden objectives2025
- Augmenting Autotelic Agents with Large Language Models2023
- Augmenting Netflix Search with In-Session Adapted Recommendations2022
- AutoCBT: An Autonomous Multi-agent Framework for Cognitive Behavioral Therapy in Psychological Counseling2025
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework2023
- Autogenesis: A Self-Evolving Agent Protocol2026
- AutoGLM: Autonomous Foundation Agents for GUIs2024
- Automated Alignment Researchers: Using large language models to scale scalable oversight2022
- Automated Design of Agentic Systems2024
- Automated Social Science: Language Models as Scientist and Subjects2024
- Automatic Extraction of Metaphoric Analogies from Literary Texts: Task Formulation, Dataset Construction, and Evaluation2024
- Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data2023
- Automatic Prompt Optimization with "Gradient Descent" and Beam Search2023
- Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies2023
- AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts2020
- AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration2026
- AutoScientists: Self-Organizing Agent Teams for Long-Running Scientific Experimentation2026
- Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey2020
- AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders2025
B (55)
- Backtracing: Retrieving the Cause of the Query2024
- Base Models Know How to Reason, Thinking Models Learn When2025
- Behavioral Exploration: Learning to Explore via In-Context Adaptation2025
- Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling2024
- Benchmarking the Pedagogical Knowledge of Large Language Models2025
- Better Alignment with Instruction Back-and-Forth Translation2024
- Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases2025
- Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs2025
- Beyond "Not Novel Enough": Enriching Scholarly Critique with LLM-Assisted Feedback2025
- Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey2024
- Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models2025
- Beyond Answers: How LLMs Can Pursue Strategic Thinking in Education2025
- Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty2025
- Beyond Brainstorming: What Drives High-Quality Scientific Ideas? Lessons from Multi-Agent Collaboration2025
- Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning2025
- Beyond Discrete Personas: Personality Modeling Through Journal Intensive Conversations2024
- Beyond GPT-5: Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing2025
- Beyond Hallucinations: The Illusion of Understanding in Large Language Models2025
- Beyond Language Modeling: An Exploration of Multimodal Pretraining2026
- Beyond neural scaling laws: beating power law scaling via data pruning2022
- Beyond Passive Critical Thinking: Fostering Proactive Questioning to Enhance Human-AI Collaboration2025
- Beyond Preferences in AI Alignment2024
- Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts2025
- Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment2025
- Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning2025
- Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens2025
- Beyond Single Models: Enhancing LLM Detection of Ambiguity in Requests through Debate2025
- Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL2025
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning2025
- Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR2025
- Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think2025
- Beyond the Surface: Probing the Ideological Depth of Large Language Models2025
- Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following2025
- Beyond Turing: Memory-Amortized Inference as a Foundation for Cognitive Computation2025
- Bigger is not always better: The importance of human-scale language modeling for psycholinguistics
- Bilevel Autoresearch: Meta-Autoresearching Itself2026
- Boosted Prompt Ensembles for Large Language Models2023
- Boosting Logical Reasoning in Large Language Models through a New Framework: The Graph of Thought2023
- Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need2025
- Boundless Socratic Learning with Language Games2024
- Bounds of Chain-of-Thought Robustness: Reasoning Steps, Embed Norms, and Beyond2025
- Branch-Solve-Merge Improves Large Language Model Evaluation and Generation2023
- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM2024
- Break It Down: Evidence for Structural Compositionality in Neural Networks2023
- Break the Chain: Large Language Models Can be Shortcut Reasoners2024
- Bridging Offline and Online Reinforcement Learning for LLMs2025
- Bridging the gulf of envisioning: Cognitive design challenges in LLM interfaces2023
- BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent2025
- Building a Stronger CASA: Extending the Computers Are Social Actors Paradigm
- Building and Evaluating Open-Domain Dialogue Corpora with Clarifying Questions2021
- Building Cooperative Embodied Agents Modularly with Large Language Models2023
- Building Decision Making Models Through Language Model Regime2024
- Building Machines that Learn and Think with People2024
- Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning2023
- Byte Latent Transformer: Patches Scale Better Than Tokens
C (154)
- Calibrated Recommendations
- CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society2023
- Can AI Agents Agree?2026
- Can AI Explanations Make You Change Your Mind?2025
- Can AI Have a Personality? Prompt Engineering for AI Personality Simulation: A Chatbot Case Study in Gender-Affirming Voice Therapy Training2025
- Can Authorship Representation Learning Capture Stylistic Features?2023
- Can Language Models Recognize Convincing Arguments?2024
- Can Language Models Represent the Past without Anachronism?2025
- Can Language Models Serve as Text-Based World Simulators?2024
- Can Language Models Solve Graph Problems in Natural Language?2023
- Can Large Language Models Capture Human Annotator Disagreements?2025
- Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess2025
- Can Large Language Models do Analytical Reasoning?2024
- Can large language models explore in-context?2024
- Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability to Mark Short Answer Questions in K-12 Education2024
- Can Large Language Models perform Relation-based Argument Mining?2024
- Can Large Language Models Really Improve by Self-critiquing Their Own Plans?2023
- Can Large Language Models Reason and Optimize Under Constraints?2026
- Can Large Language Models Reason and Plan?2024
- Can Large Language Models Transform Computational Social Science?
- Can Large Language Models Understand Argument Schemes?
- Can Large Language Models Understand Context?2024
- Can Large Reasoning Models Self-Train?2025
- Can LLM be a Personalized Judge?2024
- Can LLMs assist with Ambiguity? A Quantitative Evaluation of various Large Language Models on Word Sense Disambiguation2024
- Can LLMs Follow Simple Rules?
- Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers2024
- Can LLMs Ground when they (Don't) Know: A Study on Direct and Loaded Political Questions2025
- Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?2024
- Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?2024
- Can Machines Think Like Humans? A Behavioral Evaluation of LLM-Agents in Dictator Games2024
- Can robots do therapy?: Examining the efficacy of a CBT bot in comparison with other behavioral intervention technologies in alleviating mental health symptoms
- Can Theoretical Physics Research Benefit from Language Agents?2025
- Can We Trust AI Explanations? Evidence of Systematic Underreporting in Chain-of-Thought Reasoning2025
- Can You Trust LLM Judgments? Reliability of LLM-as-a-Judge2024
- CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues2024
- Canvil: Designerly Adaptation for LLM-Powered User Experiences2024
- Capturing Individual Human Preferences with Reward Features2025
- Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models2025
- Causal Claims in Economics2025
- Causal Reflection with Language Models2025
- Causal Sufficiency and Necessity Improves Chain-of-Thought Reasoning2025
- CDW-CoT: Clustered Distance-Weighted Chain-of-Thoughts Reasoning2025
- CEO: Corpus-based Open-Domain Event Ontology Induction2023
- Chain of Draft: Thinking Faster by Writing Less2025
- Chain of Stance: Stance Detection with Large Language Models2024
- Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
- Chain of Thoughtlessness? An Analysis of CoT in Planning2024
- Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models2023
- Chain-of-Questions Training with Latent Answers for Robust Multistep Question Answering2023
- Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective2025
- Chain-of-Retrieval Augmented Generation2025
- Chain-of-Thought Is Not Explainability
- Chain-of-thought Reasoning Is A Policy Improvement Operator2023
- Chain-of-Thought Reasoning Without Prompting2024
- Chain-of-Verification Reduces Hallucination in Large Language Models2023
- Challenges of Large Language Models for Mental Health Counseling2023
- Chamain: Harmonizing Character Persona Integrity with Domain-Adaptive Knowledge in Dialogue Generation
- Character is Destiny: Can Role-Playing Language Agents Make Persona-Driven Decisions?2024
- Characterizing Deep Research: A Benchmark and Formal Definition2025
- Characterizing Online Discussion Using Coarse Discourse Sequences
- Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference2024
- Chatbot vs. Human: The Impact of Responsive Conversational Features on Users’ Responses to Chat Advisors
- Chatbots in Knowledge-Intensive Contexts: Comparing Intent and LLM-Based Systems2024
- ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate2023
- ChatGPT Doesn’t Trust Chargers Fans: Guardrail Sensitivity in Context2024
- ChatGPT is not Enough: Enhancing Large Language Models with Knowledge Graphs for Fact-aware Language Modeling2023
- ChatGPT Reads Your Tone and Responds Accordingly -- Until It Does Not -- Emotional Framing Induces Bias in LLM Outputs2025
- ChatGPT: deconstructing the debate and moving it forward
- ChatGPT: towards AI subjectivity
- Checklists Are Better Than Reward Models For Aligning Language Models2025
- Choosing the Right Weights: Balancing Value, Strategy, and Noise in Recommender Systems2023
- Circuit Tracing: Revealing Computational Graphs in Language Models
- CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning2025
- Clarifying the Path to User Satisfaction: An Investigation into Clarification Usefulness2024
- Classifying YouTube Comments Based on Sentiment and Type of Sentence2021
- Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data
- CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization2023
- CloChat: Understanding How People Customize, Interact, and Experience Personas in Large Language Models2024
- Clustering-based Sampling for Few-Shot Cross-Domain Keyphrase Extraction
- Code as Agent Harness2026
- CogBench: a large language model walks into a psychology lab2024
- Cognitive Architectures for Language Agents2023
- Cognitive Chain-of-Thought: Structured Multimodal Reasoning about Social Situations2025
- Cognitive Effects in Large Language Models2023
- CollabLLM: From Passive Responders to Active Collaborators2025
- Collaborative Deep Learning for Recommender Systems2014
- Collaborative Filtering Bandits2015
- Collaborative Filtering for Implicit Feedback Datasets
- Collaborative Filtering with Temporal Dynamics
- Collaborative Rational Speech Act: Pragmatic Reasoning for Multi-Turn Dialog2025
- Collaborative Reasoner: Self-Improving Social Agents with Synthetic Conversations
- CoLLM: Integrating Collaborative Embeddings into Large Language Models for Recommendation2023
- Command A: An Enterprise-Ready Large Language Model2025
- Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity2025
- ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning2025
- Comparing Apples to Apples: Generating Aspect-Aware Comparative Sentences from User Reviews2023
- Comparing emotion feature extraction approaches for predicting depression and anxiety
- Comparing Human and AI Therapists in Behavioral Activation for Depression: Cross-Sectional Questionnaire Study
- COMPASS: Computational Mapping of Patient-Therapist Alliance Strategies with Language Modeling2024
- Competitive Programming with Large Reasoning Models2025
- Complex Logical Instruction Generation2025
- Complexity-Based Prompting for Multi-Step Reasoning2022
- Compositional Reasoning with Transformers, RNNs, and Chain of Thought2025
- Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning2025
- Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conversations2024
- Computational Modelling of Undercuts in Real-world Arguments
- Computational structuralism: Toward a formal theory of meaning in the age of digital intelligence
- Computer says “No”: The Case Against Empathetic Conversational AI2022
- Conceptual Design Generation Using Large Language Models2023
- Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models2026
- Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data2024
- CONSCENDI: A Contrastive and Scenario-Guided Distillation Approach to Guardrail Models for Virtual Assistants2023
- Considering the Context to Build Theory in HCI, HRI, and HMC: Explicating Differences in Processes of Communication and Socialization With Social Technologies
- Consistency Models Made Easy2024
- Consistency Training Helps Stop Sycophancy and Jailbreaks2025
- Consistent Explainers or Unreliable Narrators? Understanding LLM-generated Group Recommendations2025
- Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning2025
- Constructing a Periodic Table of Arguments
- Content-aware Collaborative Music Recommendation Using Pre-trained Neural Networks
- Context Embeddings for Efficient Answer Generation in RAG2024
- Context Engineering 2.0: The Context of Context Engineering2025
- Context Tuning for Retrieval Augmented Generation2023
- Context-PEFT: Efficient Multi-Modal, Multi-Task Fine-Tuning2023
- Continual Instruction Tuning for Large Multimodal Models2023
- CONTROL PREFIXES for Parameter-Efficient Text Generation2021
- Controlling Linguistic Style Aspects in Neural Language Generation2017
- Converging Paradigms: The Synergy of Symbolic and Connectionist AI in LLM-Empowered Autonomous Agents2024
- Conversation Chronicles: Towards Diverse Temporal and Relational Dynamics in Multi-Session Conversations2023
- Conversation Derailment Forecasting with Graph Convolutional Networks2023
- Conversational Alignment with Artificial Intelligence in Context2025
- Conversational DNA: A New Visual Language for Understanding Dialogue Structure in Human and AI2025
- Conversational Graph Grounded Policy Learning for Open-Domain Conversation Generation
- Conversational Prompt Engineering2024
- Conversational Recommendation: A Grand AI Challenge2022
- Conversational Semantic Parsing for Dialog State Tracking2020
- Conversations Gone Awry: Detecting Early Signs of Conversational Failure
- CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases
- CoT is Not True Reasoning, It Is Just a Tight Constraint to Imitate: A Theory Perspective2025
- CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks2025
- Could you be wrong: Debiasing LLMs using a metacognitive prompt for improving human decision making2025
- Creativity Has Left the Chat: The Price of Debiasing Language Models2024
- Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying2024
- Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate2025
- Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback2025
- Critiques of World Models2025
- CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions2025
- Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains2025
- Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs2023
- Cultural Evolution of Cooperation among LLM Agents2024
- Cumulated Gain-Based Evaluation of IR Techniques
- Cumulative Reasoning with Large Language Models2023
- CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning2018
- Curse of “Low” Dimensionality in Recommender Systems2023
D (125)
- DAPIE: Interactive Step-by-Step Explanatory Dialogues to Answer Children’s Why and How Questions
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale2025
- Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents2025
- DataComp-LM: In search of the next generation of training sets for language models2024
- DATATALES: Investigating the use of Large Language Models for Authoring Data-Driven Articles2023
- Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models2024
- DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations2022
- Debating with More Persuasive LLMs Leads to More Truthful Answers2024
- DecepChain: Inducing Deceptive Reasoning in Large Language Models2025
- Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning2024
- Decision Transformer: Reinforcement Learning via Sequence Modeling2021
- Decision-Oriented Dialogue for Human–AI Collaboration2023
- Decomposed Prompting: A Modular Approach for Solving Complex Tasks2022
- Decoupling Knowledge and Reasoning in LLMs: An Exploration Using Cognitive Dual-System Theory2025
- DEEM: Dynamic Experienced Expert Modeling for Stance Detection2024
- Deep Interest Network for Click-Through Rate Prediction2017
- Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference2023
- Deep Neural Network Approach for the Dialog State Tracking Challenge
- Deep Neural Networks for YouTube Recommendations
- Deep Research: A Systematic Survey2025
- Deep Researcher with Test-Time Diffusion2025
- Deep Think with Confidence2025
- DeepAgent: A General Reasoning Agent with Scalable Toolsets2025
- DeepCT-enhanced Lexical Argument Retrieval
- DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL2025
- DeepGesture: A conversational gesture synthesis system based on emotions and semantics2025
- DeepNet: Scaling Transformers to 1,000 Layers2022
- DeepRAG: Thinking to Retrieval Step by Step for Large Language Models2025
- DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments2025
- DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research2025
- DeepSeek-R1 Thoughtology: Let's think about LLM Reasoning2025
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
- DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
- Deflating Deflationism: A Critical Perspective on Debunking Arguments Against LLM Mentality2025
- DeLLMa: Decision Making Under Uncertainty with Large Language Models2024
- Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP2022
- Demystifying Chains, Trees, and Graphs of Thoughts2024
- Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning2025
- Dense Retrieval Adaptation using Target Domain Description2023
- DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents2023
- Design Principles for Generative AI Applications2024
- Designing AI Personalities: Enhancing Human-Agent Interaction Through Thoughtful Persona Design2024
- Detecting Cognitive Distortions from Patient-Therapist Interactions
- Detecting Deception Using Natural Language Processing and Machine Learning in Datasets on COVID-19 and Climate Change
- Detecting hallucinations in large language models using semantic entropy
- Determinants of LLM-assisted Decision-Making2024
- Detoxify Language Model Step-by-Step2023
- Developing Effective Educational Chatbots with ChatGPT prompts: Insights from Preliminary Tests in a Case Study on Social Media Literacy2023
- Development and validation of large language model rating scales for automatically transcribed psychological therapy sessions
- Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces2026
- Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at a Time2025
- Diagnostic Reasoning Prompts Reveal the Potential for Large Language Model Interpretability in Medicine2023
- Dialog Inpainting: Turning Documents into Dialogs2022
- Dialoging Resonance: How Users Perceive, Reciprocate and React to Chatbot’s Self-Disclosure in Conversational Recommendations2021
- Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources2023
- Dialogue State Tracking with a Language Model using Schema-Driven Prompting
- Dialogue Transformers2019
- DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs2025
- DiaSynth: Synthetic Dialogue Generation Framework for Low Resource Dialogue Applications2024
- Diffusion Language Models Know the Answer Before Decoding2025
- Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing2025
- Diffusion Models are Evolutionary Algorithms2024
- Diffusion-LM Improves Controllable Text Generation2022
- Diplomat: A Dialogue Dataset for Situated PragMATic Reasoning2023
- Direct Language Model Alignment from Online AI Feedback2024
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model2023
- Direct Reasoning Optimization: Token-Level Reasoning Reflectivity Meets Rubric Gates for Unverifiable Tasks2025
- Disambiguating Anthropomorphism and Anthropomimesis in Human-Robot Interaction2026
- Discourse Structure and Dialogue Acts in Multiparty Dialogue: the STAC Corpus
- Discourse-Level Representations can Improve Prediction of Degree of Anxiety
- Discovering Latent Concepts Learned in BERT2022
- Discursive Socratic Questioning: Evaluating the Faithfulness of Language Models’ Understanding of Discourse Relations
- DiscussLLM: Teaching Large Language Models When to Speak2025
- Dissociating language and thought in large language models2023
- Distilling LLMs' Decomposition Abilities into Compact Language Models2024
- Divide-or-Conquer? Which Part Should You Distill Your LLM?2024
- Do Cognitively Interpretable Reasoning Traces Improve LLM Performance?2025
- Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models2024
- Do Language Models Understand Time?2024
- Do Large Language Models Latently Perform Multi-Hop Reasoning?2024
- Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?2024
- Do Large Language Models Reason Causally Like Us? Even Better?2025
- Do large language models resemble humans in language use?2023
- Do Large Language Models Understand Conversational Implicature -- A case study with a Chinese sitcom2024
- Do LLMs Encode Functional Importance of Reasoning Tokens?2026
- Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses2024
- Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models2023
- Do LLMs produce texts with "human-like" lexical diversity?2025
- Do LLMs Truly Understand When a Precedent Is Overruled?2025
- Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations2023
- Do Models Really Learn to Follow Instructions? An Empirical Study of Instruction Tuning2023
- Do Phone-Use Agents Respect Your Privacy?2026
- Do Prompt-Based Models Really Understand the Meaning of Their Prompts?2021
- Do Response Selection Models Really Know What's Next? Utterance Manipulation Strategies for Multi-turn Response Selection2020
- Do Role-Playing Agents Practice What They Preach? Belief-Behavior Consistency in LLM-Based Simulations of Human Trust2025
- Do Theory of Mind Benchmarks Need Explicit Human-like Reasoning in Language Models?2025
- Do They See What We See?
- Do We Trust ChatGPT as much as Google Search and Wikipedia?
- DOC: Improving Long Story Coherence With Detailed Outline Control2022
- DocLLM: A layout-aware generative language model for multimodal document understanding2023
- Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?2024
- Does It Make Sense to Speak of Introspection in Large Language Models?2025
- Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?2025
- Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook2026
- Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models2025
- Doing Personal LAPS: LLM-Augmented Dialogue Construction for Personalized Multi-Session Conversational Search2024
- Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey2023
- Domain-specific Question Answering with Hybrid Search2024
- Don't "Overthink" Passage Reranking: Is Reasoning Truly Necessary?2025
- Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration2024
- DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration2025
- DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research2025
- DR-HAI: Argumentation-based Dialectical Reconciliation in Human-AI Interactions2023
- DRAGIN: Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models2024
- Drop the Hierarchy and Roles: How Self-Organizing LLM Agents Outperform Designed Structures2026
- DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought2024
- Durably reducing conspiracy beliefs through dialogues with AI
- DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning2026
- Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization2023
- Dynamic Planning with a LLM
- Dynamic Prompting: A Unified Framework for Prompt Tuning2023
- Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models2024
- Dynamic Task-Oriented Dialogue: A Comparative Study of Llama-2 and Bert in Slot Value Generation
- Dynamically Expandable Graph Convolution for Streaming Recommendation2023
- DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation2025
E (89)
- Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining2025
- Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions
- Educating LLMs like Human Students: Structure-aware Injection of Domain Knowledge2024
- Efficient Nearest Neighbor Language Models2021
- Efficient Reasoning with Balanced Thinking2026
- Efficient Reasoning with Hidden Thinking2025
- Efficient Reinforcement Learning via Large Language Model-based Search2024
- Efficient Streaming Language Models with Attention Sinks2023
- Efficient Tool Use with Chain-of-Abstraction Reasoning2024
- Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs2024
- Eliciting Latent Knowledge from Quirky Language Models2023
- Eliciting Reasoning in Language Models with Cognitive Tools2025
- Embarrassingly Shallow Autoencoders for Sparse Data2019
- Embedding Domain Knowledge for Large Language Models via Reinforcement Learning from Augmented Generation2025
- Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers2024
- Emergent Hierarchical Reasoning In LLMs Through Reinforcement Learning
- Emergent Introspective Awareness in Large Language Models
- Emerging Properties in Unified Multimodal Pretraining2025
- EmotionPrompt: Leveraging Psychology for Large Language Models Enhancement via Emotional Stimulus2023
- Empathetic Persuasion: Reinforcing Empathy and Persuasiveness in Dialogue Systems
- Empathy Through Multimodality in Conversational Interfaces2024
- Empirical Study of Symmetrical Reasoning in Conversational Chatbots2024
- Empowering Domain-Specific Language Models with Graph-Oriented Databases: A Paradigm Shift in Performance and Model Maintenance2024
- Empowering Psychotherapy with Large Language Models: Cognitive Distortion Detection through Diagnosis of Thought Prompting2023
- Enabling Explainable Recommendation in E-commerce with LLM-powered Product Knowledge Graph2024
- Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance2025
- Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate2023
- End-to-End Test-Time Training for Long Context2025
- Energy-Based Transformers are Scalable Learners and Thinkers2025
- Enhancing AI-Assisted Group Decision Making through LLM-Powered Devil's Advocate
- Enhancing Dialogue Generation via Dynamic Graph Knowledge Aggregation2023
- Enhancing Large Language Model Induced Task-Oriented Dialogue Systems Through Look-Forward Motivated Goals2023
- Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision2024
- Enhancing Performance on Seen and Unseen Dialogue Scenarios using Retrieval-Augmented End-to-End Task-Oriented System2023
- Enhancing personalized multi-turn dialogue with curiosity reward2025
- Enhancing Pipeline-Based Conversational Agents with Large Language Model2023
- Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy2023
- Enhancing social cohesion with cooperative bots in societies of greedy, mobile individuals2024
- Enhancing user experience in large language models through human-centered design: Integrating theoretical insights with an experimental study to meet diverse software learning needs with a single document knowledge base2024
- Entangled in Representations: Mechanistic Investigation of Cultural Biases in Large Language Models2025
- Equipping agents for the real world with Agent Skills
- Escaping the Verifier: Learning to Reason via Demonstrations2025
- Estimating AI productivity gains from Claude conversations
- Evaluating Emotional Nuances In Dialogue Summarization2023
- Evaluating Large Language Models at Evaluating Instruction Following2023
- Evaluating Large Language Models in Exercises of UML Class Diagram Modeling
- Evaluating Large Language Models in Theory of Mind Tasks2023
- Evaluating the Diversity and Quality of LLM Generated Content2025
- Evaluating the Efficacy of Interactive Language Therapy Based on LLM for High-Functioning Autistic Adolescent Psychological Counseling2023
- Evaluating the False Trust Engendered by LLM Explanations2026
- Evaluating the psychometric properties of ChatGPT-generated questions
- Evaluating the Therapeutic Alliance With a Free-Text CBT Conversational Agent (Wysa): A Mixed-Methods Study
- Evaluating Theory of Mind and Internal Beliefs in LLM-Based Multi-Agent Systems2026
- Evaluating Very Long-Term Conversational Memory of LLM Agents2024
- Evaluation and Benchmarking of LLM Agents: A Survey2025
- Event-Aware Sentiment Factors from LLM-Augmented Financial Tweets: A Transparent Framework for Interpretable Quant Trading2025
- Everything Everywhere All At Once: LLMs Can In-context Learn Multiple Tasks In Superposition2024
- Evidence of Human-Level Bonds Established With a Digital Conversational Agent: Cross-sectional, Retrospective Observational Study
- Evidence-centered Assessment for Writing with Generative AI2024
- EVINCE: Optimizing Multi-LLM Dialogues Using Conditional Statistics and Information Theory2024
- Evolving Deeper LLM Thinking2025
- Existential Conversations with Large Language Models: Content, Community, and Culture2024
- Expanding Explainability: Towards Social Transparency in AI systems
- Expedient Assistance and Consequential Misunderstanding: Envisioning an Operationalized Mutual Theory of Mind
- Experimental Design for Active Transductive Inference in Large Language Models2024
- Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy2025
- Explainable Compliance Detection with Multi-Hop Natural Language Inference on Assurance Case Structure2025
- Explainable Multimodal Emotion Reasoning2023
- Explainable Recommendation with Personalized Review Retrieval and Aspect Learning2023
- Explainable Recommendations via Attentive Multi-Persona Collaborative Filtering2020
- Explicit Inductive Inference using Large Language Models2024
- Exploiting Dialogue Acts and Context to Identify Argumentative Relations in Online Debates
- Exploiting Explainability to Design Adversarial Attacks and Evaluate Attack Resilience in Hate-Speech Detection Models2023
- Exploring Autonomous Agents: A Closer Look at Why They Fail When Completing Tasks2025
- Exploring Format Consistency for Instruction Tuning2023
- Exploring Large Language Models for Knowledge Graph Completion2023
- Exploring LLMs Applications in Law: A Literature Review on Current Legal NLP Approaches
- Exploring Student-AI Interactions in Vibe Coding2025
- Exploring the Frontiers of LLMs in Psychological Applications: A Comprehensive Review2024
- Exploring the Impact of Large Language Models on Recommender Systems: An Extensive Review
- Exploring the Potential of ChatGPT on Sentence Level Relations: A Focus on Temporal, Causal, and Discourse Relations
- Exploring the Potential of Large Language Models in Computational Argumentation2023
- Exploring the Role of Prior Beliefs for Argument Persuasion2019
- Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers2025
- External Model Motivated Agents: Reinforcement Learning for Enhanced Environment Sampling
- Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering2026
- Extracting memorized pieces of (copyrighted) books from open-weight language models2025
- Extrapolation by Association: Length Generalization Transfer in Transformers2025
- Extreme Multi-Label Skill Extraction Training using Large Language Models2023
F (62)
- Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation2024
- Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model
- Faith and Fate: Limits of Transformers on Compositionality2023
- Faithful and Robust LLM-Driven Theorem Proving for NLI Explanations2025
- Fake News Detectors are Biased against Texts Generated by Large Language Models2023
- Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs2026
- Fast and Slow Learning From Reviews
- Fast, Slow, and Tool-augmented Thinking for LLMs: A Review2025
- Federation of Agents: A Semantics-Aware Communication Fabric for Large-Scale Agentic AI2025
- Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models2025
- FinCoT: Grounding Chain-of-Thought in Expert Financial Reasoning2025
- Find the Gap: AI, Responsible Agency and Vulnerability
- Finding Common Ground: Using Large Language Models to Detect Agreement in Multi-Agent Decision Conferences2025
- Fine-grained Hallucination Detection and Editing for Language Models2024
- Fine-tuning Language Models for Factuality2023
- Fine-tuning Large Language Model for Automated Algorithm Design2025
- Fine-tuning Pre-trained Language Models for Dialogical Argument Mining with Inference Anchoring Theory
- First Try Matters: Revisiting the Role of Reflection in Reasoning Models2025
- FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets2023
- Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models2025
- Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities2024
- FlowMind: Automatic Workflow Generation with LLMs2024
- FlowReasoner: Reinforcing Query-Level Meta-Agents2025
- Flows: Building Blocks of Reasoning and Collaborating AI2023
- FLOWSTEER: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems2026
- FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions2024
- Forecasting the presence and intensity of hostility on Instagram using linguistic and social features2018
- Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning2024
- FormulaOne: Measuring the Depth of Algorithmic Reasoning Beyond Competitive Programming2025
- Foundation Priors2025
- Foundation Protocol: A Coordination Layer for Agentic Society2026
- Foundations of Large Language Models2025
- From Articles to Code: On-Demand Generation of Core Algorithms from Scientific Publications2025
- From Context to Skills: Can Language Models Learn from Context Skillfully?2026
- From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models2024
- From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence2026
- From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step2024
- From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers2025
- From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks2024
- From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities2024
- From Human to Machine Psychology: A Conceptual Framework for Understanding Well-Being in Large Language Models2025
- From Key Points to Key Point Hierarchy: Structured and Expressive Opinion Summarization2023
- From Language to Logic: A Bi-Level Framework for Structured Reasoning2025
- From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models2024
- From Local to Global: A Graph RAG Approach to Query-Focused Summarization2024
- From Louvain to Leiden: guaranteeing well-connected communities2018
- From Model Scaling to System Scaling: Scaling the Harness in Agentic AI2026
- From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?2025
- From Persona to Person: Enhancing the Naturalness with Multiple Discourse Relations Graph Learning in Personalized Dialogue Generation2025
- From Prompt Engineering to Prompt Science With Human in the Loop2024
- From Simulation to Enaction: Post-trained Language Models Recognize and React to their own Generations2026
- From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting2023
- From speaking like a person to being personal: The effects of personalized, regular interactions with conversational agents
- From Text to Emoji: How PEFT-Driven Personality Manipulation Unleashes the Emoji Potential in LLMs2024
- From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning2025
- From Trial-and-Error to Improvement: A Systematic Analysis of LLM Exploration Mechanisms in RLVR2025
- From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents2025
- Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report2025
- Fundamentals of Building Autonomous LLM Agents2025
- Further Explorations on the Use of Large Language Models for Thematic Analysis. Open-Ended Prompts, Better Terminologies and Thematic Maps
- Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce2025
- FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction2025
G (40)
- Game-theoretic LLM: Agent Workflow for Negotiation Games2024
- GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks
- Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities2025
- Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini2026
- GenAI as a Power Persuader: How Professionals Get Persuasion Bombed When They Attempt to Validate LLMs
- Generalization Bias in Large Language Model Summarization of Scientific Research2025
- Generalization through Memorization: Nearest Neighbor Language Models2019
- Generalization to New Sequential Decision Making Tasks with In-Context Learning2023
- Generating Proto-Personas through Prompt Engineering: A Case Study on Efficiency, Effectiveness and Empathy2025
- Generating Query-Relevant Document Summaries via Reinforcement Learning2025
- Generative Agent Simulations of 1,000 People2024
- Generative Agents: Interactive Simulacra of Human Behavior2023
- Generative AI in Real-World Workplaces
- Generative Interfaces for Language Models2025
- Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?2023
- Generative Recursive Reasoning2026
- Generator-Retriever-Generator: A Novel Approach to Open-domain Question Answering2023
- GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning2025
- GenRec: Large Language Model for Generative Recommendation2023
- GhostWriter: Augmenting Collaborative Human-AI Writing Experiences Through Personalization and Agency2024
- GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning2025
- GHRS: Graph-based Hybrid Recommendation System with Application to Movie Recommendation2021
- Goal Alignment in LLM-Based User Simulators for Conversational AI2025
- Goals, Plans, and Action Models
- Going Beyond Local: Global Graph-Enhanced Personalized News Recommendations2023
- GPT-4 as a Homework Tutor can Improve Student Engagement and Learning Outcomes2024
- GPT-4 is judged more human than humans in displaced and inverted Turing tests2024
- Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development2025
- Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks2024
- Graph of Thoughts: Solving Elaborate Problems with Large Language Models2023
- Graph-enhanced Large Language Models in Asynchronous Plan Reasoning2024
- GRASP: Municipal Budget AI Chatbots for Enhancing Civic Engagement2025
- Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization2024
- Grounding Gaps in Language Model Generations2023
- Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning2023
- Grounding Multilingual Multimodal LLMs With Cultural Knowledge2025
- Grounding ‘Grounding’ in NLP2021
- GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models2024
- Guidance is All You Need: Temperature-Guided Reasoning in Large Language Models2024
- Guiding Large Language Models via Directional Stimulus Prompting2023
H (46)
- H2HTalk: Evaluating Large Language Models as Emotional Companion2025
- Hallucinating with AI: AI Psychosis as Distributed Delusions2025
- Hallucination is Inevitable: An Innate Limitation of Large Language Models2024
- Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools2024
- Hallucinations Undermine Trust; Metacognition is a Way Forward2026
- Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents2026
- Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses2026
- Harnessing Business and Media Insights with Large Language Models2024
- Has the Creativity of Large-Language Models peaked? An analysis of inter- and intra-LLM variability2025
- Hello Again! LLM-powered Personalized Agent for Long-term Dialogue2024
- Hierarchical Concept Geometry in Language Models Emerges from Word Co-occurrence2026
- Hierarchical Reasoning Model2025
- HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches2025
- HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models2024
- HiTKG: Towards Goal-Oriented Conversations via Multi-Hierarchy Learning
- Hogwild! Inference: Parallel LLM Generation via Concurrent Attention2025
- Holy Grail 2.0: From Natural Language to Constraint Models2023
- HonestBait: Forward References for Attractive but Faithful Headline Generation2023
- Hop, Skip, and Overthink: Diagnosing Why Reasoning Models Fumble during Multi-Hop Analysis2025
- How AI Impacts Skill Formation2026
- How do Transformers Learn Implicit Reasoning?2025
- How Exposed Are UK Jobs to Generative AI? Developing and Applying a Novel Task-Based Index2025
- How Far Are We from Genuinely Useful Deep Research Agents?2025
- How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs2024
- How Many Instructions Can LLMs Follow at Once?2025
- How much do language models memorize?2025
- How Multimodal LLMs Solve Image Tasks: A Lens on Visual Grounding, Task Reasoning, and Answer Decoding2025
- How new data permeates LLM knowledge and how to dilute it2025
- How Should We Meta-Learn Reinforcement Learning Algorithms?2025
- How susceptible are LLMs to Logical Fallacies?2023
- How to Build AI Agents by Augmenting LLMs with Codified Human Expert Domain Knowledge? A Software Engineering Framework2026
- How to Correctly do Semantic Backpropagation on Language-based Agentic Systems2024
- How we built our multi-agent research system
- How well can large language models explain business processes?2024
- How Projective is Projective Content? Gradience in Projectivity and At-issueness
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs2024
- Human-like Category Learning by Injecting Ecological Priors from Large Language Models into Neural Networks2024
- Humanity's Last Exam2025
- Humans learn to prefer trustworthy AI over human partners2025
- Humans or LLMs as the Judge? A Study on Judgement Biases2024
- Humans overrely on overconfident language models, across languages2025
- Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning
- Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing2024
- Hyperagents2026
- HyperBandit: Contextual Bandit with Hypernetwork for Time-Varying User Preferences in Streaming Recommendation2023
- Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models2025
I (61)
- I like it... I like it not: Evaluating User Ratings Noise in Recommender Systems
- Identification of Propositional and Illocutionary Relations
- IFEvalCode: Controlled Code Generation2025
- IMBUE: Improving Interpersonal Effectiveness through Simulation and Just-in-time Feedback with Human-Language Model Interaction
- Implicit Chain of Thought Reasoning via Knowledge Distillation2023
- Improving Chain-of-Thought Reasoning via Quasi-Symbolic Abstractions2025
- Improving Conversational Recommender Systems via Transformer-based Sequential Modelling
- Improving Dialog Systems for Negotiation with Personality Modeling2020
- Improving Document-Level Sentiment Analysis with User and Product Context2020
- Improving Factuality and Reasoning in Language Models through Multiagent Debate2023
- Improving Generalization in Task-oriented Dialogues with Workflows and Action Plans2023
- Improving large language models with concept-aware fine-tuning2025
- Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards2024
- Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks2024
- In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss2024
- In-context learning agents are asymmetric belief updaters2024
- In-Context Principle Learning from Mistakes2024
- Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems2024
- Inducing Positive Perspectives with Text Reframing2022
- Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs2024
- Inference-Aware Prompt Optimization for Aligning Black-Box Large Language Models2025
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model2023
- Inference-Time Scaling for Generalist Reward Modeling2025
- Information-Theoretic Reward Decomposition for Generalizable RLHF2025
- Informed Named Entity Recognition Decoding For Generative Language Models2023
- Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey2025
- InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles2025
- Insert-expansions For Tool-enabled Conversational Agents2023
- Inspecting and Editing Knowledge Representations in Language Models2023
- INSPIRED: Toward Sociable Recommendation Dialog Systems2020
- Instance-adaptive Zero-shot Chain-of-Thought Prompting2024
- Instruction Induction: From Few Examples to Natural Language Task Descriptions2022
- Instruction Tuning for Large Language Models: A Survey2023
- Instruction-tuned Language Models are Better Knowledge Learners2024
- Integrating Large Language Models and Reinforcement Learning for Non-Linear Reasoning2024
- IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems2025
- Intelligent AI Delegation2026
- Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation2026
- Intent-calibrated Self-training for Answer Selection in Open-domain Dialogues2023
- Interacting with Non-Cooperative User: A New Paradigm for Proactive Dialogue Policy2022
- Interaction Dynamics as a Reward Signal for LLMs2025
- Interactions with generative AI chatbots: unveiling dialogic dynamics, students’ perceptions, and practical competencies in creative problem-solving
- Interactive Evaluation Requires a Design Science2026
- Interesting Scientific Idea Generation Using Knowledge Graphs and LLMs: Evaluations with 100 Research Group Leaders
- Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation2025
- Interpretation modeling: Social grounding of sentences by reasoning over their implicit moral judgments2023
- interwhen: A Generalizable Framework for Steering Reasoning Models with Test-time Verification2026
- Intrinsic Credit Assignment for Long Horizon Interaction2026
- Intrinsically Motivated Graph Exploration Using Network Theories of Human Curiosity2023
- InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep Recommendation Models2023
- Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting2023
- Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data2024
- Investigating Gender Bias in Language Models Using Causal Mediation Analysis
- Investigating task-specific prompts and sparse autoencoders for activation monitoring2025
- Irony in Emojis: A Comparative Study of Human and LLM Interpretation2025
- Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens2025
- Is Cosine-Similarity of Embeddings Really About Similarity?2024
- Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?2024
- Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs2024
- It's About Time: Incorporating Temporality in Retrieval Augmented Language Models2024
- It’s All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization2025
J (5)
- J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning2025
- Jamba: A Hybrid Transformer-Mamba Language Model2024
- JointLK: Joint Reasoning with Language Models and Knowledge Graphs for Commonsense Question Answering2021
- Jointly Reinforcing Diversity and Quality in Language Model Generations2025
- Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena2023
K (13)
- KAN: Kolmogorov-Arnold Networks2024
- KellyBench: Can Language Models Beat the Market?
- KETOD: Knowledge-Enriched Task-Oriented Dialogue2022
- KGAT: Knowledge Graph Attention Network for Recommendation2019
- KiPT: Knowledge-injected Prompt Tuning for Event Detection
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization2025
- Knowledge Distillation for Enhancing Walmart E-commerce Search Relevance Using Large Language Models2025
- Knowledge Graph Prompting for Multi-Document Question Answering2023
- Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains2025
- Knowledge Retrieval Based on Generative AI2025
- Knowledge-enhanced Mixed-initiative Dialogue System for Emotional Support Conversations2023
- KoLA: Carefully Benchmarking World Knowledge of Large Language Models2023
- KTO: Model Alignment as Prospect Theoretic Optimization2024
L (157)
- Language Agents as Optimizable Graphs2024
- Language as a Cognitive Tool to Imagine Goals in Curiosity-Driven Exploration2020
- Language Model Personalization via Reward Factorization2025
- Language Modeling by Language Models2025
- Language Modeling is Compression2023
- Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought2022
- Language Models are Pragmatic Speakers2023
- Language models are weak learners2023
- Language Models Learn to Mislead Humans via RLHF2024
- Language Models Need Sleep2026
- Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories2026
- Language models show human-like content effects on reasoning tasks2022
- Language Models’ Hall of Mirrors Problem: Why AI Alignment Requires Peircean Semiosis
- Large Action Models: From Inception to Implementation2024
- Large Causal Models From Large Language Models2025
- Large Concept Models: Language Modeling in a Sentence Representation Space
- Large Language Diffusion Models2025
- Large Language Model Agents Are Not Always Faithful Self-Evolvers2026
- Large Language Model based Multi-Agents: A Survey of Progress and Challenges2024
- Large Language Model Guided Tree-of-Thought2023
- Large Language Model Programs2023
- Large Language Model Reasoning Failures2026
- Large Language Model-based Data Science Agent: A Survey2025
- Large Language Model-Brained GUI Agents: A Survey2024
- Large Language Models and Knowledge Graphs: Opportunities and Challenges2023
- Large Language Models are as persuasive as humans, but how? About the cognitive effort and moral-emotional language of LLM arguments2024
- Large Language Models Are Human-level Prompt Engineers2022
- Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners2023
- Large Language Models are Zero-Shot Rankers for Recommender Systems2023
- Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning2024
- Large Language Models as Conversational Movie Recommenders: A User Study2024
- Large Language Models as Planning Domain Generators2024
- Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?2023
- Large Language Models as Zero-Shot Conversational Recommenders2023
- Large Language Models can accomplish Business Process Management Tasks2023
- Large Language Models Can Infer Psychological Dispositions of Social Media Users2023
- Large language models can segment narrative events similarly to humans2023
- Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation
- Large Language Models Do Not Simulate Human Psychology2025
- Large Language Models For Social Networks: Applications, Challenges, And Solutions
- Large Language Models for User Interest Journeys2023
- Large Language Models Know Your Contextual Search Intent: A Prompting Framework for Conversational Search2023
- Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities2025
- Large Language Models Reflect the Ideology of their Creators2024
- Large Language Models Report Subjective Experience Under Self-Referential Processing2025
- Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions2023
- Large language models surpass human experts in predicting neuroscience results2024
- Large Language Models Think Too Fast To Explore Effectively2025
- Large Language Models: A Survey2024
- Large Linguistic Models: Investigating LLMs' metalinguistic abilities2023
- Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency2024
- Large Multimodal Agents: A Survey2024
- Large Scale Product Graph Construction for Recommendation in E-commerce2020
- Latent Collaboration in Multi-Agent Systems2025
- Latent Skill Discovery for Chain-of-Thought Reasoning2023
- LatentQA: Teaching LLMs to Decode Activations Into Natural Language2024
- Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers2025
- Learn from your own latents and not from tokens: A sample-complexity theory2026
- Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments2025
- Learning "Partner-Aware" Collaborators in Multi-Party Collaboration2025
- Learning Agent-Compatible Context Management for Long-Horizon Tasks2026
- Learning Distributed Representations from Reviews for Collaborative Filtering2018
- Learning Human-Object Interaction as Groups2025
- Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries2025
- Learning Retrieval Augmentation for Personalized Dialogue Generation2024
- Learning to (Learn at Test Time): RNNs with Expressive Hidden States2024
- Learning to Ask Appropriate Questions in Conversational Recommendation2021
- Learning to Ask Critical Questions for Assisting Product Search2024
- Learning to Discover at Test Time2026
- Learning To Guide Human Experts Via Personalized Large Language Models2023
- Learning to Learn from Language Feedback with Social Meta-Learning2026
- Learning to Map Context-Dependent Sentences to Executable Formal Queries2018
- Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge2025
- Learning to Rank for Recommender Systems
- Learning to Reason for Factuality2025
- Learning to Reason without External Rewards2025
- Learning to Relate to Previous Turns in Conversational Search2023
- Learning To Retrieve Prompts for In-Context Learning2021
- Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering2019
- Learning to Select the Relevant History Turns in Conversational Question Answering2023
- Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs2025
- Learning Vector-Quantized Item Representation for Transferable Sequential Recommenders2022
- Learning, Fast and Slow: Towards LLMs That Adapt Continually2026
- Least-to-most Prompting Enables Complex Reasoning In Large Language Models2022
- Less is More: Recursive Reasoning with Tiny Networks2025
- LESS: Selecting Influential Data for Targeted Instruction Tuning2024
- Lessons Learnt From Consolidating ML Models in a Large Scale Recommendation System
- Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models2024
- Let Me Think! A Long Chain-of-Thought Can Be Worth Exponentially Many Short Ones2025
- Let’s Verify Step by Step2023
- Levels of AI Agents: from Rules to Large Language Models2024
- Levels of Analysis for Large Language Models2025
- Leveraging Approximate Symbolic Models for Reinforcement Learning via Skill Diversity2022
- Leveraging Few-Shot Data Augmentation and Waterfall Prompting for Response Generation2023
- Leveraging Large Language Models in Conversational Recommender Systems2023
- Leveraging LLMs for KPIs Retrieval from Hybrid Long-Document: A Comprehensive Framework and Dataset2023
- Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning2023
- LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels2026
- Lexical Entrainment for Conversational Systems2023
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models2024
- Lil-Bevo: Explorations of Strategies for Training Language Models in More Humanlike Ways2023
- LIMA: Less Is More for Alignment2023
- LIMI: Less is More for Agency2025
- LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling2025
- Linguistic Alignment in Conversational AI: A Systematic Review of Cognitive-Linguistic Dimensions, Measurements, and User Outcomes (2020–2025)
- Linguistic Blind Spots of Large Language Models2025
- Linguistic Calibration of Long-Form Generations2024
- Linguistic markers of inherently false AI communication and intentionally false human communication: Evidence from hotel reviews
- LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries2025
- LLaMA-Omni: Seamless Speech Interaction with Large Language Models2024
- LLM Augmentations to support Analytical Reasoning over Multiple Documents2024
- LLM Generated Persona is a Promise with a Catch2025
- LLM Post-Training: A Deep Dive into Reasoning Large Language Models2025
- LLM Reasoning Is Latent, Not the Chain of Thought2026
- LLM Strategic Reasoning: Agentic Study through Behavioral Game Theory2025
- LLM+P: Empowering Large Language Models with Optimal Planning Proficiency2023
- LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices2024
- LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback2024
- LLM-Independent Adaptive RAG: Let the Question Speak for Itself2025
- LLM-Rec: Personalized Recommendation via Prompting Large Language Models2023
- LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders2024
- LLMatic: Neural Architecture Search via Large Language Models and Quality Diversity Optimization2023
- LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools2024
- LLMorphism: When humans come to see themselves as language models
- LLMs are Frequency Pattern Learners in Natural Language Inference2025
- LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities2025
- LLMs as Architects and Critics for Multi-Source Opinion Summarization2025
- LLMs as Method Actors: A Model for Prompt Engineering and Architecture2024
- LLMs can be Fooled into Labelling a Document as Relevant
- LLMs Can Covertly Sandbag on Capability Evaluations Against Chain-of-Thought Monitoring2025
- LLMs Can Easily Learn to Reason from Demonstrations: Structure, not content, is what matters!2025
- LLMs can implicitly learn from mistakes in-context2025
- LLMs Corrupt Your Documents When You Delegate2026
- LLMs Get Lost In Multi-Turn Conversation2025
- LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings2025
- LLMs Struggle to Reject False Presuppositions when Misinformation Stakes are High2025
- Local Coherence or Global Validity? Investigating RLVR Traces in Math Domains2025
- Localizing Paragraph Memorization in Language Models2024
- Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning2023
- Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models2024
- Logical Reasoning in Large Language Models: A Survey2025
- LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models2024
- Long-context LLMs Struggle with Long In-context Learning2024
- Long-form Factuality In Large Language Models2024
- Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning2025
- LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering2024
- LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs2024
- LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards2026
- Look Before You Leap: Autonomous Exploration for LLM Agents2026
- Looking beyond the next token2025
- Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers2026
- Looped Diffusion Language Models2026
- Lost in Inference: Rediscovering the Role of Natural Language Inference for Large Language Models2024
- Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs2024
- LR^2Bench: Evaluating Long-chain Reflective Reasoning Capabilities of Large Language Models via Constraint Satisfaction Problems2025
- LSR: Reinforcement Learning with Supervised Reward Outperforms SFT in Instruction Following2025
- Lumiere: A Space-Time Diffusion Model for Video Generation2024
M (90)
- Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models2025
- Machine ex machina: A Framework Decentering the Human in AI Design Praxis
- Machine gaze in online behavioral targeting: The effects of algorithmic human likeness on social presence and social influence
- Machine Psychology2023
- Magentic-UI: Towards Human-in-the-loop Agentic Systems2025
- Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing2024
- Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning2024
- Making Sense of Memory in AI Agents
- Man vs machine – Detecting deception in online reviews
- MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving2025
- MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization2025
- MasRouter: Learning to Route LLMs for Multi-Agent Systems2025
- Massive Activations in Large Language Models2024
- Mastering Diverse Domains through World Models2023
- MatFormer: Nested Transformer for Elastic Inference2023
- Mathematical methods and human thought in the age of AI2026
- MaxMin-RLHF: Alignment with Diverse Human Preferences2024
- MCP-Zero: Proactive Toolchain Construction for LLM Agents from Scratch2025
- Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs2024
- Meanings are like Onions: a Layered Approach to Metaphor Processing2025
- Measuring Agents in Production2025
- Measuring Alliance and Symptom Severity in Psychotherapy Transcripts Using Bert Topic Modeling
- Measuring and Mitigating Persona Distortions from AI Writing Assistance2026
- Measuring Faithfulness in Chain-of-Thought Reasoning2023
- Measuring Human Preferences in RLHF is a Social Science Problem2026
- Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models2025
- Measuring the Value of Social Dynamics in Online Product Ratings Forums
- Mechanisms of Introspective Awareness2026
- Mechanistic Indicators of Understanding in Large Language Models2025
- Mechanistically Interpreting the Role of Sample Difficulty in RLVR for LLMs2026
- Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications2025
- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads2024
- MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation2023
- Memorization and Knowledge Injection in Gated LLMs2025
- Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models2025
- Memory in the Age of AI Agents: A Survey — Forms, Functions and Dynamics2025
- Memory Sandbox: Transparent and Interactive Memory Management for Conversational Agents2023
- Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models2025
- Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge2024
- MetaClaw: Just Talk — An Agent That Meta-Learns and Evolves in the Wild2026
- Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving2024
- Metacognitive Prompting Improves Understanding in Large Language Models2023
- Metacognitive Retrieval-Augmented Large Language Models2024
- Metacognitive Reuse: Turning Recurring LLM Reasoning Into Concise Behaviors2025
- Metadiscursive nouns in academic argument: ChatGPT vs student practices
- MetaGPT: Meta Programming For Multi-agent Collaborative Framework2023
- MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems2025
- Methodologies for Improving Modern Industrial Recommender Systems2023
- Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models2024
- Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse2024
- Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy (short paper)2025
- Minds versus Machines: Rethinking Entailment Verification with Language Models2024
- MindSearch: Mimicking Human Minds Elicits Deep AI Searcher2024
- Mindstorms in Natural Language-Based Societies of Mind2023
- MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention2025
- Mining Hidden Thoughts from Texts: Evaluating Continual Pretraining with Synthetic Data for LLM Reasoning2025
- MIO: A Foundation Model on Multimodal Tokens2024
- Misaligned by Design: Incentive Failures in Machine Learning2025
- Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?2025
- Mitigating Hallucinations in Large Language Models via Causal Reasoning2025
- Mixture of Thoughts: Learning to Aggregate What Experts Think, Not Just What They Say2025
- Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models2023
- MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement2025
- MLLM-CBench: A Comprehensive Benchmark for Continual Instruction Tuning of Multimodal LLMs with Chain-of-Thought Reasoning Analysis2025
- MM-LLMs: Recent Advances in MultiModal Large Language Models2024
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases2024
- Model Organisms for Emergent Misalignment2025
- Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence2024
- Modeling Appropriate Language in Argumentation
- Modeling Code: Is Text All You Need?2025
- Modeling Interpersonal Linguistic Coordination in Conversations using Word Mover's Distance2019
- Modeling the Quality of Dialogical Explanations2024
- MODS: Moderating a Mixture of Document Speakers to Summarize Debatable Queries in Document Collections2025
- MOMENTS: A Comprehensive Multimodal Benchmark for Theory of Mind2025
- Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation2025
- Monolith: Real Time Recommendation System With Collisionless Embedding Table2022
- MoodAngels: A Retrieval-augmented Multi-agent Framework for Psychiatry Diagnosis2025
- Mostly Exploration-Free Algorithms for Contextual Bandits2017
- Multi-Agent Collaborative Intelligence: Dual-Dial Control for Reliable LLM Reasoning2025
- Multi-agent cooperation through in-context co-player inference2026
- Multi-Agent Systems are Mixtures of Experts: Who Becomes an Influencer?2026
- Multi-Agent-as-Judge: Aligning LLM-Agent-Based Automated Evaluation with Multi-Dimensional Human Evaluation2025
- Multi-hop Question Answering via Reasoning Chains2019
- Multi-Task End-to-End Training Improves Conversational Recommendation2023
- Multi-Token Attention2025
- Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains2025
- MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs2025
- MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries2024
- Multistep Consistency Models2024
- MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation2026
N (27)
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention2025
- Natural Emergent Misalignment From Reward Hacking In Production RL2025
- NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions2025
- Navigating the Latent Space Dynamics of Neural Models2025
- Navigating the State of Cognitive Flow: Context-Aware AI Interventions for Effective Reasoning Support2025
- Nested Attention: Semantic-aware Attention Values for Concept Personalization2025
- Nested Learning: The Illusion of Deep Learning Architecture Expanded
- Nested Learning: The Illusion of Deep Learning Architectures2025
- Neural Approaches to Conversational AI2018
- Neural Assistant: Joint Action Prediction, Response Generation, and Latent Knowledge Reasoning2019
- Neural Collaborative Filtering2017
- Neural Collaborative Filtering vs. Matrix Factorization Revisited2020
- Neural Conversation Models and How to Rein Them in: A Survey of Failures and Fixes2023
- Neural Topic Modeling of Psychotherapy Sessions2022
- Neuro-Symbolic AI in 2024: A Systematic Review2025
- NeuroQL: A Neuro-Symbolic Language and Dataset for Inter-Subjective Reasoning2023
- Neurosymbolic AI - Why, What, and How2023
- Neutralizing Bias in LLM Reasoning using Entailment Graphs2025
- News Sentiment Embeddings for Stock Price Forecasting2025
- News Source Citing Patterns in AI Search Systems2025
- Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction2025
- Next Steps for Human-Centered Generative AI: A Technical Perspective2023
- Nexus: An Agentic Framework for Time Series Forecasting2026
- No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance2024
- No that's not what I meant: Handling Third Position Repair in Conversational Question Answering2023
- Not All Parameters Are Created Equal: Smart Isolation Boosts Fine-Tuning Performance2025
- NoveltyBench: Evaluating Language Models for Humanlike Diversity2025
O (44)
- O1 Replication Journey: A Strategic Progress Report -- Part 1 2024
- Octopus v2: On-device language model for super agent2024
- Octopus v4: Graph of language models2024
- Off-Policy Evaluation for Large Action Spaces via Policy Convolution
- OMNI-SIMPLEMEM: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory2026
- Omni-Thinker: Scaling Multi-Task RL in LLMs with Hybrid Reward and Task Scheduling2025
- OmniParser for Pure Vision Based GUI Agent2024
- OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking2025
- On Generative Agents in Recommendation2023
- On Information Distortions in Online Ratings
- On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM agents2026
- On Predictive planning and counterfactual learning in active inference2024
- On the Adaptive Psychological Persuasion of Large Language Models2025
- On the Binding Problem in Artificial Neural Networks2020
- On the Conversational Basis of Some Presuppositions
- On the Impact of Fine-Tuning on Chain-of-Thought Reasoning2024
- On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models2025
- On the Limits of Innate Planning in Large Language Models2025
- On The Persona-based Summarization of Domain-Specific Documents2024
- On the Reasoning Capacity of AI Models and How to Quantify It2025
- On the Relationship between Sentence Analogy Identification and Sentence Structure Encoding in Large Language Models
- On the Roles of LLMs in Planning: Embedding LLMs into Planning Graphs
- On the Societal Impact of Open Foundation Models2024
- On the Theoretical Limitations of Embedding-Based Retrieval2025
- On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting2025
- Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback2024
- Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models2024
- Open Problems in Mechanistic Interpretability2025
- Open-World Evaluations for Measuring Frontier AI Capabilities2026
- OpenAgents: An Open Platform For Language Agents In The Wild2023
- OpenAssistant Conversations - Democratizing Large Language Model Alignment2023
- OpenClaw-RL: Train Any Agent Simply by Talking2026
- OpenDialKG: Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning2024
- OpenThoughts: Data Recipes for Reasoning Models2025
- Operating Multi-Client Influence Networks Across Platforms
- OpinionConv: Conversational Product Search with Grounded Opinions2023
- Opportunities for large language models and discourse in engineering design
- OptimalThinkingBench: Evaluating Over and Underthinking in LLMs2025
- Optimizing Encoder-Only Transformers for Session-Based Recommendation Systems2024
- Orchestrating Synthetic Data with Reasoning
- Outcome-based Exploration for LLM Reasoning2025
- Overconfidence in LLM-as-a-Judge: Diagnosis and Confidence-Driven Solution2025
- Overview of DialAM-2024: Argument Mining in Natural Language Dialogues
P (102)
- PaperOrchestra: A Multi-Agent Framework for Automated AI Research Paper Writing2026
- Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning2025
- PATIENT-Ψ: Using Large Language Models to Simulate Patients for Training Mental Health Professionals2024
- Payrolls to Prompts: Firm-Level Evidence on the Substitution of Labor for AI2026
- Peer-Preservation in Frontier Models
- PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods2024
- People cannot distinguish GPT-4 from a human in a Turing test2024
- Performative Thinking? The Brittle Correlation Between CoT Length and Problem Complexity2025
- Persistent AI Agents in Academic Research: A Single-Investigator Implementation Case Study2026
- Persistent Pre-Training Poisoning of LLMs2024
- PersLLM: A Personified Training Approach for Large Language Models2024
- Persona Generators: Generating Diverse Synthetic Personas at Scale2026
- Persona Vectors: Monitoring and Controlling Character Traits in Language Models2025
- Persona-Assigned Large Language Models Exhibit Human-Like Motivated Reasoning2025
- PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time2025
- PersonaGym: Evaluating Persona Agents and LLMs2024
- Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback2023
- Personalization of Large Language Models: A Survey2024
- Personalized Dialogue Generation with Persona-Adaptive Attention2022
- Personalized Language Modeling from Personalized Human Feedback2024
- Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning2024
- PersonaPKT: Building Personalized Dialogue Agents via Parameter-efficient Knowledge Transfer2023
- Personhood credentials: Artificial intelligence and the value of privacy-preserving tools to distinguish who is real online2024
- Persuasive presuppositions
- PersuasiveToM: A Benchmark for Evaluating Machine Theory of Mind in Persuasive Dialogues2025
- Perturbation CheckLists for Evaluating NLG Evaluation Metrics2021
- Pixel-Level Reasoning Segmentation via Multi-turn Conversations2025
- Pixels, Patterns, but No Poetry: To See The World like Humans2025
- Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts2023
- Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o1 2024
- Planning Like Human: A Dual-process Framework for Dialogue Planning2024
- PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers2024
- Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs2025
- Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents2023
- Polanyi’s Revenge and AI’s New Romance with Tacit Knowledge
- PolyResponse: A Rank-based Approach to Task-Oriented Dialogue with Application in Restaurant Search and Booking2019
- POMDP-based Statistical Spoken Dialogue Systems: a Review
- Position: Categorical Deep Learning is an Algebraic Theory of All Architectures2024
- Position: LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks2024
- Position: Towards Bidirectional Human-AI Alignment2024
- Post-Completion Learning for Language Models2025
- Post-training for Efficient Communication via Convention Formation2025
- Post-Training Large Language Models via Reinforcement Learning from Self-Feedback2025
- Post-training makes large language models less human-like2026
- PosterMate: Audience-driven Collaborative Persona Agents for Poster Design2025
- Posting versus Lurking: Communicating in a Multiple Audience Context
- Potemkin Understanding in Large Language Models2025
- Pragmatic Implicature Processing in ChatGPT
- Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing2021
- Pre-Trained Policy Discriminators are General Reward Models2025
- Precise Zero-Shot Dense Retrieval without Relevance Labels2022
- Predictive Preference Learning from Human Interventions2025
- Preference Discerning with LLM-Enhanced Generative Retrieval2024
- Prefix-Tuning: Optimizing Continuous Prompts for Generation
- PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts2025
- Premise Order Matters in Reasoning with Large Language Models2024
- Premise-Augmented Reasoning Chains Improve Error Identification in Math reasoning with LLMs2025
- Presuppositions are more persuasive than assertions if addressees accommodate them: Experimental evidence for philosophical reasoning
- Pretrained Language Models as Containers of the Discursive Knowledge
- PretrainZero: Reinforcement Active Pretraining2025
- PRewrite: Prompt Rewriting with Reinforcement Learning2024
- PRIME: Large Language Model Personalization with Cognitive Memory and Thought Processes2025
- Pro-Active Systems and Influenceable Users: Simulating Pro-Activity in Task-oriented Dialogues
- Proactive behavior in voice assistants: A systematic review and conceptual model
- Proactive Conversational Agents in the Post-ChatGPT World
- Proactive Conversational Agents with Inner Thoughts2024
- Proactive Human-Machine Conversation with Explicit Conversation Goals2019
- Proactive Moderation of Online Discussions: Existing Practices and the Potential for Algorithmic Support2022
- ProAgent: Building Proactive Cooperative Agents with Large Language Models2023
- Probing Structured Semantics Understanding and Generation of Language Models via Question Answering2024
- Probing the Multi-turn Planning Capabilities of LLMs via 20 Question Games2023
- Problems with Cosine as a Measure of Embedding Similarity for High Frequency Words2022
- Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models2024
- Process Reward Models That Think2025
- Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks2022
- Progress Measures For Grokking Via Mechanistic Interpretability2023
- Progressive-Hint Prompting Improves Reasoning in Large Language Models2023
- Prompt Architecture Determines Reasoning Quality: A Variable Isolation Study on the Car Wash Problem2026
- Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm2021
- Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution2023
- Prompted LLMs as Chatbot Modules for Long Open-domain Conversation2023
- Prompting and Evaluating Large Language Models for Proactive Dialogues: Clarification, Target-guided, and Non-collaboration2023
- Prompting Large Language Models for Recommender Systems: A Comprehensive Framework and Empirical Analysis2024
- Prompting Large Language Models With the Socratic Method2023
- Prompting Science Report 4: Playing Pretend: Expert Personas Don't Improve Factual Accuracy2025
- Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing?2024
- Propositional Interpretability in Artificial Intelligence2025
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models2025
- ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs2024
- ProsocialDialog: A Prosocial Backbone for Conversational Agents2022
- ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs2025
- Provable Benefits of In-Tool Learning for Large Language Models2025
- Proxona: Leveraging LLM-Driven Personas to Enhance Creators' Understanding of Their Audience2024
- PsychAdapter: Adapting LLM Transformers to Reflect Traits, Personality and Mental Health2024
- Psyche-R1: Towards Reliable Psychological LLMs through Unified Empathy, Expertise, and Reasoning2025
- Psychological, Relational, and Emotional Effects of Self-Disclosure After Conversations With a Chatbot2024
- Psychologically Enhanced AI Agents2025
- Psychotherapy AI Companion with Reinforcement Learning Recommendations and Interpretable Policy Dynamics2023
- PsyDT: Using LLMs to Construct the Digital Twin of Psychological Counselor with Personalized Counseling Style for Psychological Counseling2024
- Pushdown Layers: Encoding Recursive Structure in Transformer Language Models2023
- Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability2021
Q (10)
- QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration2025
- Quantifying Controversy on Social Media2015
- Quantifying Human-AI Synergy
- Quantitative Introspection in Language Models: Tracking Internal States Across Conversation2026
- Query Rewriting for Retrieval-Augmented Large Language Models2023
- Query Understanding in the Age of Large Language Models2023
- QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks2026
- QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?2025
- Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis2025
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking2024
R (115)
- R-Zero: Self-Evolving Reasoning LLM from Zero Data2025
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning2025
- RAG Does Not Work for Enterprises2024
- RAG-Gym: Systematic Optimization of Language Agents for Retrieval-Augmented Generation2025
- RAG-R1: Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism2025
- Ranking Free RAG: Replacing Re-ranking with Selection in RAG for Sensitive Domains2025
- RARR: Researching and Revising What Language Models Say, Using Language Models2022
- Re3: Generating Longer Stories With Recursive Reprompting and Revision2022
- ReAct: Synergizing Reasoning and Acting in Language Models2022
- Real-time News Story Identification2025
- Real-Time Procedural Learning From Experience for AI Agents2025
- Real-World Planning with PDDL+ and Beyond
- ReALM: Reference Resolution As Language Modeling2024
- ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs2025
- Reasoning Beyond Chain-of-Thought: A Latent Computational Mode in Large Language Models2026
- Reasoning Can Hurt the Inductive Abilities of Large Language Models2025
- Reasoning Circuits in Language Models: A Mechanistic Interpretation of Syllogistic Inference2024
- Reasoning Language Models: A Blueprint2025
- Reasoning LLMs are Wandering Solution Explorers2025
- Reasoning Models Are More Easily Gaslighted Than You Think2025
- Reasoning Models Can Be Effective Without Thinking2025
- Reasoning Models Don't Always Say What They Think2025
- Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination2025
- Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks2023
- Reasoning Strategies in Large Language Models: Can They Follow, Prefer, and Optimize?2025
- Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought2026
- Reasoning to Learn from Latent Thoughts2025
- Reasoning with Large Language Models, a Survey2024
- Reasoning-Driven Synthetic Data Generation and Evaluation2026
- ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory2025
- ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering2025
- Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training2024
- Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning2025
- Recent Trends in Personalized Dialogue Generation: A Review of Datasets, Methodologies, and Evaluations2024
- RecExplainer: Aligning Large Language Models for Recommendation Model Interpretability2023
- Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)2022
- Recommendation systems and convergence of online reviews: The type of product network matters!
- Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations
- Recommender Systems with Social Regularization
- Recommending What Video to Watch Next: A Multitask Ranking System
- ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs2023
- Reconciling the accuracy-diversity trade-off in recommendations2023
- Recursive Introspection: Teaching Language Model Agents How to Self-Improve2024
- Recursive Language Models2025
- Reflect then Learn: Active Prompting for Information Extraction Guided by Introspective Confusion2025
- Reflections and New Directions for Human-Centered Large Language Models2026
- Reflexion: Language Agents with Verbal Reinforcement Learning2023
- ReFT: Representation Finetuning for Language Models2024
- Reinforced Attention Learning2026
- Reinforced Language Models for Sequential Decision Making2025
- Reinforcement Learning be Enough for Thinking?2025
- Reinforcement Learning Finetunes Small Subnetworks in Large Language Models2025
- Reinforcement Learning for Optimizing RAG for Domain Chatbots2024
- Reinforcement Learning for Reasoning in Large Language Models with One Training Example2025
- Reinforcement Learning via Self-Distillation2026
- Reinforcement Learning with Rubric Anchors2025
- Reinforcement Learning: An Overview2024
- Reinforcement Pre-Training2025
- Reinforcing General Reasoning without Verifiers2025
- Repeat After Me: Transformers are Better than State Space Models at Copying2024
- Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models2024
- RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns2025
- Representation biases: will we achieve complete understanding by analyzing representations?2025
- Representation Engineering: A Top-Down Approach to AI Transparency2023
- Reranking-based Generation for Unbiased Perspective Summarization2025
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning2025
- Researchy Questions: A Dataset of Multi-Perspective, Decompositional Questions for LLM Web Agents2024
- Rethinking Conversational Agents in the Era of LLMs: Proactivity, Non-collaborativity, and Beyond
- Rethinking External Slow-Thinking: From Snowball Errors to Probability of Correct Reasoning2025
- Rethinking Interpretability in the Era of Large Language Models2024
- Rethinking Large Language Models in Mental Health Applications2023
- Rethinking Memory as Continuously Evolving Connectivity2026
- Rethinking STS and NLI in Large Language Models
- Rethinking Thinking Tokens: LLMs as Improvement Operators2025
- Rethinking with Retrieval: Faithful Large Language Model Inference2022
- Retrieval Head Mechanistically Explains Long-Context Factuality2024
- Retrieval-augmented reasoning with lean language models2025
- RevCore: Review-augmented Conversational Recommendation2021
- Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up2024
- Reverse Thinking Makes LLMs Stronger Reasoners2024
- Review-LLM: Harnessing Large Language Models for Personalized Review Generation2024
- Revisiting LLM Reasoning via Information Bottleneck2025
- Revisiting Prompt Engineering: A Comprehensive Evaluation for LLM-based Personalized Recommendation2025
- Revisiting RAG Ensemble: A Theoretical and Mechanistic Analysis of Multi-RAG System Collaboration2025
- Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?2025
- Revolutionizing Mental Health Support: An Innovative Affective Mobile Framework for Dynamic, Proactive, and Context-Adaptive Conversational Agents2024
- Reward Reasoning Model2025
- Reward-Robust RLHF in LLMs2024
- RewardBench: Evaluating Reward Models for Language Modeling2024
- Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment2024
- ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models2023
- Rhetoric, Logic, and Dialectic: Advancing Theory-based Argument Quality Assessment in Natural Language Processing
- Rhetorical XAI: Explaining AI’s Benefits as well as its Use via Rhetorical Design2025
- RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation2024
- Rise of Machine Agency: A Framework for Studying the Psychology of Human–AI Interaction (HAII)
- RL + Transformer = A General-Purpose Problem Solver2025
- RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs2025
- RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization2025
- RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner2024
- RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems2025
- RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback2023
- RLHF Workflow: From Reward Modeling to Online RLHF2024
- RLNVR: Reinforcement Learning from Non-Verified Real-World Rewards2025
- RLP: Reinforcement as a Pretraining Objective2025
- RLPR: Extrapolating RLVR to General Domains without Verifiers2025
- RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents2025
- RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents2025
- RM-R1: Reward Modeling as Reasoning2025
- Role play with large language models
- Role-Play with Large Language Models2023
- RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models2023
- RouteLLM: Learning to Route LLMs with Preference Data2024
- rStar2-Agent: Agentic Reasoning Technical Report2025
- Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains2025
- Rule2Text: Natural Language Explanation of Logical Rules in Knowledge Graphs2025
S (146)
- S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models2025
- SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval2023
- Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models2024
- Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning2025
- SAND: Boosting LLM Agents with Self-Taught Action Deliberation2025
- Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search2025
- Scalable Language Models with Posterior Inference of Latent Thought Vectors2025
- Scalable Neural Contextual Bandit for Recommender Systems2023
- Scaling Behavior of Single LLM-Driven Multi-Agent Systems2026
- Scaling can lead to compositional generalization2025
- Scaling Expert Language Models with Unsupervised Domain Discovery2023
- Scaling Latent Reasoning via Looped Language Models2025
- Scaling Laws for Agent Harnesses via Effective Feedback Compute2026
- Scaling Laws for Neural Language Models2020
- Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs2025
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters2024
- Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet2024
- Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models2025
- Scaling Synthetic Data Creation with 1,000,000,000 Personas2024
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach2025
- Schema-learning and rebinding as mechanisms of in-context learning and emergence2023
- SciTopic: Enhancing Topic Discovery in Scientific Literature through Advanced LLM2025
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding2024
- SDPO: Segment-Level Direct Preference Optimization for Social Agents2025
- SEAL: Self-Evolving Agentic Learning for Conversational Question Answering over Knowledge Graphs2025
- Search Arena: Analyzing Search-Augmented LLMs2025
- Search-o1: Agentic Search-Enhanced Large Reasoning Models2025
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning2025
- Searching for Best Practices in Retrieval-Augmented Generation2024
- See you soon again, chatbot? A design taxonomy to characterize user-chatbot relationships with different time horizons
- Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory2025
- Seemingly Conscious AI Risks
- Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
- Self Selection and Information Role of Online Product Reviews
- Self-Adapting Language Models2025
- Self-Adaptive Large Language Model (LLM)-Based Multiagent Systems2023
- Self-Alignment with Instruction Backtranslation2023
- Self-consistency Improves Chain Of Thought Reasoning In Language Models2022
- Self-critiquing models for assisting human evaluators2022
- Self-Directed Synthetic Dialogues and Revisions Technical Report2024
- Self-Discover: Large Language Models Self-Compose Reasoning Structures2024
- Self-distillation Enables Continual Learning2026
- Self-Evaluation Guided Beam Search for Reasoning2023
- Self-Improving Model Steering2025
- Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges2025
- SELF-INSTRUCT: Aligning Language Models with Self-Generated Instructions2022
- Self-Organizing Graph Reasoning Evolves into a Critical State for Continuous Discovery Through Structural-Semantic Dynamics2025
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models2024
- Self-Questioning Language Models2025
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection2023
- Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst2025
- Self-Refine: Iterative Refinement with Self-Feedback2023
- Self-reflecting Large Language Models: A Hegelian Dialectical Approach2025
- Self-Reflection in LLM Agents: Effects on Problem-Solving Performance2024
- Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution?2025
- Self-reinforcing cascades: A spreading model for beliefs or products of varying intensity or quality2024
- Self-Rewarding Language Models2024
- Self-Rewarding Vision-Language Model via Reasoning Decomposition2025
- Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels2024
- Self-Supervised Models of Speech Infer Universal Articulatory Kinematics2023
- Self-Taught Evaluators2024
- Semantic Change Characterization with LLMs using Rhetorics2024
- Semantic Parsing for Task Oriented Dialog using Hierarchical Representations
- Semantic Specialization for Knowledge-based Word Sense Disambiguation2023
- Semantic Structure in Large Language Model Embeddings2025
- Sequence Organization in Interaction: A Primer in Conversation Analysis
- SERL: Self-Examining Reinforcement Learning on Open-Domain2025
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training2025
- Shaping Explanations: Semantic Reward Modeling with Encoder-Only Transformers for GRPO2025
- Should Humans Lie to Machines? The Incentive Compatibility of Lasso and General Weighted Lasso2021
- Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue2024
- ShowUI: One Vision-Language-Action Model for GUI Visual Agent2024
- Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making2025
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds2023
- Simple Synthetic Data Reduces Sycophancy In Large Language Models2023
- SimPO: Simple Preference Optimization with a Reference-Free Reward2024
- Simulacra as conscious exotica2024
- Simulating Society Requires Simulating Thought2025
- Single-Agent LLMs Outperform Multi-Agent Systems on Multi-Hop Reasoning Under Equal Thinking Token Budgets2026
- Single-agent or Multi-agent Systems? Why Not Both?2025
- Situating Recommender Systems in Practice: Towards Inductive Learning and Incremental Updates2022
- SkillClaw: Let Skills Evolve Collectively with Agentic Evolver2026
- SkillOpt: Executive Strategy for Self-Evolving Agent Skills2026
- SkillOS: Learning Skill Curation for Self-Evolving Agents2026
- SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning2026
- Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models2023
- Sleep-time Compute: Beyond Inference Scaling at Test-time2025
- Small Language Models are the Future of Agentic AI2025
- Small LLMs Are Weak Tool Learners: A Multi-LLM Agent2024
- SMILE: Evaluation and Domain Adaptation for Social Media Language Understanding2023
- Social Responses to Media Technologies in the 21st Century: The Media are Social Actors Paradigm
- Social Robots for Long-Term Interaction: A Survey
- Social Skill Training with Large Language Models
- SocraSynth: Multi-LLM Reasoning with Conditional Statistics2024
- Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space2025
- Soft Tokens, Hard Truths2025
- SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs2025
- SOLOIST: Building Task Bots at Scale with Transfer Learning and Machine Teaching
- Solving a Million-Step LLM Task with Zero Errors2025
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models2024
- SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
- Sources of Hallucination by Large Language Models on Inference Tasks
- SParC: Cross-Domain Semantic Parsing in Context
- Speed Always Wins: A Survey on Efficient Architectures for Large Language Models2025
- SPICE: Self-Play In Corpus Environments Improves Reasoning2025
- SpikingBrain: Spiking Brain-inspired Large Models2025
- Spontaneous Persuasion: An Audit of Model Persuasiveness in Everyday Conversations2026
- SpreadsheetLLM: Encoding Spreadsheets for Large Language Models2024
- Spurious Forgetting in Continual Learning of Language Models2025
- Spurious Rewards: Rethinking Training Signals in RLVR
- SSRL: Self-Search Reinforcement Learning2025
- Stance Detection on Social Media with Fine-Tuned Large Language Models2024
- STaR-GATE: Teaching Language Models to Ask Clarifying Questions2024
- Statistical and Algorithmic Foundations of Reinforcement Learning2025
- SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF2023
- StepWiser: Stepwise Generative Judges for Wiser Reasoning2025
- Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!2025
- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models2025
- StoryScope: Investigating idiosyncrasies in AI fiction2026
- Strategic Reasoning with Language Models2023
- Stream of Search (SoS): Learning to Search in Language2024
- Stress Testing Deliberative Alignment for Anti-Scheming Training2025
- StructGPT: A General Framework for Large Language Model to Reason over Structured Data2023
- StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization2024
- Structured and Natural Responses Co-generation for Conversational Search
- Study: Large language models can’t effectively recognize users’ motivation, but can support behavior change for those ready to act
- Style Vectors for Steering Generative Large Language Models
- Subliminal Learning: Language models transmit behavioral traits via hidden signals in data2025
- Summaries, Highlights, and Action items: Design, implementation and evaluation of an LLM-powered meeting recap system2023
- Supervised Pretraining Can Learn In-Context Reinforcement Learning2023
- Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning2025
- SupervisorBot: NLP-Annotated Real-Time Recommendations of Psychotherapy Treatment Strategies with Deep Reinforcement Learning2022
- Supporting Physical Activity Behavior Change with LLM-Based Conversational Agents2024
- Suppressing Pink Elephants with Direct Principle Feedback2024
- Survey on Evaluation of LLM-based Agents2025
- Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models2024
- Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories2025
- Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models2024
- Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence2025
- Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians2026
- SymAgent: A Neural-Symbolic Self-Learning Agent Framework for Complex Reasoning over Knowledge Graphs2025
- Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models2023
- Synthetic Dialogue Dataset Generation using LLM Agents2024
- System 1 vs. System 2 Thinking
- System 2 Attention (is something you might need too)2023
- Systematic synthesis of design prompts for large language models in conceptual design
T (201)
- Tailored Conversations beyond LLMs: A RL-Based Dialogue Manager2025
- TaleStream: Supporting Story Ideation with Trope Knowledge2023
- Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs2024
- Talk like a Graph: Encoding Graphs for Large Language Models2023
- Talking About Large Language Models2022
- TarGEN: Targeted Data Generation with Large Language Models2023
- Target-Guided Open-Domain Conversation2019
- Task Contamination: Language Models May Not Be Few-Shot Anymore2023
- Task-Oriented Dialogue as Dataflow Synthesis2020
- Task-Oriented Dialogue with In-Context Learning2024
- TaskLAMA: Probing the Complex Task Understanding of Language Models2023
- TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation2024
- Teaching Large Language Models to Reason with Reinforcement Learning2024
- Teaching Probabilistic Logical Reasoning to Transformers2023
- Tell me about yourself: LLMs are aware of their learned behaviors2025
- Temporal Self-Rewarding Language Models: Decoupling Chosen-Rejected via Past-Future2025
- Test-time Prompt Intervention2025
- Test-Time Scaling with Reflective Generative Model2025
- TextGrad: Automatic “Differentiation” via Text2024
- The Abstraction Fallacy: Why AI Can Simulate But Not Instantiate Consciousness
- The AI Hippocampus: How Far are We From Human Memory?2026
- The Alien Space of Science: Sampling Coherent but Cognitively Unavailable Research Directions2026
- The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs2025
- The Architectural Implications of Facebook’s DNN-based Personalized Recommendation2019
- The Argument Reasoning Comprehension Task: Identification and Reconstruction of Implicit Warrants2017
- The Art of Scaling Reinforcement Learning Compute for LLMs2025
- The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models2026
- The Challenges in Designing a Prevention Chatbot for Eating Disorders: Observational Study
- The Consensus Game: Language Model Generation via Equilibrium Search2023
- The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think2025
- The Curse Of Recursion: Training On Generated Data Makes Models Forget2023
- The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind2025
- The Demon is in Ambiguity: Revisiting Situation Recognition with Single Positive Multi-Label Learning2025
- The Digital Therapeutic Alliance and Human-Computer Interaction
- The Digital Therapeutic Alliance: Prospects and Considerations
- The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation2023
- The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis
- The Emotion-Memory Link: Do Memorability Annotations Matter for Intelligent Systems?2025
- The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models2025
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits2024
- The Ethics of Advanced AI Assistants2024
- The Evolution of Multimodal Model Architectures2024
- The False Promise of Imitating Proprietary LLMs2023
- The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation2024
- The Future of AI: Exploring the Potential of Large Concept Models2025
- The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs2022
- The Hallucination Tax of Reinforcement Finetuning2025
- The Hermeneutics of Artificial Text
- The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas2025
- The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs2025
- The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs2025
- The Illusion of the Illusion of the Illusion of Thinking
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
- The Impact of AI-Generated Text on the Internet
- The Impact of Artificial Intelligence on Human Thought2025
- The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers
- The impact of generative artificial intelligence on socioeconomic inequalities and policy making
- The Impossibility of Fair LLMs2024
- The Incomplete Bridge: How AI Research (Mis)Engages with Psychology2025
- The Insanity of Relying on Vector Embeddings: Why RAG Fails
- The Invisible Leash: Why RLVR May Not Escape Its Origin2025
- The Labor Market Effects of Generative Artificial Intelligence
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey2025
- The Levers of Political Persuasion with Conversational AI2025
- The LLM Fallacy: Misattribution in AI-Assisted Cognitive Workflows2026
- The Method of Critical AI Studies, A Propaedeutic2024
- The Missing Layer of AGI: From Pattern Alchemy to Coordination Physics2025
- The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning2026
- The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning2026
- The Moral Turing Test: Evaluating Human-LLM Alignment in Moral Decision-Making2024
- The Netflix Recommender System: Algorithms, Business Value, and Innovation
- The Partner Modelling Questionnaire: A validated self-report measure of perceptions toward machines as dialogue partners2023
- The persuasive effects of political microtargeting in the age of generative artificial intelligence
- The Place of Emotion in Argument
- The Prompt Report: A Systematic Survey of Prompting Techniques2024
- The Return of Pseudosciences in Artificial Intelligence: Have Machine Learning and Deep Learning Forgotten Lessons from Statistics and History?2024
- The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"2023
- The Serial Scaling Hypothesis2025
- The social component of the projection behavior of clausal complement contents
- The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs2025
- The state of enterprise AI
- The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning2025
- The Surprising Effectiveness of Test-Time Training for Abstract Reasoning2024
- The Thin Line Between Comprehension and Persuasion in LLMs2025
- The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities2024
- The Unreasonable Ineffectiveness of the Deeper Layers2024
- The Vanishing Gradient Problem for Stiff Neural Differential Equations2025
- The Vector Grounding Problem2023
- The Xeno Sutra: Can Meaning and Value be Ascribed to an AI-Generated "Sacred" Text?2025
- TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks2024
- Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models2025
- Theory of Knowledge Based on the Idea of the Discursive Space
- Theory of Mind abilities of Large Language Models in Human-Robot Interaction: An Illusion?2024
- There Will Be a Scientific Theory of Deep Learning2026
- Think before you speak: Training Language Models With Pause Tokens2023
- Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens2026
- Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods2025
- Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models2025
- Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLMs for Countering Hate2025
- Think Twice Before Trusting: Self-Detection for Large Language Models through Comprehensive Answer Reflection2024
- Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking2025
- Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory2023
- Think-on-Graph: Deep and Responsible Reasoning of Large Language Model with Knowledge Graph2023
- Thinking as Compression: Your Reasoning Model is Secretly a Context Compressor2026
- Thinking Assistants: LLM-Based Conversational Assistants that Help Users Think By Asking rather than Answering2023
- Thinking Augmented Pre-training2025
- Thinking Forward and Backward: Effective Backward Planning with Large Language Models2024
- Thinking in Character: Advancing Role-Playing Agents with Role-Aware Reasoning2025
- Thinking Inside the Mask: In-Place Prompting in Diffusion LLMs2025
- Thinking LLMs: General Instruction Following with Thought Generation2024
- Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction2025
- Thinking—Fast, Slow, and Artificial: How AI is Reshaping Human Reasoning and the Rise of Cognitive Surrender
- Thinkless: LLM Learns When to Think2025
- Thought Anchors: Which LLM Reasoning Steps Matter?2025
- Thought Communication in Multiagent Collaboration2025
- Thought Virus: Viral Misalignment via Subliminal Prompting in Multi-Agent Systems2026
- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs2025
- Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines2025
- Thousands of AI Authors on the Future of AI2024
- Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation2024
- Through the Lens of Human-Human Collaboration: A Configurable Research Platform for Exploring Human-Agent Collaboration2025
- TiMoE: Time-Aware Mixture of Language Experts2025
- Tina: Tiny Reasoning Models via LoRA2025
- Titans: Learning to Memorize at Test Time2024
- TnT-LLM: Text Mining at Scale with Large Language Models2024
- To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning2024
- To Tell The Truth: Language of Deception and Language Models2023
- TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters2024
- Too Good to be Bad: On the Failure of LLMs to Role-Play Villains2025
- ToolFlow: Boosting LLM Tool-Calling Through Natural and Coherent Dialogue Synthesis2024
- Topic Modeling in Embedding Spaces2019
- Topic Shift Detection for Mixed Initiative Response
- Topic-Guided Conversational Recommender in Multiple Domains2020
- Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties2025
- Toward Conversational Agents with Context and Time Sensitive Long-term Memory2024
- Toward Efficient Agents: A Survey of Memory, Tool Learning, and Planning2026
- Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design2025
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing2024
- Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks2022
- Toward understanding and preventing misalignment generalization
- Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models2025
- Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models2023
- Towards a Science of Scaling Agent Systems2025
- Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs2025
- Towards Algorithmic Experience
- Towards Collective Superintelligence, a Pilot Study2023
- Towards Conversational Recommendation over Multi-Type Dialogs2020
- Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset
- Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness?2020
- Towards Healthy AI: Large Language Models Need Therapists Too2023
- Towards Human-centered Proactive Conversational Agents2024
- Towards Large Reasoning Models: A Survey on Scaling LLM Reasoning Capabilities2025
- Towards Machine Theory of Mind with Large Language Model-Augmented Inverse Planning2025
- Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
- Towards Optimal Learning of Language Models2024
- Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control2024
- Towards Question-based Recommender Systems2020
- Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models2025
- Towards Safe and Honest AI Agents with Neural Self-Other Overlap2024
- Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought2025
- Towards Understanding Counseling Conversations: Domain Knowledge and Large Language Models2024
- Train Long, Think Short: Curriculum Learning for Efficient Reasoning2025
- Training a Generally Curious Agent2025
- Training Dialogue Systems by AI Feedback for Improving Overall Dialogue Impression2025
- Training for Compositional Sensitivity Reduces Dense Retrieval Generalization2026
- Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning2025
- Training language models to be warm and empathetic makes them less reliable and more sycophantic2025
- Training language models to follow instructions with human feedback2022
- Training Language Models to Self-Correct via Reinforcement Learning2024
- Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning2024
- Training Large Language Models to Reason in a Continuous Latent Space2024
- Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning2025
- Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis2024
- Training-Free Group Relative Policy Optimization2025
- Transcendence: Generative Models Can Outperform The Experts That Train Them2024
- Transformer-based cynical expression detection in a corpus of Spanish YouTube reviews
- Transformer²: Self-adaptive LLMs2025
- TransformerFAM: Feedback attention is working memory2024
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality2024
- TREC iKAT 2023: A Test Collection for Evaluating Conversational and Interactive Knowledge Assistants2024
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models2023
- Tree Search for Language Model Agents2024
- Tree Search for LLM Agent Reinforcement Learning2025
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search2025
- Triggering Hallucinations in LLMs: A Quantitative Study of Prompt-Induced Hallucination in Large Language Models2025
- Truly Self-Improving Agents Require Intrinsic Metacognitive Learning2025
- Trust in Human-AI Interaction: Scoping Out Models, Measures, and Methods2022
- TrustLLM: Trustworthiness in Large Language Models2024
- Truth or lie: Exploring the language of deception
- TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning2025
- TTRL: Test-Time Reinforcement Learning2025
- Tube2Vec: Social and Semantic Embeddings of YouTube Channels2023
- Tulu 3: Pushing Frontiers in Open Language Model Post-Training2024
- Tuning Language Models by Proxy
- Turiya at DialAM-2024: Inference Anchoring Theory Based LLM Parsers
- Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents2024
- Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion2024
- Turning large language models into cognitive models
- Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization2024
- TwoStep: Multi-agent Task Planning using Classical Planners and Large Language Models2024
- Typed-RAG: Type-aware Multi-Aspect Decomposition for Non-Factoid Question Answering2025
U (39)
- UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity2024
- Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models2024
- Uncovering Latent Arguments in Social Media Messaging by Employing LLMs-in-the-Loop Strategy2024
- Understanding and Mitigating Premature Confidence for Better LLM Reasoning2026
- Understanding Before Reasoning: Enhancing Chain-of-Thought with Iterative Summarization Pre-Prompting2025
- Understanding Hidden Computations in Chain-of-Thought Reasoning2024
- Understanding LLMs: A Comprehensive Overview from Training to Inference2024
- Understanding the Role of User Profile in the Personalization of Large Language Models2024
- Understanding the Therapeutic Relationship between Counselors and Clients in Online Text-based Counseling using LLMs2024
- Understanding Tool-Integrated Reasoning2025
- Understanding, explaining, and utilizing medical artificial intelligence
- Unified Conversational Recommendation Policy Learning via Graph-based Reinforcement Learning2021
- Unifying Large Language Models and Knowledge Graphs: A Roadmap2023
- Unifying Nearest Neighbors Collaborative Filtering
- UniGraph: Learning a Unified Cross-Domain Foundation Model for Text-Attributed Graphs2024
- Unintended Impacts of LLM Alignment on Global Representation2024
- Universe of Thoughts: Enabling Creative Reasoning with Large Language Models2025
- Unleashing Cognitive Synergy In Large Language Models: A Task-solving Agent Through Multi-persona Self-collaboration2023
- Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem2025
- Unlocking Varied Perspectives: A Persona-Based Multi-Agent Framework with Debate-Driven Text Planning for Argument Generation
- Unsupervised Elicitation of Language Models
- Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study2025
- UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation2023
- UR2: Unify RAG and Reasoning through Reinforcement Learning2025
- Useful Memories Become Faulty When Continuously Updated by LLMs2026
- User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy as a Learning Signal2025
- User-Centric Conversational Recommendation with Multi-Aspect User Modeling2022
- User-LLM: Efficient LLM Contextualization with User Embeddings2024
- UserBench: An Interactive Gym Environment for User-Centric Agents2025
- Using Computational Models to Test Syntactic Learnability
- Using Large Language Models to Create AI Personas for Replication and Prediction of Media Effects: An Empirical Test of 133 Published Experimental Research Findings2024
- Using Large Language Models to Generate, Validate, and Apply User Intent Taxonomies2023
- Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies2022
- Using Linguistic Synchrony to Evaluate Large Language Models for Cognitive Behavioral Therapy
- Using LLMs to Discover Legal Factors2024
- Using Natural Language for Reward Shaping in Reinforcement Learning2019
- Using Navigation to Improve Recommendations in Real-Time
- Using Topic Models to Identify Clients’ Functioning Levels and Alliance Ruptures in Psychotherapy
- Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs2025
V (13)
- Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties2023
- Variational Autoencoders for Collaborative Filtering2018
- VCBench: Benchmarking LLMs in Venture Capital2025
- VCounselor: A Psychological Intervention Chat Agent Based on a Knowledge-Enhanced Large Language Model2024
- Vector Policy Optimization: Training for Diversity Improves Test-Time Search2026
- Verbal lie detection using Large Language Models
- Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity2025
- VibeSearchBench: Benchmarking Long-horizon Proactive Search in the Wild2026
- Virtual Assistance in Any Context
- Virtuous Machines: Towards Artificial General Science2025
- VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction2025
- Voxtral2025
- Voyager: An Open-Ended Embodied Agent with Large Language Models2023
W (39)
- We Are All Creators: Generative AI, Collective Knowledge, and the Path Towards Human-AI Synergy2025
- We Won't be Missed: Work and Growth in the Era of AGI
- Weak-to-Strong GraphRAG: Aligning Weak Retrievers with Large Language Models for Graph-based Retrieval Augmented Generation
- Web-Browsing LLMs Can Access Social Media Profiles and Infer User Demographics2025
- Weight-sparse transformers have interpretable circuits2025
- We’re Afraid Language Models Aren’t Modeling Ambiguity2023
- What are the Goals of Distributional Semantics?2020
- What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT2025
- What does it mean to understand language?2025
- What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity2025
- What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models2025
- What is a Discourse Graph?
- What Makes a Good Natural Language Prompt?2025
- What the F*ck Is Artificial General Intelligence?2025
- What we talk to when we talk to language models
- When AIs Judge AIs: The Rise of Agent-as-a-Judge Evaluation for LLMs2025
- When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models2024
- When Large Language Models are More Persuasive Than Incentivized Humans, and Why2025
- When Large Language Models contradict humans? Large Language Models’ Sycophantic Behaviour2023
- When More is Less: Understanding Chain-of-Thought Length in LLMs2025
- When Prompts Go Wrong: Evaluating Code Model Robustness to Ambiguous, Contradictory, and Incomplete Task Descriptions2025
- When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection2025
- When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method2024
- When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs2025
- WHEN TO ACT, WHEN TO WAIT: Modeling Structural Trajectories for Intent Triggerability in Task-Oriented Dialogue2025
- Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning2025
- Who’s Afraid of (Left) Hyperstitions
- Why Do Multi-agent LLM Systems Fail?
- Why Do People Rate? Theory and Evidence on Online Ratings
- Why Do Some Language Models Fake Alignment While Others Don't?2025
- Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?2026
- Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention2026
- Wide & Deep Learning for Recommender Systems2016
- Will I Sound Like Me? Improving Persona Consistency in Dialogues through Pragmatic Self-Consciousness2020
- Word Meanings in Transformer Language Models2025
- Working Alliance Transformer for Psychotherapy Dialogue Classification2022
- Working with AI: Measuring the Occupational Implications of Generative AI2025
- Workplace Everyday-Creativity through a Highly-Conversational UI to Large Language Models
- Writing-Zero: Bridge the Gap Between Non-verifiable Tasks and Verifiable Rewards2025
Y (2)
Z (3)
# (12)
- "Is ChatGPT a Better Explainer than My Professor?": Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline2024
- "It doesn't look good for a date": Transforming Critiques into Preferences for Conversational Recommendation Systems2021
- "My Boyfriend is AI": A Computational Analysis of Human-AI Companionship in Reddit's AI Community2025
- (QA)²: Question Answering with Questionable Assumptions2022
- 100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models2025
- 1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities2025
- “Hello There! Is Now a Good Time to Talk?”: Opportune Moments for Proactive Interactions with Smart Speakers
- “It Felt Like Having a Second Mind”: Investigating Human-AI Co-creativity in Prewriting with Large Language Models2023
- “Mama Always Had a Way of Explaining Things So I Could Understand”: A Dialogue Corpus for Learning to Construct Explanations2022
- “Understanding AI”: Semantic Grounding in Large Language Models2024
- “What do others think?”: Task-Oriented Conversational Modeling with Subjective Knowledge
- 𝙻𝙼𝟸: A Simple Society of Language Models Solves Complex Reasoning2024
Psychology, Society, and Alignment (484)
- "Is ChatGPT a Better Explainer than My Professor?": Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline2024
- "My Boyfriend is AI": A Computational Analysis of Human-AI Companionship in Reddit's AI Community2025
- A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy2025
- A Comprehensive Review of AI-based Intelligent Tutoring Systems: Applications and Challenges2025
- A comprehensive taxonomy of hallucinations in Large Language Models2025
- A Computational Framework for Behavioral Assessment of LLM Therapists2024
- A Few Words Can Distort Graphs: Knowledge Poisoning Attacks on Graph-based Retrieval-Augmented Generation of Large Language Models2025
- A Framework for Collaborating a Large Language Model Tool in Brainstorming for Triggering Creative Thoughts2024
- A natural language processing approach reveals first-person pronoun usage and non-fluency as markers of therapeutic alliance in psychotherapy
- A recipe for annotating grounded clarifications2021
- A sociotechnical perspective for the future of AI: narratives, inequalities, and human control
- A Survey of Meta-Reinforcement Learning2023
- A Systematic Review on the Evaluation of Large Language Models in Theory of Mind Tasks2025
- A Taxonomy of Empathetic Questions in Social Dialogs
- AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions2025
- Adaptive Learning Systems: Personalized Curriculum Design Using LLM-Powered Analytics2025
- Addressing Social Misattributions of Large Language Models: An HCXAI-based Approach2024
- Agent Laboratory: Using LLM Agents as Research Assistants2025
- Agent S: An Open Agentic Framework that Uses Computers Like a Human2024
- Agentic AI and the next intelligence explosion2026
- Agentic Misalignment: How LLMs Could Be Insider Threats2025
- AgentRxiv: Towards Collaborative Autonomous Research2025
- Agents Are Not Enough2024
- Agreement Tracking for Multi-Issue Negotiation Dialogues2023
- AI & Human Co-Improvement for Safer Co-Superintelligence2025
- AI Assistance Reduces Persistence and Hurts Independent Performance2026
- AI Companions Reduce Loneliness2024
- AI Enters Public Discourse: A Habermasian Assessment Of The Moral Status Of Large Language Models
- AI Meets the Classroom: When Does ChatGPT Harm Learning?2024
- AI Models Exceed Individual Human Accuracy in Predicting Everyday Social Norms2025
- AI tutoring outperforms in-class active learning: an RCT introducing a novel research-based design in an authentic educational setting
- AI-Powered (Finance) Scholarship
- AI-Researcher: Autonomous Scientific Innovation2025
- AInsight: Augmenting Expert Decision-Making with On-the-Fly Insights Grounded in Historical Data2025
- ALIGN: Prompt-based Attribute Alignment for Reliable, Responsible, and Personalized LLM-based Decision-Making2025
- All AI Models are Wrong, but Some are Optimal2025
- An Emulator for Fine-Tuning Large Language Models using Small Language Models2023
- An extended framework for characterizing social robots2019
- Are Customers Lying to Your Chatbot?
- Are you in a Masquerade? Exploring the Behavior and Impact of Large Language Model Driven Social Bots in Online Social Networks2023
- ARGS: Alignment as Reward-Guided Search2024
- Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)2025
- Assessment of Personality Dimensions Across Situations Using Conversational Speech2025
- Auditing language models for hidden objectives2025
- AutoCBT: An Autonomous Multi-agent Framework for Cognitive Behavioral Therapy in Psychological Counseling2025
- Automated Alignment Researchers: Using large language models to scale scalable oversight2022
- Automated Social Science: Language Models as Scientist and Subjects2024
- Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies2023
- AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration2026
- AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders2025
- Backtracing: Retrieving the Cause of the Query2024
- Benchmarking the Pedagogical Knowledge of Large Language Models2025
- Better Alignment with Instruction Back-and-Forth Translation2024
- Beyond "Not Novel Enough": Enriching Scholarly Critique with LLM-Assisted Feedback2025
- Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey2024
- Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models2025
- Beyond Answers: How LLMs Can Pursue Strategic Thinking in Education2025
- Beyond Brainstorming: What Drives High-Quality Scientific Ideas? Lessons from Multi-Agent Collaboration2025
- Beyond Discrete Personas: Personality Modeling Through Journal Intensive Conversations2024
- Beyond Hallucinations: The Illusion of Understanding in Large Language Models2025
- Beyond Preferences in AI Alignment2024
- Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts2025
- Beyond Single Models: Enhancing LLM Detection of Ambiguity in Requests through Debate2025
- Beyond the Surface: Probing the Ideological Depth of Large Language Models2025
- Bridging the gulf of envisioning: Cognitive design challenges in LLM interfaces2023
- Building a Stronger CASA: Extending the Computers Are Social Actors Paradigm
- Building Decision Making Models Through Language Model Regime2024
- Building Machines that Learn and Think with People2024
- Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning2023
- CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society2023
- Can AI Explanations Make You Change Your Mind?2025
- Can AI Have a Personality? Prompt Engineering for AI Personality Simulation: A Chatbot Case Study in Gender-Affirming Voice Therapy Training2025
- Can Language Models Represent the Past without Anachronism?2025
- Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability to Mark Short Answer Questions in K-12 Education2024
- Can Large Language Models Transform Computational Social Science?
- Can Large Language Models Understand Argument Schemes?
- Can LLM be a Personalized Judge?2024
- Can Machines Think Like Humans? A Behavioral Evaluation of LLM-Agents in Dictator Games2024
- Can robots do therapy?: Examining the efficacy of a CBT bot in comparison with other behavioral intervention technologies in alleviating mental health symptoms
- Canvil: Designerly Adaptation for LLM-Powered User Experiences2024
- Capturing Individual Human Preferences with Reward Features2025
- Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models2025
- Chain of Stance: Stance Detection with Large Language Models2024
- Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
- Challenges of Large Language Models for Mental Health Counseling2023
- Chamain: Harmonizing Character Persona Integrity with Domain-Adaptive Knowledge in Dialogue Generation
- Character is Destiny: Can Role-Playing Language Agents Make Persona-Driven Decisions?2024
- Chatbot vs. Human: The Impact of Responsive Conversational Features on Users’ Responses to Chat Advisors
- ChatGPT Doesn’t Trust Chargers Fans: Guardrail Sensitivity in Context2024
- ChatGPT Reads Your Tone and Responds Accordingly -- Until It Does Not -- Emotional Framing Induces Bias in LLM Outputs2025
- ChatGPT: deconstructing the debate and moving it forward
- ChatGPT: towards AI subjectivity
- Checklists Are Better Than Reward Models For Aligning Language Models2025
- Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data
- CloChat: Understanding How People Customize, Interact, and Experience Personas in Large Language Models2024
- CogBench: a large language model walks into a psychology lab2024
- Cognitive Architectures for Language Agents2023
- Cognitive Chain-of-Thought: Structured Multimodal Reasoning about Social Situations2025
- Cognitive Effects in Large Language Models2023
- Collaborative Rational Speech Act: Pragmatic Reasoning for Multi-Turn Dialog2025
- Comparing emotion feature extraction approaches for predicting depression and anxiety
- Comparing Human and AI Therapists in Behavioral Activation for Depression: Cross-Sectional Questionnaire Study
- COMPASS: Computational Mapping of Patient-Therapist Alliance Strategies with Language Modeling2024
- Computational Modelling of Undercuts in Real-world Arguments
- Computational structuralism: Toward a formal theory of meaning in the age of digital intelligence
- Computer says “No”: The Case Against Empathetic Conversational AI2022
- Conceptual Design Generation Using Large Language Models2023
- CONSCENDI: A Contrastive and Scenario-Guided Distillation Approach to Guardrail Models for Virtual Assistants2023
- Considering the Context to Build Theory in HCI, HRI, and HMC: Explicating Differences in Processes of Communication and Socialization With Social Technologies
- Consistency Training Helps Stop Sycophancy and Jailbreaks2025
- Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning2025
- Controlling Linguistic Style Aspects in Neural Language Generation2017
- Conversational Alignment with Artificial Intelligence in Context2025
- Conversational DNA: A New Visual Language for Understanding Dialogue Structure in Human and AI2025
- Conversational Prompt Engineering2024
- Could you be wrong: Debiasing LLMs using a metacognitive prompt for improving human decision making2025
- Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying2024
- Cultural Evolution of Cooperation among LLM Agents2024
- DAPIE: Interactive Step-by-Step Explanatory Dialogues to Answer Children’s Why and How Questions
- DATATALES: Investigating the use of Large Language Models for Authoring Data-Driven Articles2023
- DeepGesture: A conversational gesture synthesis system based on emotions and semantics2025
- Deflating Deflationism: A Critical Perspective on Debunking Arguments Against LLM Mentality2025
- DeLLMa: Decision Making Under Uncertainty with Large Language Models2024
- Design Principles for Generative AI Applications2024
- Designing AI Personalities: Enhancing Human-Agent Interaction Through Thoughtful Persona Design2024
- Detecting Cognitive Distortions from Patient-Therapist Interactions
- Detecting Deception Using Natural Language Processing and Machine Learning in Datasets on COVID-19 and Climate Change
- Determinants of LLM-assisted Decision-Making2024
- Developing Effective Educational Chatbots with ChatGPT prompts: Insights from Preliminary Tests in a Case Study on Social Media Literacy2023
- Development and validation of large language model rating scales for automatically transcribed psychological therapy sessions
- Dialoging Resonance: How Users Perceive, Reciprocate and React to Chatbot’s Self-Disclosure in Conversational Recommendations2021
- Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources2023
- Diplomat: A Dialogue Dataset for Situated PragMATic Reasoning2023
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model2023
- Disambiguating Anthropomorphism and Anthropomimesis in Human-Robot Interaction2026
- Discourse-Level Representations can Improve Prediction of Degree of Anxiety
- DiscussLLM: Teaching Large Language Models When to Speak2025
- Dissociating language and thought in large language models2023
- Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models2024
- Do large language models resemble humans in language use?2023
- Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom2024
- Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses2024
- Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models2023
- Do Role-Playing Agents Practice What They Preach? Belief-Behavior Consistency in LLM-Based Simulations of Human Trust2025
- Do Theory of Mind Benchmarks Need Explicit Human-like Reasoning in Language Models?2025
- DO THEY SEE WHAT WE SEE?
- Do We Trust ChatGPT as much as Google Search and Wikipedia?
- DOC: Improving Long Story Coherence With Detailed Outline Control2022
- Does It Make Sense to Speak of Introspection in Large Language Models?2025
- DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration2025
- Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization2023
- Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models2024
- Eliciting Reasoning in Language Models with Cognitive Tools2025
- Emergent Introspective Awareness in Large Language Models
- EmotionPrompt: Leveraging Psychology for Large Language Models Enhancement via Emotional Stimulus2023
- Empathetic Persuasion: Reinforcing Empathy and Persuasiveness in Dialogue Systems
- Empathy Through Multimodality in Conversational Interfaces2024
- Empowering Psychotherapy with Large Language Models: Cognitive Distortion Detection through Diagnosis of Thought Prompting2023
- Enhancing AI-Assisted Group Decision Making through LLM-Powered Devil's Advocate
- Enhancing Pipeline-Based Conversational Agents with Large Language Model2023
- Enhancing social cohesion with cooperative bots in societies of greedy, mobile individuals2024
- Enhancing user experience in large language models through human-centered design: Integrating theoretical insights with an experimental study to meet diverse software learning needs with a single document knowledge base2024
- Estimating AI productivity gains from Claude conversations
- Evaluating Emotional Nuances In Dialogue Summarization2023
- Evaluating Large Language Models in Theory of Mind Tasks2023
- Evaluating the Diversity and Quality of LLM Generated Content2025
- Evaluating the Efficacy of Interactive Language Therapy Based on LLM for High-Functioning Autistic Adolescent Psychological Counseling2023
- Evaluating the False Trust Engendered by LLM Explanations2026
- Evaluating the psychometric properties of ChatGPT-generated questions
- Evaluating the Therapeutic Alliance With a Free-Text CBT Conversational Agent (Wysa): A Mixed-Methods Study
- Evaluating Theory of Mind and Internal Beliefs in LLM-Based Multi-Agent Systems2026
- Evidence of Human-Level Bonds Established With a Digital Conversational Agent: Cross-sectional, Retrospective Observational Study
- Evidence-centered Assessment for Writing with Generative AI2024
- Existential Conversations with Large Language Models: Content, Community, and Culture2024
- Expanding Explainability: Towards Social Transparency in AI systems
- Expedient Assistance and Consequential Misunderstanding: Envisioning an Operationalized Mutual Theory of Mind
- Explainable Compliance Detection with Multi-Hop Natural Language Inference on Assurance Case Structure2025
- Explainable Multimodal Emotion Reasoning2023
- Exploring the Frontiers of LLMs in Psychological Applications: A Comprehensive Review2024
- Exploring the Role of Prior Beliefs for Argument Persuasion2019
- Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers2025
- Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering2026
- Find the Gap: AI, Responsible Agency and Vulnerability
- FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets2023
- Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models2025
- Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities2024
- Flows: Building Blocks of Reasoning and Collaborating AI2023
- Forecasting the presence and intensity of hostility on Instagram using linguistic and social features2018
- Foundation Protocol: A Coordination Layer for Agentic Society2026
- Foundations of Large Language Models2025
- From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence2026
- From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers2025
- From Persona to Person: Enhancing the Naturalness with Multiple Discourse Relations Graph Learning in Personalized Dialogue Generation2025
- From Prompt Engineering to Prompt Science With Human in the Loop2024
- From Simulation to Enaction: Post-trained Language Models Recognize and React to their own Generations2026
- From speaking like a person to being personal: The effects of personalized, regular interactions with conversational agents
- From Text to Emoji: How PEFT-Driven Personality Manipulation Unleashes the Emoji Potential in LLMs2024
- From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning2025
- Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report2025
- Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce2025
- GenAI as a Power Persuader: How Professionals Get Persuasion Bombed When They Attempt to Validate LLMs
- Generating Proto-Personas through Prompt Engineering: A Case Study on Efficiency, Effectiveness and Empathy2025
- Generative Agent Simulations of 1,000 People2024
- Generative Interfaces for Language Models2025
- GhostWriter: Augmenting Collaborative Human-AI Writing Experiences Through Personalization and Agency2024
- Goal Alignment in LLM-Based User Simulators for Conversational AI2025
- Goals, Plans, and Action Models
- GPT-4 as a Homework Tutor can Improve Student Engagement and Learning Outcomes2024
- GPT-4 is judged more human than humans in displaced and inverted Turing tests2024
- Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development2025
- H2HTalk: Evaluating Large Language Models as Emotional Companion2025
- Hallucinating with AI: AI Psychosis as Distributed Delusions2025
- Hallucinations Undermine Trust; Metacognition is a Way Forward2026
- How AI Impacts Skill Formation2026
- How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs2024
- How new data permeates LLM knowledge and how to dilute it2025
- How well can large language models explain business processes?2024
- Humans learn to prefer trustworthy AI over human partners2025
- Humans or LLMs as the Judge? A Study on Judgement Biases2024
- Humans overrely on overconfident language models, across languages2025
- Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models2025
- IMBUE: Improving Interpersonal Effectiveness through Simulation and Just-in-time Feedback with Human-Language Model Interaction
- Improving Dialog Systems for Negotiation with Personality Modeling2020
- Inducing Positive Perspectives with Text Reframing2022
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model2023
- InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles2025
- Inspecting and Editing Knowledge Representations in Language Models2023
- Interaction Dynamics as a Reward Signal for LLMs2025
- Interactions with generative AI chatbots: unveiling dialogic dynamics, students’ perceptions, and practical competencies in creative problem-solving
- Interactive Evaluation Requires a Design Science2026
- Interesting Scientific Idea Generation Using Knowledge Graphs and LLMs: Evaluations with 100 Research Group Leaders
- Interpretation modeling: Social grounding of sentences by reasoning over their implicit moral judgments2023
- Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs2024
- Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena2023
- Knowledge-enhanced Mixed-initiative Dialogue System for Emotional Support Conversations2023
- KTO: Model Alignment as Prospect Theoretic Optimization2024
- Language Models are Pragmatic Speakers2023
- Language Models’ Hall of Mirrors Problem: Why AI Alignment Requires Peircean Semiosis
- Large Language Models are as persuasive as humans, but how? About the cognitive effort and moral-emotional language of LLM arguments2024
- Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?2023
- Large Language Models Can Infer Psychological Dispositions of Social Media Users2023
- Large language models can segment narrative events similarly to humans2023
- Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation
- Large Language Models Do Not Simulate Human Psychology2025
- Large Language Models for User Interest Journeys2023
- Large Language Models Reflect the Ideology of their Creators2024
- Large Language Models Report Subjective Experience Under Self-Referential Processing2025
- Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency2024
- Learning "Partner-Aware" Collaborators in Multi-Party Collaboration2025
- Learning Human-Object Interaction as Groups2025
- Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries2025
- Levels of Analysis for Large Language Models2025
- LIMA: Less Is More for Alignment2023
- LLM Generated Persona is a Promise with a Catch2025
- LLM Strategic Reasoning: Agentic Study through Behavioral Game Theory2025
- LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices2024
- LLMorphism: When humans come to see themselves as language models
- LLMs as Method Actors: A Model for Prompt Engineering and Architecture2024
- LLMs Can Covertly Sandbag on Capability Evaluations Against Chain-of-Thought Monitoring2025
- LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings2025
- LSR: Reinforcement Learning with Supervised Reward Outperforms SFT in Instruction Following2025
- Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models2025
- Machine ex machina: A Framework Decentering the Human in AI Design Praxis
- Machine gaze in online behavioral targeting: The effects of algorithmic human likeness on social presence and social influence
- Machine Psychology2023
- Magentic-UI: Towards Human-in-the-loop Agentic Systems2025
- Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing2024
- Man vs machine – Detecting deception in online reviews
- Mathematical methods and human thought in the age of AI2026
- Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs2024
- Meanings are like Onions: a Layered Approach to Metaphor Processing2025
- Measuring Alliance and Symptom Severity in Psychotherapy Transcripts Using Bert Topic Modeling
- Measuring and Mitigating Persona Distortions from AI Writing Assistance2026
- Measuring Human Preferences in RLHF is a Social Science Problem2026
- Mechanistic Indicators of Understanding in Large Language Models2025
- Metadiscursive nouns in academic argument: ChatGPT vs student practices
- MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems2025
- Mindstorms in Natural Language-Based Societies of Mind2023
- Misaligned by Design: Incentive Failures in Machine Learning2025
- Modeling Interpersonal Linguistic Coordination in Conversations using Word Mover's Distance2019
- Modeling the Quality of Dialogical Explanations2024
- MOMENTS: A Comprehensive Multimodal Benchmark for Theory of Mind2025
- MoodAngels: A Retrieval-augmented Multi-agent Framework for Psychiatry Diagnosis2025
- Multi-agent cooperation through in-context co-player inference2026
- Natural Emergent Misalignment From Reward Hacking In Production RL2025
- Neural Topic Modeling of Psychotherapy Sessions2022
- News Source Citing Patterns in AI Search Systems2025
- Next Steps for Human-Centered Generative AI: A Technical Perspective2023
- NoveltyBench: Evaluating Language Models for Humanlike Diversity2025
- OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking2025
- On the Adaptive Psychological Persuasion of Large Language Models2025
- On the Binding Problem in Artificial Neural Networks2020
- On The Persona-based Summarization of Domain-Specific Documents2024
- On the Societal Impact of Open Foundation Models2024
- Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models2024
- OpenAssistant Conversations - Democratizing Large Language Model Alignment2023
- Operating Multi-Client Influence Networks Across Platforms
- Opportunities for large language models and discourse in engineering design
- Overconfidence in LLM-as-a-Judge: Diagnosis and Confidence-Driven Solution2025
- PaperOrchestra: A Multi-Agent Framework for Automated AI Research Paper Writing2026
- PATIENT-Ψ: Using Large Language Models to Simulate Patients for Training Mental Health Professionals2024
- Payrolls to Prompts: Firm-Level Evidence on the Substitution of Labor for AI2026
- People cannot distinguish GPT-4 from a human in a Turing test2024
- Persistent Pre-Training Poisoning of LLMs2024
- PersLLM: A Personified Training Approach for Large Language Models2024
- Persona Generators: Generating Diverse Synthetic Personas at Scale2026
- Persona Vectors: Monitoring and Controlling Character Traits in Language Models2025
- Persona-Assigned Large Language Models Exhibit Human-Like Motivated Reasoning2025
- PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time2025
- PersonaGym: Evaluating Persona Agents and LLMs2024
- Personalization of Large Language Models: A Survey2024
- PersonaPKT: Building Personalized Dialogue Agents via Parameter-efficient Knowledge Transfer2023
- Personhood credentials: Artificial intelligence and the value of privacy-preserving tools to distinguish who is real online2024
- PersuasiveToM: A Benchmark for Evaluating Machine Theory of Mind in Persuasive Dialogues2025
- PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers2024
- Polanyi’s Revenge and AI’s New Romance with Tacit Knowledge
- Position: Towards Bidirectional Human-AI Alignment2024
- Post-training makes large language models less human-like2026
- PosterMate: Audience-driven Collaborative Persona Agents for Poster Design2025
- Potemkin Understanding in Large Language Models2025
- Predictive Preference Learning from Human Interventions2025
- Pretrained Language Models as Containers of the Discursive Knowledge
- Proactive behavior in voice assistants: A systematic review and conceptual model
- Proactive Conversational Agents with Inner Thoughts2024
- Prompting Science Report 4: Playing Pretend: Expert Personas Don't Improve Factual Accuracy2025
- Propositional Interpretability in Artificial Intelligence2025
- ProsocialDialog: A Prosocial Backbone for Conversational Agents2022
- ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs2025
- Proxona: Leveraging LLM-Driven Personas to Enhance Creators' Understanding of Their Audience2024
- PsychAdapter: Adapting LLM Transformers to Reflect Traits, Personality and Mental Health2024
- Psyche-R1: Towards Reliable Psychological LLMs through Unified Empathy, Expertise, and Reasoning2025
- Psychological, Relational, and Emotional Effects of Self-Disclosure After Conversations With a Chatbot2024
- Psychologically Enhanced AI Agents2025
- Psychotherapy AI Companion with Reinforcement Learning Recommendations and Interpretable Policy Dynamics2023
- PsyDT: Using LLMs to Construct the Digital Twin of Psychological Counselor with Personalized Counseling Style for Psychological Counseling2024
- Quantifying Human-AI Synergy
- Re3: Generating Longer Stories With Recursive Reprompting and Revision2022
- Reasoning Models Are More Easily Gaslighted Than You Think2025
- Recent Trends in Personalized Dialogue Generation: A Review of Datasets, Methodologies, and Evaluations2024
- Reflections and New Directions for Human-Centered Large Language Models2026
- Representation Engineering: A Top-Down Approach to AI Transparency2023
- Rethinking Large Language Models in Mental Health Applications2023
- Revolutionizing Mental Health Support: An Innovative Affective Mobile Framework for Dynamic, Proactive, and Context-Adaptive Conversational Agents2024
- Rhetorical XAI: Explaining AI’s Benefits as well as its Use via Rhetorical Design2025
- Rise of Machine Agency: A Framework for Studying the Psychology of Human–AI Interaction (HAII)
- RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents2025
- Role-Play with Large Language Models2023
- RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models2023
- Scaling Synthetic Data Creation with 1,000,000,000 Personas2024
- See you soon again, chatbot? A design taxonomy to characterize user-chatbot relationships with different time horizons
- Seemingly Conscious AI Risks
- Self-Alignment with Instruction Backtranslation2023
- Self-reflecting Large Language Models: A Hegelian Dialectical Approach2025
- Self-Rewarding Language Models2024
- Should Humans Lie to Machines? The Incentive Compatibility of Lasso and General Weighted Lasso2021
- Simple Synthetic Data Reduces Sycophancy In Large Language Models2023
- Simulacra as conscious exotica2024
- Simulating Society Requires Simulating Thought2025
- Social Responses to Media Technologies in the 21st Century: The Media are Social Actors Paradigm
- Social Robots for Long-Term Interaction: A Survey
- Social Skill Training with Large Language Models
- SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
- SPICE: Self-Play In Corpus Environments Improves Reasoning2025
- Spontaneous Persuasion: An Audit of Model Persuasiveness in Everyday Conversations2026
- Spurious Forgetting in Continual Learning of Language Models2025
- StoryScope: Investigating idiosyncrasies in AI fiction2026
- Stress Testing Deliberative Alignment for Anti-Scheming Training2025
- Study: Large language models can’t effectively recognize users’ motivation, but can support behavior change for those ready to act
- SupervisorBot: NLP-Annotated Real-Time Recommendations of Psychotherapy Treatment Strategies with Deep Reinforcement Learning2022
- Supporting Physical Activity Behavior Change with LLM-Based Conversational Agents2024
- Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models2024
- Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories2025
- Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models2024
- Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence2025
- Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians2026
- System 2 Attention (is something you might need too)2023
- Systematic synthesis of design prompts for large language models in conceptual design
- Tailored Conversations beyond LLMs: A RL-Based Dialogue Manager2025
- TaleStream: Supporting Story Ideation with Trope Knowledge2023
- Talking About Large Language Models2022
- Tell me about yourself: LLMs are aware of their learned behaviors2025
- The Abstraction Fallacy: Why AI Can Simulate But Not Instantiate Consciousness
- The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models2026
- The Challenges in Designing a Prevention Chatbot for Eating Disorders: Observational Study
- The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind2025
- The Digital Therapeutic Alliance and Human-Computer Interaction
- The Digital Therapeutic Alliance: Prospects and Considerations
- The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation2023
- The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis
- The Emotion-Memory Link: Do Memorability Annotations Matter for Intelligent Systems?2025
- The Ethics of Advanced AI Assistants2024
- The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs2022
- The Hermeneutics of Artificial Text
- The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas2025
- The Impact of AI-Generated Text on the Internet
- The Impact of Artificial Intelligence on Human Thought2025
- The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers
- The impact of generative artificial intelligence on socioeconomic inequalities and policy making
- The Incomplete Bridge: How AI Research (Mis)Engages with Psychology2025
- The LLM Fallacy: Misattribution in AI-Assisted Cognitive Workflows2026
- The Method of Critical AI Studies, A Propaedeutic2024
- The Moral Turing Test: Evaluating Human-LLM Alignment in Moral Decision-Making2024
- The Partner Modelling Questionnaire: A validated self-report measure of perceptions toward machines as dialogue partners2023
- The Return of Pseudosciences in Artificial Intelligence: Have Machine Learning and Deep Learning Forgotten Lessons from Statistics and History?2024
- The Xeno Sutra: Can Meaning and Value be Ascribed to an AI-Generated "Sacred" Text?2025
- Theory of Knowledge Based on the Idea of the Discursive Space
- Theory of Mind abilities of Large Language Models in Human-Robot Interaction: An Illusion?2024
- Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models2025
- Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLMs for Countering Hate2025
- Think Twice Before Trusting: Self-Detection for Large Language Models through Comprehensive Answer Reflection2024
- Thinking Assistants: LLM-Based Conversational Assistants that Help Users Think By Asking rather than Answering2023
- Thinking in Character: Advancing Role-Playing Agents with Role-Aware Reasoning2025
- Thinking—Fast, Slow, and Artificial: How AI is Reshaping Human Reasoning and the Rise of Cognitive Surrender
- Thousands of AI Authors on the Future of AI2024
- Through the Lens of Human-Human Collaboration: A Configurable Research Platform for Exploring Human-Agent Collaboration2025
- To Tell The Truth: Language of Deception and Language Models2023
- Too Good to be Bad: On the Failure of LLMs to Role-Play Villains2025
- Topic Modeling in Embedding Spaces2019
- Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design2025
- Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models2023
- Towards Algorithmic Experience
- Towards Collective Superintelligence, a Pilot Study2023
- Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset
- Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness?2020
- Towards Healthy AI: Large Language Models Need Therapists Too2023
- Towards Human-centered Proactive Conversational Agents2024
- Towards Machine Theory of Mind with Large Language Model-Augmented Inverse Planning2025
- Towards Safe and Honest AI Agents with Neural Self-Other Overlap2024
- Towards Understanding Counseling Conversations: Domain Knowledge and Large Language Models2024
- Training a Generally Curious Agent2025
- Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning2025
- Training language models to be warm and empathetic makes them less reliable and more sycophantic2025
- Transformer-based cynical expression detection in a corpus of Spanish YouTube reviews
- Trust in Human-AI Interaction: Scoping Out Models, Measures, and Methods2022
- TrustLLM: Trustworthiness in Large Language Models2024
- Truth or lie: Exploring the language of deception
- TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning2025
- Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion2024
- Turning large language models into cognitive models
- Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization2024
- Uncovering Latent Arguments in Social Media Messaging by Employing LLMs-in-the-Loop Strategy2024
- Understanding the Therapeutic Relationship between Counselors and Clients in Online Text-based Counseling using LLMs2024
- Understanding, explaining, and utilizing medical artificial intelligence
- Unleashing Cognitive Synergy In Large Language Models: A Task-solving Agent Through Multi-persona Self-collaboration2023
- Unlocking Varied Perspectives: A Persona-Based Multi-Agent Framework with Debate-Driven Text Planning for Argument Generation
- Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study2025
- User-LLM: Efficient LLM Contextualization with User Embeddings2024
- UserBench: An Interactive Gym Environment for User-Centric Agents2025
- Using Large Language Models to Create AI Personas for Replication and Prediction of Media Effects: An Empirical Test of 133 Published Experimental Research Findings2024
- Using Large Language Models to Generate, Validate, and Apply User Intent Taxonomies2023
- Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies2022
- Using Linguistic Synchrony to Evaluate Large Language Models for Cognitive Behavioral Therapy
- Using Topic Models to Identify Clients’ Functioning Levels and Alliance Ruptures in Psychotherapy
- Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs2025
- Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties2023
- VCounselor: A Psychological Intervention Chat Agent Based on a Knowledge-Enhanced Large Language Model2024
- Verbal lie detection using Large Language Models
- Virtual Assistance in Any Context
- Virtuous Machines: Towards Artificial General Science2025
- We Are All Creators: Generative AI, Collective Knowledge, and the Path Towards Human-AI Synergy2025
- We Won't be Missed: Work and Growth in the Era of AGI
- What are the Goals of Distributional Semantics?2020
- What does it mean to understand language?2025
- What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models2025
- What Makes a Good Natural Language Prompt?2025
- What the F*ck Is Artificial General Intelligence?2025
- What we talk to when we talk to language models
- When AIs Judge AIs: The Rise of Agent-as-a-Judge Evaluation for LLMs2025
- When Large Language Models contradict humans? Large Language Models’ Sycophantic Behaviour2023
- When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection2025
- WHEN TO ACT, WHEN TO WAIT: Modeling Structural Trajectories for Intent Triggerability in Task-Oriented Dialogue2025
- Who’s Afraid of (Left) Hyperstitions
- Why Do Some Language Models Fake Alignment While Others Don't?2025
- Will I Sound Like Me? Improving Persona Consistency in Dialogues through Pragmatic Self-Consciousness2020
- Word Meanings in Transformer Language Models2025
- Working Alliance Transformer for Psychotherapy Dialogue Classification2022
- Working with AI: Measuring the Occupational Implications of Generative AI2025
- Workplace Everyday-Creativity through a Highly-Conversational UI to Large Language Models
- Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task2025
- “Hello There! Is Now a Good Time to Talk?”: Opportune Moments for Proactive Interactions with Smart Speakers
- “It Felt Like Having a Second Mind”: Investigating Human-AI Co-creativity in Prewriting with Large Language Models2023
Reasoning, Retrieval, and Evaluation (735)
- 100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models2025
- A Comment On "The Illusion of Thinking": Reframing the Reasoning Cliff as an Agentic Gap2025
- A Comparative Study on Reasoning Patterns of OpenAI's o1 Model2024
- A comprehensive analysis of concept drift locality in data streams2023
- A Comprehensive Evaluation of Inductive Reasoning Capabilities and Problem Solving in Large Language Models
- A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications2025
- A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models2024
- A comprehensive taxonomy of hallucinations in Large Language Models2025
- A Decomposition Perspective to Long-context Reasoning for LLMs2026
- A Few Words Can Distort Graphs: Knowledge Poisoning Attacks on Graph-based Retrieval-Augmented Generation of Large Language Models2025
- A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning2024
- A Looming Replication Crisis in Evaluating Behavior in Language Models? Evidence and Solutions2024
- A Mechanistic Analysis of Looped Reasoning Language Models2026
- A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity2023
- A Non-Factoid Question-Answering Taxonomy
- A Personalized Recommender System based-on Knowledge Graph Embeddings2023
- A Survey of Calibration Process for Black-Box LLMs2024
- A Survey on Concept Drift Adaptation
- A Survey on Diffusion Language Models2025
- A Survey on Large Language Models for Recommendation2023
- A Survey on Large Language Models with some Insights on their Capabilities and Limitations2025
- A Survey on Prompt Tuning2025
- A Systematic Review on the Evaluation of Large Language Models in Theory of Mind Tasks2025
- A Tutorial on LLM Reasoning: Relevant Methods behind ChatGPT o12025
- Abductive Reasoning with the GPT-4 Language Model: Case studies from criminal investigation, medical practice, scientific research2023
- Abg-CoQA: Clarifying Ambiguity in Conversational Question Answering
- AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions2025
- Activation Steering for Chain-of-Thought Compression2025
- Active Listening: Personalized Question Generation in Open-Domain Social Conversation with User Model Based Prompting
- Active Retrieval Augmented Generation2023
- Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home2025
- Advancing LLM Reasoning Generalists with Preference Trees2024
- Affordable AI Assistants with Knowledge Graph of Thoughts2025
- Agent Laboratory: Using LLM Agents as Research Assistants2025
- Agentic Code Reasoning2026
- Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models2025
- Agentic Reasoning for Large Language Models2026
- Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research2025
- AgentRxiv: Towards Collaborative Autonomous Research2025
- AI for Auto-Research: Roadmap & User Guide2026
- AI-Powered (Finance) Scholarship
- AI-Researcher: Autonomous Scientific Innovation2025
- AInsight: Augmenting Expert Decision-Making with On-the-Fly Insights Grounded in Historical Data2025
- Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models2023
- Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models2024
- ALIGN: Prompt-based Attribute Alignment for Reliable, Responsible, and Personalized LLM-based Decision-Making2025
- Aligning Language Models to Explicitly Handle Ambiguity2024
- An Automatic Graph Construction Framework based on Large Language Models for Recommendation2024
- An Investigation of Robustness of LLMs in Mathematical Reasoning: Benchmarking with Mathematically-Equivalent Transformation of Advanced Mathematical Problems2025
- An Overview Of Temporal Commonsense Reasoning and Acquisition2023
- ANAPHORA RESOLUTION: THE STATE OF THE ART
- Answer is All You Need: Instruction-following Text Embedding via Answering the Question2024
- Answering Questions by Meta-Reasoning over Multiple Chains of Thought2023
- Approaching Human-Level Forecasting with Language Models2024
- Are Emergent Abilities of Large Language Models a Mirage?2023
- AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning2025
- Argument Summarization and its Evaluation in the Era of Large Language Models2025
- Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)2025
- Ask an Expert: Leveraging Language Models to Improve Strategic Reasoning in Goal-Oriented Dialogue Models2023
- Ask, and it shall be given: Turing completeness of prompting2024
- Asking Clarifying Questions Based on Negative Feedback in Conversational Search2021
- Assessing adaptive world models in machines with novel games2025
- Assessment of Personality Dimensions Across Situations Using Conversational Speech2025
- Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward2025
- Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models2025
- Attention Mechanisms Perspective: Exploring LLM Processing of Graph-Structured Data2025
- Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models2025
- Attribute Controlled Dialogue Prompting2023
- Auditing language models for hidden objectives2025
- Automatic Extraction of Metaphoric Analogies from Literary Texts: Task Formulation, Dataset Construction, and Evaluation2024
- Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data2023
- Automatic Prompt Optimization with "Gradient Descent" and Beam Search2023
- AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts2020
- Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey2020
- AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders2025
- Backtracing: Retrieving the Cause of the Query2024
- Base Models Know How to Reason, Thinking Models Learn When2025
- Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs2025
- Beyond "Not Novel Enough": Enriching Scholarly Critique with LLM-Assisted Feedback2025
- Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey2024
- Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models2025
- Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty2025
- Beyond Brainstorming: What Drives High-Quality Scientific Ideas? Lessons from Multi-Agent Collaboration2025
- Beyond GPT-5: Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing2025
- Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts2025
- Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens2025
- Beyond Single Models: Enhancing LLM Detection of Ambiguity in Requests through Debate2025
- Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL2025
- Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think2025
- Beyond the Surface: Probing the Ideological Depth of Large Language Models2025
- Bilevel Autoresearch: Meta-Autoresearching Itself2026
- Boosted Prompt Ensembles for Large Language Models2023
- Boosting Logical Reasoning in Large Language Models through a New Framework: The Graph of Thought2023
- Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need2025
- Bounds of Chain-of-Thought Robustness: Reasoning Steps, Embed Norms, and Beyond2025
- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM2024
- Break the Chain: Large Language Models Can be Shortcut Reasoners2024
- Bridging the gulf of envisioning: Cognitive design challenges in LLM interfaces2023
- Building and Evaluating Open-Domain Dialogue Corpora with Clarifying Questions2021
- Building Decision Making Models Through Language Model Regime2024
- Can AI Have a Personality? Prompt Engineering for AI Personality Simulation: A Chatbot Case Study in Gender-Affirming Voice Therapy Training2025
- Can Language Models Solve Graph Problems in Natural Language?2023
- Can Large Language Models Capture Human Annotator Disagreements?2025
- Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess2025
- Can Large Language Models do Analytical Reasoning?2024
- Can large language models explore in-context?2024
- Can Large Language Models Reason and Optimize Under Constraints?2026
- Can Large Language Models Reason and Plan?2024
- Can Large Language Models Understand Argument Schemes?
- Can Large Language Models Understand Context?2024
- Can LLMs Follow Simple Rules?
- Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers2024
- Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?2024
- Can Theoretical Physics Research Benefit from Language Agents?2025
- Can We Trust AI Explanations? Evidence of Systematic Underreporting in Chain-of-Thought Reasoning2025
- Can You Trust LLM Judgments? Reliability of LLM-as-a-Judge2024
- Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models2025
- Causal Claims in Economics2025
- Causal Sufficiency and Necessity Improves Chain-of-Thought Reasoning2025
- CDW-CoT: Clustered Distance-Weighted Chain-of-Thoughts Reasoning2025
- CEO: Corpus-based Open-Domain Event Ontology Induction2023
- Chain of Draft: Thinking Faster by Writing Less2025
- Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
- Chain of Thoughtlessness? An Analysis of CoT in Planning2024
- Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models2023
- Chain-of-Questions Training with Latent Answers for Robust Multistep Question Answering2023
- Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective2025
- Chain-of-Retrieval Augmented Generation2025
- Chain-of-Thought Is Not Explainability
- Chain-of-thought Reasoning Is A Policy Improvement Operator2023
- Chain-of-Thought Reasoning Without Prompting2024
- Chain-of-Verification Reduces Hallucination in Large Language Models2023
- Chamain: Harmonizing Character Persona Integrity with Domain-Adaptive Knowledge in Dialogue Generation
- Characterizing Deep Research: A Benchmark and Formal Definition2025
- Chatbots in Knowledge-Intensive Contexts: Comparing Intent and LLM-Based Systems2024
- ChatGPT is not Enough: Enhancing Large Language Models with Knowledge Graphs for Fact-aware Language Modeling2023
- ChatGPT Reads Your Tone and Responds Accordingly -- Until It Does Not -- Emotional Framing Induces Bias in LLM Outputs2025
- CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning2025
- Clarifying the Path to User Satisfaction: An Investigation into Clarification Usefulness2024
- Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data
- Clustering-based Sampling for Few-Shot Cross-Domain Keyphrase Extraction
- CogBench: a large language model walks into a psychology lab2024
- Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity2025
- ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning2025
- Comparing Human and AI Therapists in Behavioral Activation for Depression: Cross-Sectional Questionnaire Study
- Competitive Programming with Large Reasoning Models2025
- Complex Logical Instruction Generation2025
- Complexity-Based Prompting for Multi-Step Reasoning2022
- Compositional Reasoning with Transformers, RNNs, and Chain of Thought2025
- Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning2025
- Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models2026
- Constructing a Periodic Table of Arguments
- Context Embeddings for Efficient Answer Generation in RAG2024
- Context Tuning for Retrieval Augmented Generation2023
- Conversational Graph Grounded Policy Learning for Open-Domain Conversation Generation
- Conversational Prompt Engineering2024
- CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases
- CoT is Not True Reasoning, It Is Just a Tight Constraint to Imitate: A Theory Perspective2025
- CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks2025
- Could you be wrong: Debiasing LLMs using a metacognitive prompt for improving human decision making2025
- Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate2025
- CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions2025
- Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains2025
- Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs2023
- Cumulative Reasoning with Large Language Models2023
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale2025
- DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations2022
- DecepChain: Inducing Deceptive Reasoning in Large Language Models2025
- Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning2024
- Decomposed Prompting: A Modular Approach for Solving Complex Tasks2022
- Decoupling Knowledge and Reasoning in LLMs: An Exploration Using Cognitive Dual-System Theory2025
- Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference2023
- Deep Research: A Systematic Survey2025
- Deep Researcher with Test-Time Diffusion2025
- Deep Think with Confidence2025
- DeepAgent: A General Reasoning Agent with Scalable Toolsets2025
- DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL2025
- DeepNet: Scaling Transformers to 1,000 Layers2022
- DeepRAG: Thinking to Retrieval Step by Step for Large Language Models2025
- DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments2025
- DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research2025
- DeepSeek-R1 Thoughtology: Let's think about LLM Reasoning2025
- Demystifying Chains, Trees, and Graphs of Thoughts2024
- Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning2025
- Dense Retrieval Adaptation using Target Domain Description2023
- Development and validation of large language model rating scales for automatically transcribed psychological therapy sessions
- Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces2026
- Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at a Time2025
- Diagnostic Reasoning Prompts Reveal the Potential for Large Language Model Interpretability in Medicine2023
- Dialogue State Tracking with a Language Model using Schema-Driven Prompting
- DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs2025
- DiaSynth: Synthetic Dialogue Generation Framework for Low Resource Dialogue Applications2024
- Diplomat: A Dialogue Dataset for Situated PragMATic Reasoning2023
- Direct Reasoning Optimization: Token-Level Reasoning Reflectivity Meets Rubric Gates for Unverifiable Tasks2025
- Divide-or-Conquer? Which Part Should You Distill Your LLM?2024
- Do Cognitively Interpretable Reasoning Traces Improve LLM Performance?2025
- Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models2024
- Do Large Language Models Latently Perform Multi-Hop Reasoning?2024
- Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?2024
- Do Large Language Models Reason Causally Like Us? Even Better?2025
- Do large language models resemble humans in language use?2023
- Do LLMs Encode Functional Importance of Reasoning Tokens?2026
- Do LLMs Truly Understand When a Precedent Is Overruled?2025
- Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations2023
- Do Prompt-Based Models Really Understand the Meaning of Their Prompts?2021
- Do Role-Playing Agents Practice What They Preach? Belief-Behavior Consistency in LLM-Based Simulations of Human Trust2025
- Do We Trust ChatGPT as much as Google Search and Wikipedia?
- Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?2025
- Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models2025
- Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey2023
- Domain-specific Question Answering with Hybrid Search2024
- Don't "Overthink" Passage Reranking: Is Reasoning Truly Necessary?2025
- DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration2025
- DRAGIN: Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models2024
- DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought2024
- Dynamic Prompting: A Unified Framework for Prompt Tuning2023
- DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation2025
- Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining2025
- Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions
- Educating LLMs like Human Students: Structure-aware Injection of Domain Knowledge2024
- Efficient Reasoning with Balanced Thinking2026
- Efficient Reasoning with Hidden Thinking2025
- Efficient Tool Use with Chain-of-Abstraction Reasoning2024
- Eliciting Reasoning in Language Models with Cognitive Tools2025
- Embedding Domain Knowledge for Large Language Models via Reinforcement Learning from Augmented Generation2025
- Emergent Hierarchical Reasoning In LLMs Through Reinforcement Learning
- EmotionPrompt: Leveraging Psychology for Large Language Models Enhancement via Emotional Stimulus2023
- Empowering Domain-Specific Language Models with Graph-Oriented Databases: A Paradigm Shift in Performance and Model Maintenance2024
- Empowering Psychotherapy with Large Language Models: Cognitive Distortion Detection through Diagnosis of Thought Prompting2023
- Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate2023
- Enhancing AI-Assisted Group Decision Making through LLM-Powered Devil's Advocate
- Enhancing Dialogue Generation via Dynamic Graph Knowledge Aggregation2023
- Enhancing Performance on Seen and Unseen Dialogue Scenarios using Retrieval-Augmented End-to-End Task-Oriented System2023
- Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy2023
- Evaluating Large Language Models at Evaluating Instruction Following2023
- Evaluating Large Language Models in Exercises of UML Class Diagram Modeling
- Evaluating Large Language Models in Theory of Mind Tasks2023
- Evaluating the Diversity and Quality of LLM Generated Content2025
- Evaluating the False Trust Engendered by LLM Explanations2026
- Evaluating Very Long-Term Conversational Memory of LLM Agents2024
- Evaluation and Benchmarking of LLM Agents: A Survey2025
- Evolving Deeper LLM Thinking2025
- Experimental Design for Active Transductive Inference in Large Language Models2024
- Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy2025
- Exploring Autonomous Agents: A Closer Look at Why They Fail When Completing Tasks2025
- Exploring Large Language Models for Knowledge Graph Completion2023
- Exploring LLMs Applications in Law: A Literature Review on Current Legal NLP Approaches
- Exploring Student-AI Interactions in Vibe Coding2025
- Exploring the Potential of ChatGPT on Sentence Level Relations: A Focus on Temporal, Causal, and Discourse Relations
- Extracting memorized pieces of (copyrighted) books from open-weight language models2025
- Extrapolation by Association: Length Generalization Transfer in Transformers2025
- Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation2024
- Faith and Fate: Limits of Transformers on Compositionality2023
- FinCoT: Grounding Chain-of-Thought in Expert Financial Reasoning2025
- Fine-grained Hallucination Detection and Editing for Language Models2024
- Fine-tuning Language Models for Factuality2023
- Fine-tuning Pre-trained Language Models for Dialogical Argument Mining with Inference Anchoring Theory
- First Try Matters: Revisiting the Role of Reflection in Reasoning Models2025
- FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets2023
- Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models2025
- Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities2024
- FlowReasoner: Reinforcing Query-Level Meta-Agents2025
- Flows: Building Blocks of Reasoning and Collaborating AI2023
- FLOWSTEER: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems2026
- Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning2024
- FormulaOne: Measuring the Depth of Algorithmic Reasoning Beyond Competitive Programming2025
- From Context to Skills: Can Language Models Learn from Context Skillfully?2026
- From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step2024
- From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities2024
- From Human to Machine Psychology: A Conceptual Framework for Understanding Well-Being in Large Language Models2025
- From Language to Logic: A Bi-Level Framework for Structured Reasoning2025
- From Local to Global: A Graph RAG Approach to Query-Focused Summarization2024
- From Louvain to Leiden: guaranteeing well-connected communities2018
- From Model Scaling to System Scaling: Scaling the Harness in Agentic AI2026
- From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?2025
- From Persona to Person: Enhancing the Naturalness with Multiple Discourse Relations Graph Learning in Personalized Dialogue Generation2025
- From Prompt Engineering to Prompt Science With Human in the Loop2024
- From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting2023
- From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents2025
- Further Explorations on the Use of Large Language Models for Thematic Analysis. Open-Ended Prompts, Better Terminologies and Thematic Maps
- FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction2025
- Game-theoretic LLM: Agent Workflow for Negotiation Games2024
- GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks
- Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities2025
- Generalization Bias in Large Language Model Summarization of Scientific Research2025
- Generalization to New Sequential Decision Making Tasks with In-Context Learning2023
- Generating Proto-Personas through Prompt Engineering: A Case Study on Efficiency, Effectiveness and Empathy2025
- Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?2023
- Generative Recursive Reasoning2026
- Generator-Retriever-Generator: A Novel Approach to Open-domain Question Answering2023
- GPT-4 is judged more human than humans in displaced and inverted Turing tests2024
- Graph of Thoughts: Solving Elaborate Problems with Large Language Models2023
- Graph-enhanced Large Language Models in Asynchronous Plan Reasoning2024
- GRASP: Municipal Budget AI Chatbots for Enhancing Civic Engagement2025
- Grounding Multilingual Multimodal LLMs With Cultural Knowledge2025
- GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models2024
- Guidance is All You Need: Temperature-Guided Reasoning in Large Language Models2024
- Guiding Large Language Models via Directional Stimulus Prompting2023
- Hallucination is Inevitable: An Innate Limitation of Large Language Models2024
- Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools2024
- Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses2026
- Harnessing Business and Media Insights with Large Language Models2024
- Has the Creativity of Large-Language Models peaked? An analysis of inter- and intra-LLM variability2025
- Hierarchical Concept Geometry in Language Models Emerges from Word Co-occurrence2026
- Hierarchical Reasoning Model2025
- HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches2025
- HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models2024
- Hop, Skip, and Overthink: Diagnosing Why Reasoning Models Fumble during Multi-Hop Analysis2025
- How do Transformers Learn Implicit Reasoning?2025
- How Far Are We from Genuinely Useful Deep Research Agents?2025
- How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs2024
- How Many Instructions Can LLMs Follow at Once?2025
- How new data permeates LLM knowledge and how to dilute it2025
- How to Build AI Agents by Augmenting LLMs with Codified Human Expert Domain Knowledge? A Software Engineering Framework2026
- How well can large language models explain business processes?2024
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs2024
- Human-like Category Learning by Injecting Ecological Priors from Large Language Models into Neural Networks2024
- Humanity's Last Exam2025
- Humans or LLMs as the Judge? A Study on Judgement Biases2024
- IFEvalCode: Controlled Code Generation2025
- Implicit Chain of Thought Reasoning via Knowledge Distillation2023
- Improving Chain-of-Thought Reasoning via Quasi-Symbolic Abstractions2025
- Improving Factuality and Reasoning in Language Models through Multiagent Debate2023
- In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss2024
- In-Context Principle Learning from Mistakes2024
- Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs2024
- Inference-Aware Prompt Optimization for Aligning Black-Box Large Language Models2025
- Informed Named Entity Recognition Decoding For Generative Language Models2023
- Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey2025
- Inspecting and Editing Knowledge Representations in Language Models2023
- Instance-adaptive Zero-shot Chain-of-Thought Prompting2024
- Instruction Induction: From Few Examples to Natural Language Task Descriptions2022
- Integrating Large Language Models and Reinforcement Learning for Non-Linear Reasoning2024
- Interactive Evaluation Requires a Design Science2026
- Interesting Scientific Idea Generation Using Knowledge Graphs and LLMs: Evaluations with 100 Research Group Leaders
- Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation2025
- interwhen: A Generalizable Framework for Steering Reasoning Models with Test-time Verification2026
- Intrinsically Motivated Graph Exploration Using Network Theories of Human Curiosity2023
- Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting2023
- Investigating Gender Bias in Language Models Using Causal Mediation Analysis
- Investigating task-specific prompts and sparse autoencoders for activation monitoring2025
- Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens2025
- Is Cosine-Similarity of Embeddings Really About Similarity?2024
- It's About Time: Incorporating Temporality in Retrieval Augmented Language Models2024
- J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning2025
- JointLK: Joint Reasoning with Language Models and Knowledge Graphs for Commonsense Question Answering2021
- Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena2023
- KiPT: Knowledge-injected Prompt Tuning for Event Detection
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization2025
- Knowledge Graph Prompting for Multi-Document Question Answering2023
- Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains2025
- Knowledge Retrieval Based on Generative AI2025
- KoLA: Carefully Benchmarking World Knowledge of Large Language Models2023
- Language Agents as Optimizable Graphs2024
- Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought2022
- Language Models Learn to Mislead Humans via RLHF2024
- Language models show human-like content effects on reasoning tasks2022
- Large Causal Models From Large Language Models2025
- Large Language Model Agents Are Not Always Faithful Self-Evolvers2026
- Large Language Model based Multi-Agents: A Survey of Progress and Challenges2024
- Large Language Model Guided Tree-of-Thought2023
- Large Language Model Reasoning Failures2026
- Large Language Model-based Data Science Agent: A Survey2025
- Large Language Models and Knowledge Graphs: Opportunities and Challenges2023
- Large Language Models Are Human-level Prompt Engineers2022
- Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners2023
- Large Language Models as Planning Domain Generators2024
- Large Language Models can accomplish Business Process Management Tasks2023
- Large Language Models Know Your Contextual Search Intent: A Prompting Framework for Conversational Search2023
- Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities2025
- Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions2023
- Large language models surpass human experts in predicting neuroscience results2024
- Large Language Models Think Too Fast To Explore Effectively2025
- Latent Skill Discovery for Chain-of-Thought Reasoning2023
- Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers2025
- Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments2025
- Learning Retrieval Augmentation for Personalized Dialogue Generation2024
- Learning to Ask Appropriate Questions in Conversational Recommendation2021
- Learning to Ask Critical Questions for Assisting Product Search2024
- Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge2025
- Learning to Reason for Factuality2025
- Learning To Retrieve Prompts for In-Context Learning2021
- Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering2019
- Learning to Select the Relevant History Turns in Conversational Question Answering2023
- Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs2025
- Learning, Fast and Slow: Towards LLMs That Adapt Continually2026
- Least-to-most Prompting Enables Complex Reasoning In Large Language Models2022
- Less is More: Recursive Reasoning with Tiny Networks2025
- LESS: Selecting Influential Data for Targeted Instruction Tuning2024
- Let Me Think! A Long Chain-of-Thought Can Be Worth Exponentially Many Short Ones2025
- Levels of Analysis for Large Language Models2025
- Leveraging Few-Shot Data Augmentation and Waterfall Prompting for Response Generation2023
- Leveraging LLMs for KPIs Retrieval from Hybrid Long-Document: A Comprehensive Framework and Dataset2023
- Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning2023
- LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling2025
- LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries2025
- LLM Augmentations to support Analytical Reasoning over Multiple Documents2024
- LLM Reasoning Is Latent, Not the Chain of Thought2026
- LLM Strategic Reasoning: Agentic Study through Behavioral Game Theory2025
- LLM+P: Empowering Large Language Models with Optimal Planning Proficiency2023
- LLM-Independent Adaptive RAG: Let the Question Speak for Itself2025
- LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools2024
- LLMs as Method Actors: A Model for Prompt Engineering and Architecture2024
- LLMs Can Covertly Sandbag on Capability Evaluations Against Chain-of-Thought Monitoring2025
- LLMs Can Easily Learn to Reason from Demonstrations: Structure, not content, is what matters!2025
- LLMs can implicitly learn from mistakes in-context2025
- LLMs Corrupt Your Documents When You Delegate2026
- LLMs Struggle to Reject False Presuppositions when Misinformation Stakes are High2025
- Local Coherence or Global Validity? Investigating RLVR Traces in Math Domains2025
- Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning2023
- Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models2024
- Logical Reasoning in Large Language Models: A Survey2025
- LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models2024
- Long-form Factuality in Large Language Models2024
- Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning2025
- LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering2024
- LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs2024
- Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers2026
- LR^2Bench: Evaluating Long-chain Reflective Reasoning Capabilities of Large Language Models via Constraint Satisfaction Problems2025
- LSR: Reinforcement Learning with Supervised Reward Outperforms SFT in Instruction Following2025
- Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models2025
- Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning2024
- MasRouter: Learning to Route LLMs for Multi-Agent Systems2025
- Mastering Diverse Domains through World Models2023
- Measuring Faithfulness in Chain-of-Thought Reasoning2023
- Measuring Human Preferences in RLHF is a Social Science Problem2026
- Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models2025
- Mechanistically Interpreting the Role of Sample Difficulty in RLVR for LLMs2026
- Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications2025
- Memory in the Age of AI Agents: A Survey — Forms, Functions and Dynamics2025
- Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models2025
- Metacognitive Prompting Improves Understanding in Large Language Models2023
- Metacognitive Retrieval-Augmented Large Language Models2024
- Metacognitive Reuse: Turning Recurring LLM Reasoning Into Concise Behaviors2025
- Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse2024
- Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy (short paper)2025
- Minds versus Machines: Rethinking Entailment Verification with Language Models2024
- MindSearch: Mimicking Human Minds Elicits Deep AI Searcher2024
- Mining Hidden Thoughts from Texts: Evaluating Continual Pretraining with Synthetic Data for LLM Reasoning2025
- Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?2025
- Mitigating Hallucinations in Large Language Models via Causal Reasoning2025
- Mixture of Thoughts: Learning to Aggregate What Experts Think, Not Just What They Say2025
- MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement2025
- Model Organisms for Emergent Misalignment2025
- Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence2024
- Modeling Code: Is Text All You Need?2025
- Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation2025
- Multi-hop Question Answering via Reasoning Chains2019
- Multi-Task End-to-End Training Improves Conversational Recommendation2023
- Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains2025
- MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs2025
- MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries2024
- Natural Emergent Misalignment From Reward Hacking In Production RL
- Navigating the Latent Space Dynamics of Neural Models2025
- NeuroQL: A Neuro-Symbolic Language and Dataset for Inter-Subjective Reasoning2023
- Neurosymbolic AI -- Why, What, and How2023
- Neutralizing Bias in LLM Reasoning using Entailment Graphs2025
- News Source Citing Patterns in AI Search Systems2025
- Nexus: An Agentic Framework for Time Series Forecasting2026
- No that's not what I meant: Handling Third Position Repair in Conversational Question Answering2023
- Not All Parameters Are Created Equal: Smart Isolation Boosts Fine-Tuning Performance2025
- NoveltyBench: Evaluating Language Models for Humanlike Diversity2025
- O1 Replication Journey: A Strategic Progress Report -- Part 12024
- Octopus v4: Graph of language models2024
- Off-Policy Evaluation for Large Action Spaces via Policy Convolution
- OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking2025
- On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM agents2026
- On Predictive planning and counterfactual learning in active inference2024
- On the Impact of Fine-Tuning on Chain-of-Thought Reasoning2024
- On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models2025
- On The Persona-based Summarization of Domain-Specific Documents2024
- On the Reasoning Capacity of AI Models and How to Quantify It2025
- On the Roles of LLMs in Planning: Embedding LLMs into Planning Graphs
- On the Theoretical Limitations of Embedding-Based Retrieval2025
- On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting2025
- Open-World Evaluations for Measuring Frontier AI Capabilities2026
- OpenDialKG: Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning2024
- OpenThoughts: Data Recipes for Reasoning Models2025
- OptimalThinkingBench: Evaluating Over and Underthinking in LLMs2025
- Overconfidence in LLM-as-a-Judge: Diagnosis and Confidence-Driven Solution2025
- PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods2024
- Performative Thinking? The Brittle Correlation Between CoT Length and Problem Complexity2025
- Persistent AI Agents in Academic Research: A Single-Investigator Implementation Case Study2026
- Persistent Pre-Training Poisoning of LLMs2024
- Persona Vectors: Monitoring and Controlling Character Traits in Language Models2025
- Perturbation CheckLists for Evaluating NLG Evaluation Metrics2021
- Pixels, Patterns, but No Poetry: To See The World like Humans2025
- Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts2023
- Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o1 2024
- Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs2025
- PolyResponse: A Rank-based Approach to Task-Oriented Dialogue with Application in Restaurant Search and Booking2019
- Position: Towards Bidirectional Human-AI Alignment2024
- Potemkin Understanding in Large Language Models2025
- Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing2021
- Precise Zero-Shot Dense Retrieval without Relevance Labels2022
- Prefix-Tuning: Optimizing Continuous Prompts for Generation
- PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts2025
- Premise Order Matters in Reasoning with Large Language Models2024
- Premise-Augmented Reasoning Chains Improve Error Identification in Math reasoning with LLMs2025
- PRewrite: Prompt Rewriting with Reinforcement Learning2024
- Proactive Conversational Agents with Inner Thoughts2024
- Proactive Human-Machine Conversation with Explicit Conversation Goals2019
- Probing Structured Semantics Understanding and Generation of Language Models via Question Answering2024
- Probing the Multi-turn Planning Capabilities of LLMs via 20 Question Games2023
- Problems with Cosine as a Measure of Embedding Similarity for High Frequency Words2022
- Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks2022
- Progressive-Hint Prompting Improves Reasoning in Large Language Models2023
- Prompt Architecture Determines Reasoning Quality: A Variable Isolation Study on the Car Wash Problem2026
- Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm2021
- Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution2023
- Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing?2024
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models2025
- ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs2024
- ProsocialDialog: A Prosocial Backbone for Conversational Agents2022
- ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs2025
- Pushdown Layers: Encoding Recursive Structure in Transformer Language Models2023
- Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability2021
- Quantifying Human-AI Synergy
- Query Rewriting for Retrieval-Augmented Large Language Models2023
- Query Understanding in the Age of Large Language Models2023
- QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?2025
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking2024
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning2025
- RAG Does Not Work for Enterprises2024
- RAG-Gym: Systematic Optimization of Language Agents for Retrieval-Augmented Generation2025
- RAG-R1: Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism2025
- Ranking Free RAG: Replacing Re-ranking with Selection in RAG for Sensitive Domains2025
- Re3: Generating Longer Stories With Recursive Reprompting and Revision2022
- ReAct: Synergizing Reasoning and Acting in Language Models2022
- ReALM: Reference Resolution As Language Modeling2024
- ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs2025
- Reasoning Beyond Chain-of-Thought: A Latent Computational Mode in Large Language Models2026
- Reasoning Can Hurt the Inductive Abilities of Large Language Models2025
- Reasoning Language Models: A Blueprint2025
- Reasoning LLMs are Wandering Solution Explorers2025
- Reasoning Models Are More Easily Gaslighted Than You Think2025
- Reasoning Models Can Be Effective Without Thinking2025
- Reasoning Models Don't Always Say What They Think2025
- Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination2025
- Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks2023
- Reasoning Strategies in Large Language Models: Can They Follow, Prefer, and Optimize?2025
- Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought2026
- Reasoning to Learn from Latent Thoughts2025
- Reasoning with Large Language Models, a Survey2024
- ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory2025
- ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering2025
- Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training2024
- Reflect then Learn: Active Prompting for Information Extraction Guided by Introspective Confusion2025
- Reflexion: Language Agents with Verbal Reinforcement Learning2023
- Reinforcement Learning for Optimizing RAG for Domain Chatbots2024
- Reinforcement Pre-Training2025
- Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models2024
- RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns2025
- Reranking-based Generation for Unbiased Perspective Summarization2025
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning2025
- Rethinking External Slow-Thinking: From Snowball Errors to Probability of Correct Reasoning2025
- Rethinking Interpretability in the Era of Large Language Models2024
- Rethinking Memory as Continuously Evolving Connectivity2026
- Rethinking STS and NLI in Large Language Models
- Retrieval-augmented reasoning with lean language models2025
- RevCore: Review-augmented Conversational Recommendation2021
- Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up2024
- Reverse Thinking Makes LLMs Stronger Reasoners2024
- Revisiting Prompt Engineering: A Comprehensive Evaluation for LLM-based Personalized Recommendation2025
- Revisiting RAG Ensemble: A Theoretical and Mechanistic Analysis of Multi-RAG System Collaboration2025
- RewardBench: Evaluating Reward Models for Language Modeling2024
- ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models2023
- Rhetorical XAI: Explaining AI’s Benefits as well as its Use via Rhetorical Design2025
- RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation2024
- RL + Transformer = A General-Purpose Problem Solver2025
- RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs2025
- RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner2024
- RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems2025
- RLPR: Extrapolating RLVR to General Domains without Verifiers2025
- Role play with large language models
- RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models2023
- Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains2025
- Rule2Text: Natural Language Explanation of Logical Rules in Knowledge Graphs2025
- S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models2025
- SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval2023
- Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models2024
- Scaling Expert Language Models with Unsupervised Domain Discovery2023
- Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet2024
- Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models2025
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach2025
- Schema-learning and rebinding as mechanisms of in-context learning and emergence2023
- SEAL: Self-Evolving Agentic Learning for Conversational Question Answering over Knowledge Graphs2025
- Search Arena: Analyzing Search-Augmented LLMs2025
- Search-o1: Agentic Search-Enhanced Large Reasoning Models2025
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning2025
- Searching for Best Practices in Retrieval-Augmented Generation2024
- Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
- Self-consistency Improves Chain Of Thought Reasoning In Language Models2022
- Self-critiquing models for assisting human evaluators2022
- Self-Discover: Large Language Models Self-Compose Reasoning Structures2024
- SELF-INSTRUCT: Aligning Language Models with Self-Generated Instructions2022
- Self-Organizing Graph Reasoning Evolves into a Critical State for Continuous Discovery Through Structural-Semantic Dynamics2025
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection2023
- Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution?2025
- Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue2024
- Simple Synthetic Data Reduces Sycophancy In Large Language Models2023
- Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models2023
- Sleep-time Compute: Beyond Inference Scaling at Test-time2025
- Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space2025
- Soft Tokens, Hard Truths2025
- SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs2025
- Sources of Hallucination by Large Language Models on Inference Tasks
- SParC: Cross-Domain Semantic Parsing in Context
- Spontaneous Persuasion: An Audit of Model Persuasiveness in Everyday Conversations2026
- SpreadsheetLLM: Encoding Spreadsheets for Large Language Models2024
- Spurious Forgetting in Continual Learning of Language Models2025
- SSRL: Self-Search Reinforcement Learning2025
- Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!2025
- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models2025
- StoryScope: Investigating idiosyncrasies in AI fiction2026
- Strategic Reasoning with Language Models2023
- Stream of Search (SoS): Learning to Search in Language2024
- Stress Testing Deliberative Alignment for Anti-Scheming Training2025
- StructGPT: A General Framework for Large Language Model to Reason over Structured Data2023
- StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization2024
- Structured and Natural Responses Co-generation for Conversational Search
- Style Vectors for Steering Generative Large Language Models
- Subliminal Learning: Language models transmit behavioral traits via hidden signals in data2025
- Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning2025
- Survey on Evaluation of LLM-based Agents2025
- Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories2025
- Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence2025
- SymAgent: A Neural-Symbolic Self-Learning Agent Framework for Complex Reasoning over Knowledge Graphs2025
- Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models2023
- System 2 Attention (is something you might need too)2023
- Systematic synthesis of design prompts for large language models in conceptual design
- Talk like a Graph: Encoding Graphs for Large Language Models2023
- Talking About Large Language Models2022
- Task Contamination: Language Models May Not Be Few-Shot Anymore2023
- TaskLAMA: Probing the Complex Task Understanding of Language Models2023
- Teaching Large Language Models to Reason with Reinforcement Learning2024
- Test-time Prompt Intervention2025
- Test-Time Scaling with Reflective Generative Model2025
- The AI Hippocampus: How Far are We From Human Memory?2026
- The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs2025
- The Consensus Game: Language Model Generation via Equilibrium Search2023
- The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think2025
- The Hallucination Tax of Reinforcement Finetuning2025
- The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs2025
- The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs2025
- The Illusion of the Illusion of the Illusion of Thinking
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
- The Impossibility of Fair LLMs2024
- The Incomplete Bridge: How AI Research (Mis)Engages with Psychology2025
- The Insanity of Relying on Vector Embeddings: Why RAG Fails
- The Invisible Leash: Why RLVR May Not Escape Its Origin2025
- The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning2026
- The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning2026
- The Prompt Report: A Systematic Survey of Prompting Techniques2024
- The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"2023
- The Vanishing Gradient Problem for Stiff Neural Differential Equations2025
- TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks2024
- Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models2025
- Think before you speak: Training Language Models With Pause Tokens2023
- Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens2026
- Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods2025
- Think Twice Before Trusting: Self-Detection for Large Language Models through Comprehensive Answer Reflection2024
- Think-on-Graph: Deep and Responsible Reasoning of Large Language Model with Knowledge Graph2023
- Thinking as Compression: Your Reasoning Model is Secretly a Context Compressor2026
- Thinking Assistants: LLM-Based Conversational Assistants that Help Users Think By Asking rather than Answering2023
- Thinking Augmented Pre-training2025
- Thinking Forward and Backward: Effective Backward Planning with Large Language Models2024
- Thinkless: LLM Learns When to Think2025
- Thought Anchors: Which LLM Reasoning Steps Matter?2025
- Thought Virus: Viral Misalignment via Subliminal Prompting in Multi-Agent Systems2026
- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs2025
- Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines2025
- Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation2024
- Tina: Tiny Reasoning Models via LoRA2025
- To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning2024
- Topic Shift Detection for Mixed Initiative Response
- Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties2025
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing2024
- Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models2025
- Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models2023
- Towards a Science of Scaling Agent Systems2025
- Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs2025
- Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness?2020
- Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control2024
- Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models2025
- Towards Understanding Counseling Conversations: Domain Knowledge and Large Language Models2024
- Training for Compositional Sensitivity Reduces Dense Retrieval Generalization2026
- Training language models to be warm and empathetic makes them less reliable and more sycophantic2025
- Training Language Models to Self-Correct via Reinforcement Learning2024
- Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning2024
- Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis2024
- Transcendence: Generative Models Can Outperform The Experts That Train Them2024
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models2023
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search2025
- Triggering Hallucinations in LLMs: A Quantitative Study of Prompt-Induced Hallucination in Large Language Models2025
- Tuning Language Models by Proxy
- Turiya at DialAM-2024: Inference Anchoring Theory Based LLM Parsers
- Typed-RAG: Type-aware Multi-Aspect Decomposition for Non-Factoid Question Answering2025
- Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models2024
- Unifying Large Language Models and Knowledge Graphs: A Roadmap2023
- UniGraph: Learning a Unified Cross-Domain Foundation Model for Text-Attributed Graphs2024
- Universe of Thoughts: Enabling Creative Reasoning with Large Language Models2025
- Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem2025
- Unsupervised Elicitation of Language Models
- Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study2025
- UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation2023
- UR2: Unify RAG and Reasoning through Reinforcement Learning2025
- User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy as a Learning Signal2025
- Using LLMs to Discover Legal Factors2024
- VCBench: Benchmarking LLMs in Venture Capital2025
- Vector Policy Optimization: Training for Diversity Improves Test-Time Search2026
- Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity2025
- VibeSearchBench: Benchmarking Long-horizon Proactive Search in the Wild2026
- Virtual Assistance in Any Context
- Virtuous Machines: Towards Artificial General Science2025
- Weak-to-Strong GraphRAG: Aligning Weak Retrievers with Large Language Models for Graph-based Retrieval Augmented Generation
- What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT2025
- What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity2025
- What is a Discourse Graph?
- What Makes a Good Natural Language Prompt?2025
- When AIs Judge AIs: The Rise of Agent-as-a-Judge Evaluation for LLMs2025
- When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models2024
- When More is Less: Understanding Chain-of-Thought Length in LLMs2025
- When Prompts Go Wrong: Evaluating Code Model Robustness to Ambiguous, Contradictory, and Incomplete Task Descriptions2025
- When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection2025
- When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs2025
- Why Do Multi-agent LLM Systems Fail?
- Why Do Some Language Models Fake Alignment While Others Don't?2025
- Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?2026
- You Don't Need Pre-built Graphs for RAG: Retrieval Augmented Generation with Adaptive Reasoning Structures2025
- ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning2025
- Zero-Shot Verification-guided Chain of Thoughts2025
- ZeroSearch: Incentivize the Search Capability of LLMs without Searching2025
- LM2: A Simple Society of Language Models Solves Complex Reasoning2024
Training, RL, and Test-Time Scaling (418)
- 100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models2025
- 1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities2025
- A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems2025
- A Decomposition Perspective to Long-context Reasoning for LLMs2026
- A Little Human Data Goes A Long Way2024
- A Primer on the Inner Workings of Transformer-based Language Models2024
- A Survey of Calibration Process for Black-Box LLMs2024
- A Survey of Continual Reinforcement Learning2025
- A Survey of Reinforcement Learning from Human Feedback2023
- A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence2025
- A Survey on Knowledge Distillation of Large Language Models2024
- A Survey on LLM Inference-Time Self-Improvement2024
- A Survey on Post-training of Large Language Models2025
- A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well?2025
- Absolute Zero: Reinforced Self-play Reasoning with Zero Data2025
- Adapting LLM Agents with Universal Feedback in Communication2023
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling2025
- Agent Learning via Early Experience2025
- Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training2025
- AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation2024
- Agentic Systems as Boosting Weak Reasoning Models2026
- AI Can Learn Scientific Taste2026
- AlphaGo Moment for Model Architecture Discovery2025
- An Emulator for Fine-Tuning Large Language Models using Small Language Models2023
- AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning2025
- ARGS: Alignment as Reward-Guided Search2024
- Artifacts as Memory Beyond the Agent Boundary2026
- Ask-AC: An Initiative Advisor-in-the-Loop Actor-Critic Framework2022
- Assessing and Mitigating Data Memorization Risks in Fine-Tuned Large Language Models2025
- Atom of Thoughts for Markov LLM Test-Time Scaling2025
- Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward2025
- Auditing language models for hidden objectives2025
- Augmenting Autotelic Agents with Large Language Models2023
- Autogenesis: A Self-Evolving Agent Protocol2026
- Automated Alignment Researchers: Using large language models to scale scalable oversight2022
- Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies2023
- Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey2020
- Base Models Know How to Reason, Thinking Models Learn When2025
- Behavioral Exploration: Learning to Explore via In-Context Adaptation2025
- Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models2025
- Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty2025
- Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment2025
- Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning2025
- Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL2025
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning2025
- Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR2025
- Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following2025
- Bigger is not always better: The importance of human-scale language modeling for psycholinguistics
- Bilevel Autoresearch: Meta-Autoresearching Itself2026
- Boundless Socratic Learning with Language Games2024
- Branch-Solve-Merge Improves Large Language Model Evaluation and Generation2023
- Bridging Offline and Online Reinforcement Learning for LLMs2025
- Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning2023
- Can Large Language Models Capture Human Annotator Disagreements?2025
- Can Large Language Models Really Improve by Self-critiquing Their Own Plans?2023
- Can Large Reasoning Models Self-Train?2025
- Can LLM be a Personalized Judge?2024
- Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?2024
- Chain-of-thought Reasoning Is A Policy Improvement Operator2023
- Chain-of-Thought Reasoning Without Prompting2024
- Chain-of-Verification Reduces Hallucination in Large Language Models2023
- Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference2024
- Checklists Are Better Than Reward Models For Aligning Language Models2025
- Command A: An Enterprise-Ready Large Language Model2025
- Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning2025
- Context-PEFT: Efficient Multi-Modal, Multi-Task Fine-Tuning2023
- Continual Instruction Tuning for Large Multimodal Models2023
- CONTROL PREFIXES for Parameter-Efficient Text Generation2021
- Conversational Graph Grounded Policy Learning for Open-Domain Conversation Generation
- CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks2025
- Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback2025
- Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains2025
- CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning2018
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale2025
- Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents2025
- Decision Transformer: Reinforcement Learning via Sequence Modeling2021
- Deep Researcher with Test-Time Diffusion2025
- Deep Think with Confidence2025
- DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments2025
- DeepSeek-R1 Thoughtology: Let's think about LLM Reasoning2025
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
- DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
- DeLLMa: Decision Making Under Uncertainty with Large Language Models2024
- Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP2022
- Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning2025
- Dialogue State Tracking with a Language Model using Schema-Driven Prompting
- DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs2025
- Diffusion Models are Evolutionary Algorithms2024
- Direct Language Model Alignment from Online AI Feedback2024
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model2023
- Direct Reasoning Optimization: Token-Level Reasoning Reflectivity Meets Rubric Gates for Unverifiable Tasks2025
- Distilling LLMs' Decomposition Abilities into Compact Language Models2024
- Divide-or-Conquer? Which Part Should You Distill Your LLM?2024
- Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models2024
- Do Models Really Learn to Follow Instructions? An Empirical Study of Instruction Tuning2023
- Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?2024
- Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?2025
- Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models2025
- Don't "Overthink" Passage Reranking: Is Reasoning Truly Necessary?2025
- DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research2025
- DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning2026
- Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models2024
- Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining2025
- Educating LLMs like Human Students: Structure-aware Injection of Domain Knowledge2024
- Efficient Reinforcement Learning via Large Language Model-based Search2024
- Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs2024
- Embedding Domain Knowledge for Large Language Models via Reinforcement Learning from Augmented Generation2025
- Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers2024
- Emergent Hierarchical Reasoning In LLMs Through Reinforcement Learning
- Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance2025
- End-to-End Test-Time Training for Long Context2025
- Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision2024
- Escaping the Verifier: Learning to Reason via Demonstrations2025
- Evaluating Large Language Models at Evaluating Instruction Following2023
- Evaluating the Diversity and Quality of LLM Generated Content2025
- Evolving Deeper LLM Thinking2025
- Exploring Format Consistency for Instruction Tuning2023
- External Model Motivated Agents: Reinforcement Learning for Enhanced Environment Sampling
- Extreme Multi-Label Skill Extraction Training using Large Language Models2023
- Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs2026
- Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models2025
- Fine-grained Hallucination Detection and Editing for Language Models2024
- Fine-tuning Language Models for Factuality2023
- Fine-tuning Large Language Model for Automated Algorithm Design2025
- First Try Matters: Revisiting the Role of Reflection in Reasoning Models2025
- FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets2023
- FlowReasoner: Reinforcing Query-Level Meta-Agents2025
- FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions2024
- Foundations of Large Language Models2025
- From Context to Skills: Can Language Models Learn from Context Skillfully?2026
- From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models2024
- From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence2026
- From Trial-and-Error to Improvement: A Systematic Analysis of LLM Exploration Mechanisms in RLVR2025
- Generating Query-Relevant Document Summaries via Reinforcement Learning2025
- Generative Recursive Reasoning2026
- GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning2025
- GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning2025
- Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning2023
- Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents2026
- Hogwild! Inference: Parallel LLM Generation via Concurrent Attention2025
- How Many Instructions Can LLMs Follow at Once?2025
- How Should We Meta-Learn Reinforcement Learning Algorithms?2025
- How to Correctly do Semantic Backpropagation on Language-based Agentic Systems2024
- Hyperagents2026
- Improving large language models with concept-aware fine-tuning2025
- Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards2024
- Inference-Aware Prompt Optimization for Aligning Black-Box Large Language Models2025
- Inference-Time Scaling for Generalist Reward Modeling2025
- Information-Theoretic Reward Decomposition for Generalizable RLHF2025
- Instruction Tuning for Large Language Models: A Survey2023
- Instruction-tuned Language Models are Better Knowledge Learners2024
- Integrating Large Language Models and Reinforcement Learning for Non-Linear Reasoning2024
- interwhen: A Generalizable Framework for Steering Reasoning Models with Test-time Verification2026
- Intrinsic Credit Assignment for Long Horizon Interaction2026
- Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data2024
- J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning2025
- Jointly Reinforcing Diversity and Quality in Language Model Generations2025
- Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena2023
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization2025
- KTO: Model Alignment as Prospect Theoretic Optimization2024
- Language Model Personalization via Reward Factorization2025
- Language Modeling by Language Models2025
- Large Language Model Agents Are Not Always Faithful Self-Evolvers2026
- Large Language Models: A Survey2024
- Learn from your own latents and not from tokens: A sample-complexity theory2026
- Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries2025
- Learning to (Learn at Test Time): RNNs with Expressive Hidden States2024
- Learning to Discover at Test Time2026
- Learning to Learn from Language Feedback with Social Meta-Learning2026
- Learning to Reason for Factuality2025
- Learning to Reason without External Rewards2025
- Learning To Retrieve Prompts for In-Context Learning2021
- Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs2025
- Learning, Fast and Slow: Towards LLMs That Adapt Continually2026
- LESS: Selecting Influential Data for Targeted Instruction Tuning2024
- Let Me Think! A Long Chain-of-Thought Can Be Worth Exponentially Many Short Ones2025
- Let’s Verify Step by Step2023
- Leveraging Approximate Symbolic Models for Reinforcement Learning via Skill Diversity2022
- Lil-Bevo: Explorations of Strategies for Training Language Models in More Humanlike Ways2023
- LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling2025
- Linguistic Calibration of Long-Form Generations2024
- LLM Post-Training: A Deep Dive into Reasoning Large Language Models2025
- LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities2025
- LLMs can be Fooled into Labelling a Document as Relevant
- LLMs Can Easily Learn to Reason from Demonstrations: Structure, not content, is what matters!2025
- LLMs can implicitly learn from mistakes in-context2025
- Local Coherence or Global Validity? Investigating RLVR Traces in Math Domains2025
- Long-context LLMs Struggle with Long In-context Learning2024
- LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards2026
- Look Before You Leap: Autonomous Exploration for LLM Agents2026
- Looking beyond the next token2025
- Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers2026
- Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs2024
- LSR: Reinforcement Learning with Supervised Reward Outperforms SFT in Instruction Following2025
- Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing2024
- Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning2024
- MatFormer: Nested Transformer for Elastic Inference2023
- MaxMin-RLHF: Alignment with Diverse Human Preferences2024
- Measuring Human Preferences in RLHF is a Social Science Problem2026
- Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models2025
- Mechanisms of Introspective Awareness2026
- Mechanistically Interpreting the Role of Sample Difficulty in RLVR for LLMs2026
- Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications2025
- Memorization and Knowledge Injection in Gated LLMs2025
- Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models2025
- Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge2024
- MetaClaw: Just Talk — An Agent That Meta-Learns and Evolves in the Wild2026
- Metacognitive Retrieval-Augmented Large Language Models2024
- Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models2024
- Mining Hidden Thoughts from Texts: Evaluating Continual Pretraining with Synthetic Data for LLM Reasoning2025
- Misaligned by Design: Incentive Failures in Machine Learning2025
- Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?2025
- Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models2023
- MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement2025
- MLLM-CBench: A Comprehensive Benchmark for Continual Instruction Tuning of Multimodal LLMs with Chain-of-Thought Reasoning Analysis2025
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases2024
- MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation2026
- Natural Emergent Misalignment From Reward Hacking In Production RL2025
- NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions2025
- Nested Learning: The Illusion of Deep Learning Architectures2025
- Neutralizing Bias in LLM Reasoning using Entailment Graphs2025
- Not All Parameters Are Created Equal: Smart Isolation Boosts Fine-Tuning Performance2025
- OMNI-SIMPLEMEM: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory2026
- Omni-Thinker: Scaling Multi-Task RL in LLMs with Hybrid Reward and Task Scheduling2025
- On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM agents2026
- On the Impact of Fine-Tuning on Chain-of-Thought Reasoning2024
- On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting2025
- Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback2024
- OpenClaw-RL: Train Any Agent Simply by Talking2026
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning2024
- OpenThoughts: Data Recipes for Reasoning Models2025
- Orchestrating Synthetic Data with Reasoning
- Outcome-based Exploration for LLM Reasoning2025
- Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning2025
- Persistent Pre-Training Poisoning of LLMs2024
- Persona Vectors: Monitoring and Controlling Character Traits in Language Models2025
- Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback2023
- Personalized Language Modeling from Personalized Human Feedback2024
- Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning2024
- Planning Like Human: A Dual-process Framework for Dialogue Planning2024
- Post-Completion Learning for Language Models2025
- Post-Training Large Language Models via Reinforcement Learning from Self-Feedback2025
- Post-training makes large language models less human-like2026
- Pre-Trained Policy Discriminators are General Reward Models2025
- Prefix-Tuning: Optimizing Continuous Prompts for Generation
- PretrainZero: Reinforcement Active Pretraining2025
- Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models2024
- Process Reward Models That Think2025
- Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution2023
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models2025
- Psychotherapy AI Companion with Reinforcement Learning Recommendations and Interpretable Policy Dynamics2023
- R-Zero: Self-Evolving Reasoning LLM from Zero Data2025
- RARR: Researching and Revising What Language Models Say, Using Language Models2022
- Real-Time Procedural Learning From Experience for AI Agents2025
- ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs2025
- ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory2025
- Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning2025
- Recursive Introspection: Teaching Language Model Agents How to Self-Improve2024
- Recursive Language Models2025
- Reflect then Learn: Active Prompting for Information Extraction Guided by Introspective Confusion2025
- Reflexion: Language Agents with Verbal Reinforcement Learning2023
- ReFT: Representation Finetuning for Language Models2024
- Reinforced Attention Learning2026
- Reinforced Language Models for Sequential Decision Making2025
- Reinforcement Learning be Enough for Thinking?2025
- Reinforcement Learning Finetunes Small Subnetworks in Large Language Models2025
- Reinforcement Learning for Optimizing RAG for Domain Chatbots2024
- Reinforcement Learning for Reasoning in Large Language Models with One Training Example2025
- Reinforcement Learning via Self-Distillation2026
- Reinforcement Learning with Rubric Anchors2025
- Reinforcement Learning: An Overview2024
- Reinforcement Pre-Training2025
- Reinforcing General Reasoning without Verifiers2025
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning2025
- Rethinking External Slow-Thinking: From Snowball Errors to Probability of Correct Reasoning2025
- Rethinking Thinking Tokens: LLMs as Improvement Operators2025
- Rethinking with Retrieval: Faithful Large Language Model Inference2022
- Retrieval-augmented reasoning with lean language models2025
- Reverse Thinking Makes LLMs Stronger Reasoners2024
- Revisiting LLM Reasoning via Information Bottleneck2025
- Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?2025
- Reward Reasoning Model2025
- Reward-Robust RLHF in LLMs2024
- RewardBench: Evaluating Reward Models for Language Modeling2024
- Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment2024
- RL + Transformer = A General-Purpose Problem Solver2025
- RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs2025
- RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization2025
- RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner2024
- RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems2025
- RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback2023
- RLHF Workflow: From Reward Modeling to Online RLHF2024
- RLNVR: Reinforcement Learning from Non-Verified Real-World Rewards2025
- RLP: Reinforcement as a Pretraining Objective2025
- RLPR: Extrapolating RLVR to General Domains without Verifiers2025
- RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents2025
- RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents2025
- RM-R1: Reward Modeling as Reasoning2025
- rStar2-Agent: Agentic Reasoning Technical Report2025
- Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains2025
- S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models2025
- Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning2025
- SAND: Boosting LLM Agents with Self-Taught Action Deliberation2025
- Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search2025
- Scaling Expert Language Models with Unsupervised Domain Discovery2023
- Scaling Laws for Agent Harnesses via Effective Feedback Compute2026
- Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs2025
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters2024
- Self-Adapting Language Models2025
- Self-critiquing models for assisting human evaluators2022
- Self-Discover: Large Language Models Self-Compose Reasoning Structures2024
- Self-distillation Enables Continual Learning2026
- Self-Evaluation Guided Beam Search for Reasoning2023
- Self-Improving Model Steering2025
- Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges2025
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models2024
- Self-Questioning Language Models2025
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection2023
- Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst2025
- Self-Refine: Iterative Refinement with Self-Feedback2023
- Self-Reflection in LLM Agents: Effects on Problem-Solving Performance2024
- Self-Rewarding Language Models2024
- Self-Taught Evaluators2024
- SERL: Self-Examining Reinforcement Learning on Open-Domain2025
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training2025
- Shaping Explanations: Semantic Reward Modeling with Encoder-Only Transformers for GRPO2025
- Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue2024
- SimPO: Simple Preference Optimization with a Reference-Free Reward2024
- SkillClaw: Let Skills Evolve Collectively with Agentic Evolver2026
- SkillOpt: Executive Strategy for Self-Evolving Agent Skills2026
- SkillOS: Learning Skill Curation for Self-Evolving Agents2026
- SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning2026
- Sleep-time Compute: Beyond Inference Scaling at Test-time2025
- Soft Tokens, Hard Truths2025
- Speed Always Wins: A Survey on Efficient Architectures for Large Language Models2025
- Spurious Rewards: Rethinking Training Signals in RLVR
- SSRL: Self-Search Reinforcement Learning2025
- STaR-GATE: Teaching Language Models to Ask Clarifying Questions2024
- Statistical and Algorithmic Foundations of Reinforcement Learning2025
- SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF2023
- StepWiser: Stepwise Generative Judges for Wiser Reasoning2025
- Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!2025
- Stream of Search (SoS): Learning to Search in Language2024
- Supervised Pretraining Can Learn In-Context Reinforcement Learning2023
- Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning2025
- Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories2025
- Tailored Conversations beyond LLMs: A RL-Based Dialogue Manager2025
- TarGEN: Targeted Data Generation with Large Language Models2023
- Teaching Large Language Models to Reason with Reinforcement Learning2024
- Temporal Self-Rewarding Language Models: Decoupling Chosen-Rejected via Past-Future2025
- Test-time Prompt Intervention2025
- Test-Time Scaling with Reflective Generative Model2025
- The Art of Scaling Reinforcement Learning Compute for LLMs2025
- The Curse Of Recursion: Training On Generated Data Makes Models Forget2023
- The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models2025
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits2024
- The False Promise of Imitating Proprietary LLMs2023
- The Hallucination Tax of Reinforcement Finetuning2025
- The Invisible Leash: Why RLVR May Not Escape Its Origin2025
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey2025
- The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning2025
- The Surprising Effectiveness of Test-Time Training for Abstract Reasoning2024
- The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities2024
- Think before you speak: Training Language Models With Pause Tokens2023
- Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods2025
- Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models2025
- Think Twice Before Trusting: Self-Detection for Large Language Models through Comprehensive Answer Reflection2024
- Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking2025
- Thinking as Compression: Your Reasoning Model is Secretly a Context Compressor2026
- Thinking Augmented Pre-training2025
- Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction2025
- TiMoE: Time-Aware Mixture of Language Experts2025
- Titans: Learning to Memorize at Test Time2024
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing2024
- Towards Large Reasoning Models: A Survey on Scaling LLM Reasoning Capabilities2025
- Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models2025
- Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought2025
- Train Long, Think Short: Curriculum Learning for Efficient Reasoning2025
- Training a Generally Curious Agent2025
- Training for Compositional Sensitivity Reduces Dense Retrieval Generalization2026
- Training language models to follow instructions with human feedback2022
- Training Language Models to Self-Correct via Reinforcement Learning2024
- Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning2024
- Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning2025
- Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis2024
- Training-Free Group Relative Policy Optimization2025
- Transcendence: Generative Models Can Outperform The Experts That Train Them2024
- Transformer²: Self-adaptive LLMs2025
- Tree Search for Language Model Agents2024
- Tree Search for LLM Agent Reinforcement Learning2025
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search2025
- Truly Self-Improving Agents Require Intrinsic Metacognitive Learning2025
- TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning2025
- TTRL: Test-Time Reinforcement Learning2025
- Tulu 3: Pushing Frontiers in Open Language Model Post-Training2024
- Tuning Language Models by Proxy
- Understanding and Mitigating Premature Confidence for Better LLM Reasoning2026
- Understanding Before Reasoning: Enhancing Chain-of-Thought with Iterative Summarization Pre-Prompting2025
- Understanding Tool-Integrated Reasoning2025
- Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem2025
- Unsupervised Elicitation of Language Models
- Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study2025
- UR2: Unify RAG and Reasoning through Reinforcement Learning2025
- User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy as a Learning Signal2025
- Using Natural Language for Reward Shaping in Reinforcement Learning2019
- Vector Policy Optimization: Training for Diversity Improves Test-Time Search2026
- Voyager: An Open-Ended Embodied Agent with Large Language Models2023
- When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models2024
- When More is Less: Understanding Chain-of-Thought Length in LLMs2025
- When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method2024
- Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?2026
- Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention2026
- Writing-Zero: Bridge the Gap Between Non-verifiable Tasks and Verifiable Rewards2025
- ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning2025
- ZeroSearch: Incentivize the Search Capability of LLMs without Searching2025
- LM2: A Simple Society of Language Models Solves Complex Reasoning2024
Agentic Systems and Tool Use (283)
- A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems2025
- A Domain Specific Modeling Language for Multiagent Systems
- A Framework for Collaborating a Large Language Model Tool in Brainstorming for Triggering Creative Thoughts2024
- A Practical Guide for Designing, Developing, and Deploying Production-Grade Agentic AI Workflows2025
- A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence2025
- A Survey on Context-Aware Multi-Agent Systems: Techniques, Challenges and Future Directions2024
- Action-Based Conversations Dataset: A Corpus for Building More In-Depth Task-Oriented Dialogue Systems2021
- Adaptation of Agentic AI
- Adapter-based Selective Knowledge Distillation for Federated Multi-domain Meeting Summarization2023
- Adapting LLM Agents with Universal Feedback in Communication2023
- Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems2025
- Agent A/B: Automated and Scalable A/B Testing on Live Websites with Interactive LLM Agents2025
- Agent Development Kit
- Agent Laboratory: Using LLM Agents as Research Assistants2025
- Agent S: An Open Agentic Framework that Uses Computers Like a Human2024
- Agent Workflow Memory2024
- Agent-as-a-Judge: Evaluate Agents with Agents2024
- Agent-Centric Projection of Prompting Techniques and Implications for Synthetic Training Data for Large Language Models2025
- AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs2025
- AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation2024
- Agentic AI and the next intelligence explosion2026
- Agentic Code Reasoning2026
- Agentic Reasoning for Large Language Models2026
- Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research2025
- Agentic Web: Weaving the Next Web with AI Agents2025
- AgentRxiv: Towards Collaborative Autonomous Research2025
- Agents Are Not Enough2024
- Agents of Chaos2026
- AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs2025
- AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents
- AI Agent Traps
- AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges2025
- AI Can Learn Scientific Taste2026
- AI Compute Architecture and Evolution Trends2025
- AI for Auto-Research: Roadmap & User Guide2026
- aiXiv: A Next-Generation Open Access Ecosystem for Scientific Discovery Generated by AI Scientists2025
- An Empirical Study of GPT-4o Image Generation Capabilities2025
- Apollo's Oracle: Retrieval-Augmented Reasoning in Multi-Agent Debates2023
- Artificial Intelligence and the Labor Market
- ASI-Evolve: AI Accelerates AI2026
- Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models2025
- AutoGLM: Autonomous Foundation Agents for GUIs2024
- Automated Design of Agentic Systems2024
- AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration2026
- AutoScientists: Self-Organizing Agent Teams for Long-Running Scientific Experimentation2026
- Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey2020
- Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling2024
- Beyond Brainstorming: What Drives High-Quality Scientific Ideas? Lessons from Multi-Agent Collaboration2025
- Beyond GPT-5: Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing2025
- Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL2025
- Bilevel Autoresearch: Meta-Autoresearching Itself2026
- Branch-Solve-Merge Improves Large Language Model Evaluation and Generation2023
- Bridging the Gulf of Envisioning: Cognitive Design Challenges in LLM Interfaces2023
- BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent2025
- Building Cooperative Embodied Agents Modularly with Large Language Models2023
- CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society2023
- Can AI Agents Agree?2026
- Can Large Language Models Really Improve by Self-critiquing Their Own Plans?2023
- Can Large Language Models Reason and Plan?2024
- Causal Reflection with Language Models2025
- Chatbots in Knowledge-Intensive Contexts: Comparing Intent and LLM-Based Systems2024
- ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate2023
- CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization2023
- Code as Agent Harness2026
- Collaborative Reasoner: Self-Improving Social Agents with Synthetic Conversations
- Context Engineering 2.0: The Context of Context Engineering2025
- Conversational Semantic Parsing for Dialog State Tracking2020
- CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases
- CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions2025
- Decision Transformer: Reinforcement Learning via Sequence Modeling2021
- Decision-Oriented Dialogue for Human–AI Collaboration2023
- Deep Research: A Systematic Survey2025
- DeepAgent: A General Reasoning Agent with Scalable Toolsets2025
- DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL2025
- DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research2025
- DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents2023
- Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources2023
- Dialogue Transformers2019
- Do Phone-Use Agents Respect Your Privacy?2026
- Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook2026
- Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration2024
- DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration2025
- Drop the Hierarchy and Roles: How Self-Organizing LLM Agents Outperform Designed Structures2026
- Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization2023
- Dynamic Planning with a LLM
- Efficient Tool Use with Chain-of-Abstraction Reasoning2024
- Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate2023
- Equipping agents for the real world with Agent Skills
- Estimating AI productivity gains from Claude conversations
- Evaluating Theory of Mind and Internal Beliefs in LLM-Based Multi-Agent Systems2026
- Everything Everywhere All At Once: LLMs Can In-context Learn Multiple Tasks In Superposition2024
- Evolving Deeper LLM Thinking2025
- Explainable Compliance Detection with Multi-Hop Natural Language Inference on Assurance Case Structure2025
- Exploring Autonomous Agents: A Closer Look at Why They Fail When Completing Tasks2025
- Exploring LLMs Applications in Law: A Literature Review on Current Legal NLP Approaches
- Exploring Student-AI Interactions in Vibe Coding2025
- Exploring the Frontiers of LLMs in Psychological Applications: A Comprehensive Review2024
- Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering2026
- Fast, Slow, and Tool-augmented Thinking for LLMs: A Review2025
- Federation of Agents: A Semantics-Aware Communication Fabric for Large-Scale Agentic AI2025
- FlowMind: Automatic Workflow Generation with LLMs2024
- FlowReasoner: Reinforcing Query-Level Meta-Agents2025
- FLOWSTEER: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems2026
- Foundation Protocol: A Coordination Layer for Agentic Society2026
- From Articles to Code: On-Demand Generation of Core Algorithms from Scientific Publications2025
- From Context to Skills: Can Language Models Learn from Context Skillfully?2026
- From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models2024
- From Model Scaling to System Scaling: Scaling the Harness in Agentic AI2026
- From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents2025
- Fundamentals of Building Autonomous LLM Agents2025
- Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce2025
- GDPval: Evaluating AI Model Performance On Real-world Economically Valuable Tasks
- Generalization to New Sequential Decision Making Tasks with In-Context Learning2023
- Generating Query-Relevant Document Summaries via Reinforcement Learning2025
- Generative Agent Simulations of 1,000 People2024
- Generative Agents: Interactive Simulacra of Human Behavior2023
- Generative AI in Real-World Workplaces
- Generative Interfaces for Language Models2025
- Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks2024
- Graph-enhanced Large Language Models in Asynchronous Plan Reasoning2024
- GRASP: Municipal Budget AI Chatbots for Enhancing Civic Engagement2025
- Guidance is All You Need: Temperature-Guided Reasoning in Large Language Models2024
- HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches2025
- How AI Impacts Skill Formation2026
- How Exposed Are UK Jobs to Generative AI? Developing and Applying a Novel Task-Based Index2025
- How Far Are We from Genuinely Useful Deep Research Agents?2025
- How we built our multi-agent research system
- Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing2024
- IFEvalCode: Controlled Code Generation2025
- Improving Generalization in Task-oriented Dialogues with Workflows and Action Plans2023
- Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks2024
- Insert-expansions For Tool-enabled Conversational Agents2023
- Intelligent AI Delegation2026
- Interactive Evaluation Requires a Design Science2026
- interwhen: A Generalizable Framework for Steering Reasoning Models with Test-time Verification2026
- Intrinsic Credit Assignment for Long Horizon Interaction2026
- KellyBench: Can Language Models Beat the Market?
- Language Agents as Optimizable Graphs2024
- Language as a Cognitive Tool to Imagine Goals in Curiosity-Driven Exploration2020
- Large Action Models: From Inception to Implementation2024
- Large Language Model based Multi-Agents: A Survey of Progress and Challenges2024
- Large Language Model-based Data Science Agent: A Survey2025
- Large Language Model-Brained GUI Agents: A Survey2024
- Large Language Models as Planning Domain Generators2024
- Large Language Models can accomplish Business Process Management Tasks2023
- Large Multimodal Agents: A Survey2024
- Latent Collaboration in Multi-Agent Systems2025
- Learning "Partner-Aware" Collaborators in Multi-Party Collaboration2025
- Learning Human-Object Interaction as Groups2025
- Learning to Map Context-Dependent Sentences to Executable Formal Queries2018
- Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge2025
- LESS: Selecting Influential Data for Targeted Instruction Tuning2024
- Levels of AI Agents: from Rules to Large Language Models2024
- Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning2023
- LIMI: Less is More for Agency2025
- LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries2025
- LLM+P: Empowering Large Language Models with Optimal Planning Proficiency2023
- LLMs as Architects and Critics for Multi-Source Opinion Summarization2025
- LLMs Corrupt Your Documents When You Delegate2026
- Magentic-UI: Towards Human-in-the-loop Agentic Systems2025
- MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving2025
- MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization2025
- MasRouter: Learning to Route LLMs for Multi-Agent Systems2025
- MCP-Zero: Proactive Toolchain Construction for LLM Agents from Scratch2025
- Measuring Agents in Production2025
- Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications2025
- Memory in the Age of AI Agents: A Survey — Forms, Functions and Dynamics2025
- MetaClaw: Just Talk — An Agent That Meta-Learns and Evolves in the Wild2026
- MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework2023
- MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems2025
- MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement2025
- Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence2024
- MODS: Moderating a Mixture of Document Speakers to Summarize Debatable Queries in Document Collections2025
- Multi-agent cooperation through in-context co-player inference2026
- Multi-Agent Systems are Mixtures of Experts: Who Becomes an Influencer?2026
- Multi-Agent-as-Judge: Aligning LLM-Agent-Based Automated Evaluation with Multi-Dimensional Human Evaluation2025
- Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains2025
- MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation2026
- Navigating the State of Cognitive Flow: Context-Aware AI Interventions for Effective Reasoning Support2025
- News Sentiment Embeddings for Stock Price Forecasting2025
- Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction2025
- Nexus: An Agentic Framework for Time Series Forecasting2026
- Octopus v2: On-device language model for super agent2024
- Octopus v4: Graph of language models2024
- OMNI-SIMPLEMEM: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory2026
- OmniParser for Pure Vision Based GUI Agent2024
- On the Limits of Innate Planning in Large Language Models2025
- On the Roles of LLMs in Planning: Embedding LLMs into Planning Graphs
- OpenAgents: An Open Platform for Language Agents in the Wild2023
- OpenClaw-RL: Train Any Agent Simply by Talking2026
- OpinionConv: Conversational Product Search with Grounded Opinions2023
- Opportunities for large language models and discourse in engineering design
- PaperOrchestra: A Multi-Agent Framework for Automated AI Research Paper Writing2026
- Peer-Preservation in Frontier Models
- PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods2024
- Persistent AI Agents in Academic Research: A Single-Investigator Implementation Case Study2026
- Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts2023
- Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o12024
- Planning Like Human: A Dual-process Framework for Dialogue Planning2024
- PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers2024
- Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents2023
- PolyResponse: A Rank-based Approach to Task-Oriented Dialogue with Application in Restaurant Search and Booking2019
- Position: LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks2024
- ProAgent: Building Proactive Cooperative Agents with Large Language Models2023
- ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs2025
- QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks2026
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning2025
- RAG Does Not Work for Enterprises2024
- ReAct: Synergizing Reasoning and Acting in Language Models2022
- Real-Time Procedural Learning From Experience for AI Agents2025
- Real-World Planning with PDDL+ and Beyond
- Reasoning-Driven Synthetic Data Generation and Evaluation2026
- ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory2025
- Reinforced Language Models for Sequential Decision Making2025
- Reinforcement Learning for Optimizing RAG for Domain Chatbots2024
- Researchy Questions: A Dataset of Multi-Perspective, Decompositional Questions for LLM Web Agents2024
- Rethinking Memory as Continuously Evolving Connectivity2026
- Revisiting RAG Ensemble: A Theoretical and Mechanistic Analysis of Multi-RAG System Collaboration2025
- ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models2023
- RouteLLM: Learning to Route LLMs with Preference Data2024
- SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval2023
- Scaling Behavior of Single LLM-Driven Multi-Agent Systems2026
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding2024
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning2025
- Self-Adaptive Large Language Model (LLM)-Based Multiagent Systems2023
- Semantic Parsing for Task Oriented Dialog using Hierarchical Representations
- ShowUI: One Vision-Language-Action Model for GUI Visual Agent2024
- Single-Agent LLMs Outperform Multi-Agent Systems on Multi-Hop Reasoning Under Equal Thinking Token Budgets2026
- Single-agent or Multi-agent Systems? Why Not Both?2025
- SkillClaw: Let Skills Evolve Collectively with Agentic Evolver2026
- SkillOpt: Executive Strategy for Self-Evolving Agent Skills2026
- SkillOS: Learning Skill Curation for Self-Evolving Agents2026
- SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning2026
- Small Language Models are the Future of Agentic AI2025
- Small LLMs Are Weak Tool Learners: A Multi-LLM Agent2024
- Social Skill Training with Large Language Models
- SOLOIST: Building Task Bots at Scale with Transfer Learning and Machine Teaching
- Solving a Million-Step LLM Task with Zero Errors2025
- SpreadsheetLLM: Encoding Spreadsheets for Large Language Models2024
- StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization2024
- Survey on Evaluation of LLM-based Agents2025
- Task Contamination: Language Models May Not Be Few-Shot Anymore2023
- Task-Oriented Dialogue as Dataflow Synthesis2020
- Task-Oriented Dialogue with In-Context Learning2024
- TaskLAMA: Probing the Complex Task Understanding of Language Models2023
- TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation2024
- The AI Hippocampus: How Far are We From Human Memory?2026
- The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation2024
- The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas2025
- The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers
- The impact of generative artificial intelligence on socioeconomic inequalities and policy making
- The Labor Market Effects of Generative Artificial Intelligence
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey2025
- The state of enterprise AI
- TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks2024
- Thinking Forward and Backward: Effective Backward Planning with Large Language Models2024
- Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction2025
- Thought Communication in Multiagent Collaboration2025
- Thought Virus: Viral Misalignment via Subliminal Prompting in Multi-Agent Systems2026
- TnT-LLM: Text Mining at Scale with Large Language Models2024
- ToolFlow: Boosting LLM Tool-Calling Through Natural and Coherent Dialogue Synthesis2024
- Toward Efficient Agents: A Survey of Memory, Tool Learning, and Planning2026
- Towards a Science of Scaling Agent Systems2025
- Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs2025
- Towards Machine Theory of Mind with Large Language Model-Augmented Inverse Planning2025
- Training a Generally Curious Agent2025
- Tree Search for Language Model Agents2024
- Tree Search for LLM Agent Reinforcement Learning2025
- Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents2024
- TwoStep: Multi-agent Task Planning using Classical Planners and Large Language Models2024
- UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity2024
- Unleashing Cognitive Synergy In Large Language Models: A Task-solving Agent Through Multi-persona Self-collaboration2023
- Useful Memories Become Faulty When Continuously Updated by LLMs2026
- UserBench: An Interactive Gym Environment for User-Centric Agents2025
- Using Large Language Models to Generate, Validate, and Apply User Intent Taxonomies2023
- Using LLMs to Discover Legal Factors2024
- VibeSearchBench: Benchmarking Long-horizon Proactive Search in the Wild2026
- Voyager: An Open-Ended Embodied Agent with Large Language Models2023
- We Won't be Missed: Work and Growth in the Era of AGI
- When AIs Judge AIs: The Rise of Agent-as-a-Judge Evaluation for LLMs2025
- Why Do Multi-agent LLM Systems Fail?
- Working with AI: Measuring the Occupational Implications of Generative AI2025
- Workplace Everyday-Creativity through a Highly-Conversational UI to Large Language Models
Model Architecture and Internals (365)
- A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems2025
- A framework for the use of generative modelling in non-equilibrium statistical mechanics2024
- A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts2024
- A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis2023
- A polar coordinate system represents syntax in large language models2024
- A Survey of Context Engineering for Large Language Models2025
- A Survey on Diffusion Language Models2025
- A Survey on Large Language Models with some Insights on their Capabilities and Limitations2025
- ACE: Abstractions for Communicating Efficiently2024
- Activation Steering for Chain-of-Thought Compression2025
- Aether Weaver: Multimodal Affective Narrative Co-Generation with Dynamic Scene Graphs2025
- Agent Development Kit
- Agent Learning via Early Experience2025
- Agent Workflow Memory2024
- Agent-Centric Projection of Prompting Techniques and Implications for Synthetic Training Data for Large Language Models2025
- AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs2025
- Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models2025
- Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research2025
- AI Agent Traps
- AI Agents Need Memory Control Over More Context2026
- AI Models Exceed Individual Human Accuracy in Predicting Everyday Social Norms2025
- All AI Models are Wrong, but Some are Optimal2025
- AlphaEvolve: A coding agent for scientific and algorithmic discovery2025
- AlphaGo Moment for Model Architecture Discovery2025
- Are Emergent Abilities in Large Language Models just In-Context Learning?2023
- AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning2025
- Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics2024
- Artifacts as Memory Beyond the Agent Boundary2026
- Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)2025
- Ask, and it shall be given: Turing completeness of prompting2024
- Ask-AC: An Initiative Advisor-in-the-Loop Actor-Critic Framework2022
- Assessing adaptive world models in machines with novel games2025
- Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey2020
- AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders2025
- Base Models Know How to Reason, Thinking Models Learn When2025
- Behavioral Exploration: Learning to Explore via In-Context Adaptation2025
- Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases2025
- Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning2025
- Beyond Language Modeling: An Exploration of Multimodal Pretraining2026
- Beyond neural scaling laws: beating power law scaling via data pruning2022
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning2025
- Beyond Turing: Memory-Amortized Inference as a Foundation for Cognitive Computation2025
- Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need2025
- Bounds of Chain-of-Thought Robustness: Reasoning Steps, Embed Norms, and Beyond2025
- Break It Down: Evidence for Structural Compositionality in Neural Networks2023
- Byte Latent Transformer: Patches Scale Better Than Tokens
- Can Language Models Serve as Text-Based World Simulators?2024
- Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?2024
- Chain-of-Thought Reasoning Without Prompting2024
- Circuit Tracing: Revealing Computational Graphs in Language Models
- CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization2023
- Cognitive Architectures for Language Agents2023
- ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning2025
- Compositional Reasoning with Transformers, RNNs, and Chain of Thought2025
- Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conversations2024
- Computational structuralism: Toward a formal theory of meaning in the age of digital intelligence
- Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models2026
- Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data2024
- Consistency Models Made Easy2024
- Consistency Training Helps Stop Sycophancy and Jailbreaks2025
- Context Tuning for Retrieval Augmented Generation2023
- Continual Instruction Tuning for Large Multimodal Models2023
- Converging Paradigms: The Synergy of Symbolic and Connectionist AI in LLM-Empowered Autonomous Agents2024
- Conversation Chronicles: Towards Diverse Temporal and Relational Dynamics in Multi-Session Conversations2023
- Conversational Graph Grounded Policy Learning for Open-Domain Conversation Generation
- Critiques of World Models2025
- CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning2018
- Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents2025
- DataComp-LM: In search of the next generation of training sets for language models2024
- Decoupling Knowledge and Reasoning in LLMs: An Exploration Using Cognitive Dual-System Theory2025
- Deep Interest Network for Click-Through Rate Prediction2017
- Deep Researcher with Test-Time Diffusion2025
- DeepGesture: A conversational gesture synthesis system based on emotions and semantics2025
- DeepNet: Scaling Transformers to 1,000 Layers2022
- DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
- Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning2025
- Detecting hallucinations in large language models using semantic entropy
- Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at a Time2025
- Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources2023
- Dialogue State Tracking with a Language Model using Schema-Driven Prompting
- Diffusion Language Models Know the Answer Before Decoding2025
- Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing2025
- Diffusion Models are Evolutionary Algorithms2024
- Diffusion-LM Improves Controllable Text Generation2022
- Discovering Latent Concepts Learned in BERT2022
- Do Language Models Understand Time?2024
- Do Large Language Models Latently Perform Multi-Hop Reasoning?2024
- Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?2024
- Do LLMs Encode Functional Importance of Reasoning Tokens?2026
- DocLLM: A layout-aware generative language model for multimodal document understanding2023
- Efficient Nearest Neighbor Language Models2021
- Efficient Reasoning with Hidden Thinking2025
- Efficient Streaming Language Models with Attention Sinks2023
- Eliciting Latent Knowledge from Quirky Language Models2023
- Eliciting Reasoning in Language Models with Cognitive Tools2025
- Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers2024
- Emergent Hierarchical Reasoning In LLMs Through Reinforcement Learning
- Emergent Introspective Awareness in Large Language Models
- Emerging Properties in Unified Multimodal Pretraining2025
- End-to-End Test-Time Training for Long Context2025
- Energy-Based Transformers are Scalable Learners and Thinkers2025
- Entangled in Representations: Mechanistic Investigation of Cultural Biases in Large Language Models2025
- Equipping agents for the real world with Agent Skills
- Evaluating Very Long-Term Conversational Memory of LLM Agents2024
- Everything Everywhere All At Once: LLMs Can In-context Learn Multiple Tasks In Superposition2024
- Evolving Deeper LLM Thinking2025
- Explainable Multimodal Emotion Reasoning2023
- Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering2026
- Extrapolation by Association: Length Generalization Transfer in Transformers2025
- Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs2026
- Fast, Slow, and Tool-augmented Thinking for LLMs: A Review2025
- FinCoT: Grounding Chain-of-Thought in Expert Financial Reasoning2025
- Foundation Priors2025
- Foundations of Large Language Models2025
- From Context to Skills: Can Language Models Learn from Context Skillfully?2026
- From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence2026
- From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks2024
- From Language to Logic: A Bi-Level Framework for Structured Reasoning2025
- From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models2024
- From Model Scaling to System Scaling: Scaling the Harness in Agentic AI2026
- From Simulation to Enaction: Post-trained Language Models Recognize and React to their own Generations2026
- From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning2025
- From Trial-and-Error to Improvement: A Systematic Analysis of LLM Exploration Mechanisms in RLVR2025
- Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini2026
- Generalization through Memorization: Nearest Neighbor Language Models2019
- Generalization to New Sequential Decision Making Tasks with In-Context Learning2023
- Generative Agents: Interactive Simulacra of Human Behavior2023
- Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?2023
- Generative Recursive Reasoning2026
- Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization2024
- Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning2023
- Guidance is All You Need: Temperature-Guided Reasoning in Large Language Models2024
- Has the Creativity of Large-Language Models peaked? An analysis of inter- and intra-LLM variability2025
- Hierarchical Concept Geometry in Language Models Emerges from Word Co-occurrence2026
- Hierarchical Reasoning Model2025
- HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches2025
- HiTKG: Towards Goal-Oriented Conversations via Multi-Hierarchy Learning
- Hogwild! Inference: Parallel LLM Generation via Concurrent Attention2025
- Holy Grail 2.0: From Natural Language to Constraint Models2023
- How do Transformers Learn Implicit Reasoning?2025
- How much do language models memorize?2025
- How Multimodal LLMs Solve Image Tasks: A Lens on Visual Grounding, Task Reasoning, and Answer Decoding2025
- How new data permeates LLM knowledge and how to dilute it2025
- Human-like Category Learning by Injecting Ecological Priors from Large Language Models into Neural Networks2024
- Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning
- Hyperagents2026
- Improving large language models with concept-aware fine-tuning2025
- In-context learning agents are asymmetric belief updaters2024
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model2023
- Insert-expansions For Tool-enabled Conversational Agents2023
- Inspecting and Editing Knowledge Representations in Language Models2023
- Investigating task-specific prompts and sparse autoencoders for activation monitoring2025
- Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens2025
- It’s All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization2025
- Jamba: A Hybrid Transformer-Mamba Language Model2024
- KAN: Kolmogorov-Arnold Networks2024
- Language Modeling by Language Models2025
- Language Modeling is Compression2023
- Language Models are Pragmatic Speakers2023
- Language Models Need Sleep2026
- Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories2026
- Large Causal Models From Large Language Models2025
- Large Concept Models: Language Modeling in a Sentence Representation Space
- Large Language Diffusion Models2025
- Large Language Model Programs2023
- Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning2024
- Large language models can segment narrative events similarly to humans2023
- Large Language Models Reflect the Ideology of their Creators2024
- Large Language Models Report Subjective Experience Under Self-Referential Processing2025
- Large Multimodal Agents: A Survey2024
- Latent Collaboration in Multi-Agent Systems2025
- Latent Skill Discovery for Chain-of-Thought Reasoning2023
- LatentQA: Teaching LLMs to Decode Activations Into Natural Language2024
- Learning Agent-Compatible Context Management for Long-Horizon Tasks2026
- Learning Human-Object Interaction as Groups2025
- Learning to Relate to Previous Turns in Conversational Search2023
- Learning to Select the Relevant History Turns in Conversational Question Answering2023
- Learning, Fast and Slow: Towards LLMs That Adapt Continually2026
- Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models2024
- Levels of Analysis for Large Language Models2025
- Leveraging Approximate Symbolic Models for Reinforcement Learning via Skill Diversity2022
- LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels2026
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models2024
- LIMI: Less is More for Agency2025
- LLaMA-Omni: Seamless Speech Interaction with Large Language Models2024
- LLM Post-Training: A Deep Dive into Reasoning Large Language Models2025
- LLM Reasoning Is Latent, Not the Chain of Thought2026
- LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders2024
- LLMatic: Neural Architecture Search via Large Language Models and Quality Diversity Optimization2023
- LLMorphism: When humans come to see themselves as language models
- LLMs are Frequency Pattern Learners in Natural Language Inference2025
- LLMs Can Easily Learn to Reason from Demonstrations: Structure, not content, is what matters!2025
- Localizing Paragraph Memorization in Language Models2024
- Looking beyond the next token2025
- Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers2026
- Looped Diffusion Language Models2026
- Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs2024
- LSR: Reinforcement Learning with Supervised Reward Outperforms SFT in Instruction Following2025
- Lumiere: A Space-Time Diffusion Model for Video Generation2024
- Making Sense of Memory in AI Agents
- MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving2025
- MasRouter: Learning to Route LLMs for Multi-Agent Systems2025
- Massive Activations in Large Language Models2024
- Mastering Diverse Domains through World Models2023
- MatFormer: Nested Transformer for Elastic Inference2023
- Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models2025
- Mechanisms of Introspective Awareness2026
- Mechanistic Indicators of Understanding in Large Language Models2025
- Mechanistically Interpreting the Role of Sample Difficulty in RLVR for LLMs2026
- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads2024
- MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation2023
- Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models2025
- Memory in the Age of AI Agents: A Survey — Forms, Functions and Dynamics2025
- Memory Sandbox: Transparent and Interactive Memory Management for Conversational Agents2023
- Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving2024
- Mindstorms in Natural Language-Based Societies of Mind2023
- MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention2025
- Mining Hidden Thoughts from Texts: Evaluating Continual Pretraining with Synthetic Data for LLM Reasoning2025
- MIO: A Foundation Model on Multimodal Tokens2024
- Mixture of Thoughts: Learning to Aggregate What Experts Think, Not Just What They Say2025
- MLLM-CBench: A Comprehensive Benchmark for Continual Instruction Tuning of Multimodal LLMs with Chain-of-Thought Reasoning Analysis2025
- MM-LLMs: Recent Advances in MultiModal Large Language Models2024
- MoodAngels: A Retrieval-augmented Multi-agent Framework for Psychiatry Diagnosis2025
- Multi-Token Attention2025
- Multistep Consistency Models2024
- MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation2026
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention2025
- Natural Emergent Misalignment From Reward Hacking In Production RL2025
- Navigating the Latent Space Dynamics of Neural Models2025
- Navigating the State of Cognitive Flow: Context-Aware AI Interventions for Effective Reasoning Support2025
- Nested Attention: Semantic-aware Attention Values for Concept Personalization2025
- Nested Learning: The Illusion of Deep Learning Architecture Expanded
- Nested Learning: The Illusion of Deep Learning Architectures
- Neural Assistant: Joint Action Prediction, Response Generation, and Latent Knowledge Reasoning2019
- Neuro-Symbolic AI in 2024: A Systematic Review2025
- Neurosymbolic AI - Why, What, and How2023
- Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction2025
- No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance2024
- Not All Parameters Are Created Equal: Smart Isolation Boosts Fine-Tuning Performance2025
- NoveltyBench: Evaluating Language Models for Humanlike Diversity2025
- On the Binding Problem in Artificial Neural Networks2020
- On the Limits of Innate Planning in Large Language Models2025
- On the Relationship between Sentence Analogy Identification and Sentence Structure Encoding in Large Language Models
- On the Theoretical Limitations of Embedding-Based Retrieval2025
- Open Problems in Mechanistic Interpretability2025
- Orchestrating Synthetic Data with Reasoning
- Persistent Pre-Training Poisoning of LLMs2024
- Pixel-Level Reasoning Segmentation via Multi-turn Conversations2025
- Pixels, Patterns, but No Poetry: To See The World like Humans2025
- Position: Categorical Deep Learning is an Algebraic Theory of All Architectures2024
- Post-Completion Learning for Language Models2025
- Post-training makes large language models less human-like2026
- PRIME: Large Language Model Personalization with Cognitive Memory and Thought Processes2025
- Probing Structured Semantics Understanding and Generation of Language Models via Question Answering2024
- Problems with Cosine as a Measure of Embedding Similarity for High Frequency Words2022
- Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models2024
- Progress Measures For Grokking Via Mechanistic Interpretability2023
- Pushdown Layers: Encoding Recursive Structure in Transformer Language Models2023
- QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration2025
- Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis2025
- Reasoning Beyond Chain-of-Thought: A Latent Computational Mode in Large Language Models2026
- Reasoning Circuits in Language Models: A Mechanistic Interpretation of Syllogistic Inference2024
- Reasoning Language Models: A Blueprint2025
- Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination2025
- Reasoning to Learn from Latent Thoughts2025
- ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory2025
- ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering2025
- RecExplainer: Aligning Large Language Models for Recommendation Model Interpretability2023
- Recursive Language Models2025
- Reinforced Attention Learning2026
- Reinforcement Learning Finetunes Small Subnetworks in Large Language Models2025
- Reinforcement Pre-Training2025
- Repeat After Me: Transformers are Better than State Space Models at Copying2024
- Representation biases: will we achieve complete understanding by analyzing representations?2025
- Representation Engineering: A Top-Down Approach to AI Transparency2023
- Rethinking Memory as Continuously Evolving Connectivity2026
- Rethinking Thinking Tokens: LLMs as Improvement Operators2025
- Retrieval Head Mechanistically Explains Long-Context Factuality2024
- Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up2024
- RL + Transformer = A General-Purpose Problem Solver2025
- Scalable Language Models with Posterior Inference of Latent Thought Vectors2025
- Scaling can lead to compositional generalization2025
- Scaling Latent Reasoning via Looped Language Models2025
- Scaling Laws for Neural Language Models2020
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach2025
- Schema-learning and rebinding as mechanisms of in-context learning and emergence2023
- See you soon again, chatbot? A design taxonomy to characterize user-chatbot relationships with different time horizons
- Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory2025
- Self-Discover: Large Language Models Self-Compose Reasoning Structures2024
- Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges2025
- Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution?2025
- Self-reinforcing cascades: A spreading model for beliefs or products of varying intensity or quality2024
- Self-Rewarding Vision-Language Model via Reasoning Decomposition2025
- Semantic Structure in Large Language Model Embeddings2025
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training2025
- Simulating Society Requires Simulating Thought2025
- SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning2026
- Sleep-time Compute: Beyond Inference Scaling at Test-time2025
- Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space2025
- SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs2025
- Solving a Million-Step LLM Task with Zero Errors2025
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models2024
- SParC: Cross-Domain Semantic Parsing in Context
- SPICE: Self-Play In Corpus Environments Improves Reasoning2025
- SpikingBrain: Spiking Brain-inspired Large Models2025
- Style Vectors for Steering Generative Large Language Models
- Subliminal Learning: Language models transmit behavioral traits via hidden signals in data2025
- Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models2024
- System 1 vs. System 2 Thinking
- Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs2024
- TarGEN: Targeted Data Generation with Large Language Models2023
- Test-time Prompt Intervention2025
- TextGrad: Automatic “Differentiation” via Text2024
- The AI Hippocampus: How Far are We From Human Memory?2026
- The Consensus Game: Language Model Generation via Equilibrium Search2023
- The Curse Of Recursion: Training On Generated Data Makes Models Forget2023
- The Demon is in Ambiguity: Revisiting Situation Recognition with Single Positive Multi-Label Learning2025
- The Emotion-Memory Link: Do Memorability Annotations Matter for Intelligent Systems?2025
- The Evolution of Multimodal Model Architectures2024
- The Future of AI: Exploring the Potential of Large Concept Models2025
- The Missing Layer of AGI: From Pattern Alchemy to Coordination Physics2025
- The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning2026
- The Serial Scaling Hypothesis2025
- The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs2025
- The Unreasonable Ineffectiveness of the Deeper Layers2024
- The Vanishing Gradient Problem for Stiff Neural Differential Equations2025
- TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks2024
- There Will Be a Scientific Theory of Deep Learning2026
- Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens2026
- Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory2023
- Thinking as Compression: Your Reasoning Model is Secretly a Context Compressor2026
- Thinking Augmented Pre-training2025
- Thinking Inside the Mask: In-Place Prompting in Diffusion LLMs2025
- Thinking LLMs: General Instruction Following with Thought Generation2024
- Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction2025
- Thought Anchors: Which LLM Reasoning Steps Matter?2025
- Thought Communication in Multiagent Collaboration2025
- Titans: Learning to Memorize at Test Time2024
- TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters2024
- Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties2025
- Toward Conversational Agents with Context and Time Sensitive Long-term Memory2024
- Toward Efficient Agents: A Survey of Memory, Tool Learning, and Planning2026
- Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks2022
- Toward understanding and preventing misalignment generalization
- Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
- Towards Optimal Learning of Language Models2024
- Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control2024
- Towards Safe and Honest AI Agents with Neural Self-Other Overlap2024
- Training for Compositional Sensitivity Reduces Dense Retrieval Generalization2026
- Training language models to follow instructions with human feedback2022
- Training Large Language Models to Reason in a Continuous Latent Space2024
- Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis2024
- Transformer²: Self-adaptive LLMs2025
- TransformerFAM: Feedback attention is working memory2024
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality2024
- Turning large language models into cognitive models
- Understanding Hidden Computations in Chain-of-Thought Reasoning2024
- Understanding LLMs: A Comprehensive Overview from Training to Inference2024
- Unifying Large Language Models and Knowledge Graphs: A Roadmap2023
- Useful Memories Become Faulty When Continuously Updated by LLMs2026
- VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction2025
- Weight-sparse transformers have interpretable circuits2025
- What are the Goals of Distributional Semantics?2020
- What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models2025
- Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning2025
Language, Text, and Discourse (323)
- (QA)²: Question Answering with Questionable Assumptions2022
- A comprehensive taxonomy of hallucinations in Large Language Models2025
- A Hybrid Human-AI Approach for Argument Map Creation From Transcripts
- A Hybrid Intelligence Method for Argument Mining2024
- A meta-analysis of the persuasive power of large language models
- A Non-Factoid Question-Answering Taxonomy
- A Probabilistic Model for Using Social Networks in Personalized Item Recommendation
- A recipe for annotating grounded clarifications2021
- A ripple in time: a discontinuity in American history2023
- A Robustness Evaluation Framework for Argument Mining
- A Survey on Lexical Ambiguity Detection and Word Sense Disambiguation2024
- A Survey on Prompt Tuning2025
- AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions2025
- ACE: Abstractions for Communicating Efficiently2024
- Adam's Law: Textual Frequency Law on Large Language Models2026
- Adapter-based Selective Knowledge Distillation for Federated Multi-domain Meeting Summarization2023
- Affordable AI Assistants with Knowledge Graph of Thoughts2025
- Agent Laboratory: Using LLM Agents as Research Assistants2025
- AI Argues Differently: Distinct Argumentative and Linguistic Patterns of LLMs in Persuasive Contexts
- ANAPHORA RESOLUTION: THE STATE OF THE ART
- Are Customers Lying to Your Chatbot?
- Are you in a Masquerade? Exploring the Behavior and Impact of Large Language Model Driven Social Bots in Online Social Networks2023
- Argument Quality Assessment in the Age of Instruction-Following Large Language Models
- Argument Summarization and its Evaluation in the Era of Large Language Models2025
- Argumentative Large Language Models for Explainable and Contestable Decision-Making2024
- Argunauts: Open LLMs that Master Argument Analysis with Argdown2021
- Artificial intelligence is ineffective and potentially harmful for fact checking2023
- Aspect-oriented Opinion Alignment Network for Aspect-Based Sentiment Classification2023
- Assessing the Ability of ChatGPT to Screen Articles for Systematic Reviews2023
- Assessment of Personality Dimensions Across Situations Using Conversational Speech2025
- Atesa-bært: A Heterogeneous Ensemble Learning Model For Aspect-based Sentiment Analysis2023
- Attention on the brain
- Attention, Intentions, And The Structure Of Discourse
- Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models2025
- Automatic Extraction of Metaphoric Analogies from Literary Texts: Task Formulation, Dataset Construction, and Evaluation2024
- AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts2020
- Benchmarking the Pedagogical Knowledge of Large Language Models2025
- Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases2025
- Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey2024
- Beyond Passive Critical Thinking: Fostering Proactive Questioning to Enhance Human-AI Collaboration2025
- Beyond Single Models: Enhancing LLM Detection of Ambiguity in Requests through Debate2025
- Beyond the Surface: Probing the Ideological Depth of Large Language Models2025
- Bigger is not always better: The importance of human-scale language modeling for psycholinguistics
- Branch-Solve-Merge Improves Large Language Model Evaluation and Generation2023
- Can AI Explanations Make You Change Your Mind?2025
- Can Authorship Representation Learning Capture Stylistic Features?2023
- Can Language Models Recognize Convincing Arguments?2024
- Can Large Language Models Capture Human Annotator Disagreements?2025
- Can Large Language Models do Analytical Reasoning?2024
- Can Large Language Models perform Relation-based Argument Mining?2024
- Can Large Language Models Transform Computational Social Science?
- Can Large Language Models Understand Argument Schemes?
- Can Large Language Models Understand Context?2024
- Can LLMs assist with Ambiguity? A Quantitative Evaluation of various Large Language Models on Word Sense Disambiguation2024
- Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers2024
- Can LLMs Ground when they (Don't) Know: A Study on Direct and Loaded Political Questions2025
- Causal Sufficiency and Necessity Improves Chain-of-Thought Reasoning2025
- CDW-CoT: Clustered Distance-Weighted Chain-of-Thoughts Reasoning2025
- CEO: Corpus-based Open-Domain Event Ontology Induction2023
- Chain of Stance: Stance Detection with Large Language Models2024
- Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models2023
- Characterizing Online Discussion Using Coarse Discourse Sequences
- ChatGPT Reads Your Tone and Responds Accordingly -- Until It Does Not -- Emotional Framing Induces Bias in LLM Outputs2025
- ChatGPT: deconstructing the debate and moving it forward
- Choosing the Right Weights: Balancing Value, Strategy, and Noise in Recommender Systems2023
- Classifying YouTube Comments Based on Sentiment and Type of Sentence2021
- Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data
- Clustering-based Sampling for Few-Shot Cross-Domain Keyphrase Extraction
- Collaborative Rational Speech Act: Pragmatic Reasoning for Multi-Turn Dialog2025
- ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning2025
- Comparing Apples to Apples: Generating Aspect-Aware Comparative Sentences from User Reviews2023
- Comparing emotion feature extraction approaches for predicting depression and anxiety
- Complex Logical Instruction Generation2025
- Computational Modelling of Undercuts in Real-world Arguments
- Computational structuralism: Toward a formal theory of meaning in the age of digital intelligence
- Constructing a Periodic Table of Arguments
- Context Embeddings for Efficient Answer Generation in RAG2024
- Conversation Chronicles: Towards Diverse Temporal and Relational Dynamics in Multi-Session Conversations2023
- Conversation Derailment Forecasting with Graph Convolutional Networks2023
- Conversational DNA: A New Visual Language for Understanding Dialogue Structure in Human and AI2025
- Conversational Semantic Parsing for Dialog State Tracking2020
- Conversations Gone Awry: Detecting Early Signs of Conversational Failure
- Creativity Has Left the Chat: The Price of Debiasing Language Models2024
- Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying2024
- Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models2024
- DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations2022
- Debating with More Persuasive LLMs Leads to More Truthful Answers2024
- DEEM: Dynamic Experienced Expert Modeling for Stance Detection2024
- DeepCT-enhanced Lexical Argument Retrieval
- Detecting Cognitive Distortions from Patient-Therapist Interactions
- Detecting Deception Using Natural Language Processing and Machine Learning in Datasets on COVID-19 and Climate Change
- Detecting hallucinations in large language models using semantic entropy
- Determinants of LLM-assisted Decision-Making2024
- Detoxify Language Model Step-by-Step2023
- Development and validation of large language model rating scales for automatically transcribed psychological therapy sessions
- DiaSynth: Synthetic Dialogue Generation Framework for Low Resource Dialogue Applications2024
- Diplomat: A Dialogue Dataset for Situated PragMATic Reasoning2023
- Discourse Structure and Dialogue Acts in Multiparty Dialogue: the STAC Corpus
- Discourse-Level Representations can Improve Prediction of Degree of Anxiety
- Discovering Latent Concepts Learned in BERT2022
- Discursive Socratic Questioning: Evaluating the Faithfulness of Language Models’ Understanding of Discourse Relations
- DiscussLLM: Teaching Large Language Models When to Speak2025
- Dissociating language and thought in large language models2023
- Do large language models resemble humans in language use?2023
- Do Large Language Models Understand Conversational Implicature -- A case study with a Chinese sitcom2024
- Do LLMs produce texts with "human-like" lexical diversity?2025
- Do LLMs Truly Understand When a Precedent Is Overruled?2025
- Don't "Overthink" Passage Reranking: Is Reasoning Truly Necessary?2025
- DR-HAI: Argumentation-based Dialectical Reconciliation in Human-AI Interactions2023
- Durably reducing conspiracy beliefs through dialogues with AI
- Educating LLMs like Human Students: Structure-aware Injection of Domain Knowledge2024
- Eliciting Reasoning in Language Models with Cognitive Tools2025
- Empirical Study of Symmetrical Reasoning in Conversational Chatbots2024
- Enhancing Performance on Seen and Unseen Dialogue Scenarios using Retrieval-Augmented End-to-End Task-Oriented System2023
- Evaluating Emotional Nuances In Dialogue Summarization2023
- Evaluating the Efficacy of Interactive Language Therapy Based on LLM for High-Functioning Autistic Adolescent Psychological Counseling2023
- Event-Aware Sentiment Factors from LLM-Augmented Financial Tweets: A Transparent Framework for Interpretable Quant Trading2025
- EVINCE: Optimizing Multi-LLM Dialogues Using Conditional Statistics and Information Theory2024
- Explainable Compliance Detection with Multi-Hop Natural Language Inference on Assurance Case Structure2025
- Explicit Inductive Inference using Large Language Models2024
- Exploiting Dialogue Acts and Context to Identify Argumentative Relations in Online Debates
- Exploiting Explainability to Design Adversarial Attacks and Evaluate Attack Resilience in Hate-Speech Detection Models2023
- Exploring LLMs Applications in Law: A Literature Review on Current Legal NLP Approaches
- Exploring the Frontiers of LLMs in Psychological Applications: A Comprehensive Review2024
- Exploring the Potential of ChatGPT on Sentence Level Relations: A Focus on Temporal, Causal, and Discourse Relations
- Exploring the Potential of Large Language Models in Computational Argumentation2023
- Exploring the Role of Prior Beliefs for Argument Persuasion2019
- Faithful and Robust LLM-Driven Theorem Proving for NLI Explanations2025
- Fake News Detectors are Biased against Texts Generated by Large Language Models2023
- Finding Common Ground: Using Large Language Models to Detect Agreement in Multi-Agent Decision Conferences2025
- Fine-grained Hallucination Detection and Editing for Language Models2024
- Fine-tuning Pre-trained Language Models for Dialogical Argument Mining with Inference Anchoring Theory
- Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities2024
- Forecasting the presence and intensity of hostility on Instagram using linguistic and social features2018
- From Articles to Code: On-Demand Generation of Core Algorithms from Scientific Publications2025
- From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers2025
- From Key Points to Key Point Hierarchy: Structured and Expressive Opinion Summarization2023
- From Persona to Person: Enhancing the Naturalness with Multiple Discourse Relations Graph Learning in Personalized Dialogue Generation2025
- From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting2023
- From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning2025
- Further Explorations on the Use of Large Language Models for Thematic Analysis. Open-Ended Prompts, Better Terminologies and Thematic Maps
- GenAI as a Power Persuader: How Professionals Get Persuasion Bombed When They Attempt to Validate LLMs
- Generating Query-Relevant Document Summaries via Reinforcement Learning2025
- Grounding Gaps in Language Model Generations2023
- Grounding ‘Grounding’ in NLP2021
- Hierarchical Concept Geometry in Language Models Emerges from Word Co-occurrence2026
- HonestBait: Forward References for Attractive but Faithful Headline Generation2023
- How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs2024
- How new data permeates LLM knowledge and how to dilute it2025
- How susceptible are LLMs to Logical Fallacies?2023
- How well can large language models explain business processes?2024
- How Projective is Projective Content? Gradience in Projectivity and At-issueness
- Human-like Category Learning by Injecting Ecological Priors from Large Language Models into Neural Networks2024
- Identification of Propositional and Illocutionary Relations
- Improving Chain-of-Thought Reasoning via Quasi-Symbolic Abstractions2025
- Improving Document-Level Sentiment Analysis with User and Product Context2020
- Inducing Positive Perspectives with Text Reframing2022
- Inspecting and Editing Knowledge Representations in Language Models2023
- Interpretation modeling: Social grounding of sentences by reasoning over their implicit moral judgments2023
- Irony in Emojis: A Comparative Study of Human and LLM Interpretation2025
- Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?2024
- Language Agents as Optimizable Graphs2024
- Language models are weak learners2023
- Language models show human-like content effects on reasoning tasks2022
- Language Models’ Hall of Mirrors Problem: Why AI Alignment Requires Peircean Semiosis
- Large Concept Models: Language Modeling in a Sentence Representation Space
- Large Language Model-based Data Science Agent: A Survey2025
- Large Language Models are as persuasive as humans, but how? About the cognitive effort and moral-emotional language of LLM arguments2024
- Large Language Models Can Infer Psychological Dispositions of Social Media Users2023
- Large language models can segment narrative events similarly to humans2023
- Large Language Models For Social Networks: Applications, Challenges, And Solutions
- Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions2023
- Large Linguistic Models: Investigating LLMs' metalinguistic abilities2023
- Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency2024
- Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments2025
- Learning to Map Context-Dependent Sentences to Executable Formal Queries2018
- Lexical Entrainment for Conversational Systems2023
- Lil-Bevo: Explorations of Strategies for Training Language Models in More Humanlike Ways2023
- Linguistic Alignment in Conversational AI: A Systematic Review of Cognitive-Linguistic Dimensions, Measurements, and User Outcomes (2020–2025)
- Linguistic Blind Spots of Large Language Models2025
- Linguistic markers of inherently false AI communication and intentionally false human communication: Evidence from hotel reviews
- LLM Augmentations to support Analytical Reasoning over Multiple Documents2024
- LLM Strategic Reasoning: Agentic Study through Behavioral Game Theory2025
- LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback2024
- LLMs are Frequency Pattern Learners in Natural Language Inference2025
- LLMs as Architects and Critics for Multi-Source Opinion Summarization2025
- LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!2025
- LLMs Struggle to Reject False Presuppositions when Misinformation Stakes are High2025
- Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning2023
- Lost in Inference: Rediscovering the Role of Natural Language Inference for Large Language Models2024
- Machine gaze in online behavioral targeting: The effects of algorithmic human likeness on social presence and social influence
- Man vs machine – Detecting deception in online reviews
- MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization2025
- Meanings are like Onions: a Layered Approach to Metaphor Processing2025
- Measuring the Value of Social Dynamics in Online Product Ratings Forums
- Mechanistic Indicators of Understanding in Large Language Models2025
- Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications2025
- Metadiscursive nouns in academic argument: ChatGPT vs student practices
- Minds versus Machines: Rethinking Entailment Verification with Language Models2024
- Modeling Appropriate Language in Argumentation
- Modeling Interpersonal Linguistic Coordination in Conversations using Word Mover's Distance2019
- Modeling the Quality of Dialogical Explanations2024
- Multi-Agent Collaborative Intelligence: Dual-Dial Control for Reliable LLM Reasoning2025
- Neural Conversation Models and How to Rein Them in: A Survey of Failures and Fixes2023
- Neural Topic Modeling of Psychotherapy Sessions2022
- Neutralizing Bias in LLM Reasoning using Entailment Graphs2025
- News Sentiment Embeddings for Stock Price Forecasting2025
- News Source Citing Patterns in AI Search Systems2025
- No that's not what I meant: Handling Third Position Repair in Conversational Question Answering2023
- On the Adaptive Psychological Persuasion of Large Language Models2025
- On the Binding Problem in Artificial Neural Networks2020
- On the Conversational Basis of Some Presuppositions
- On the Relationship between Sentence Analogy Identification and Sentence Structure Encoding in Large Language Models
- Operating Multi-Client Influence Networks Across Platforms
- Opportunities for large language models and discourse in engineering design
- Overview of DialAM-2024: Argument Mining in Natural Language Dialogues
- PaperOrchestra: A Multi-Agent Framework for Automated AI Research Paper Writing2026
- Persuasive presuppositions
- Position: LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks2024
- Post-training for Efficient Communication via Convention Formation2025
- Posting versus Lurking: Communicating in a Multiple Audience Context
- Pragmatic Implicature Processing in ChatGPT
- Premise Order Matters in Reasoning with Large Language Models2024
- Presuppositions are more persuasive than assertions if addressees accommodate them: Experimental evidence for philosophical reasoning
- Pretrained Language Models as Containers of the Discursive Knowledge
- PRIME: Large Language Model Personalization with Cognitive Memory and Thought Processes2025
- Proactive Moderation of Online Discussions: Existing Practices and the Potential for Algorithmic Support2022
- Probing Structured Semantics Understanding and Generation of Language Models via Question Answering2024
- Prompting Large Language Models With the Socratic Method2023
- Propositional Interpretability in Artificial Intelligence2025
- Proxona: Leveraging LLM-Driven Personas to Enhance Creators' Understanding of Their Audience2024
- Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability2021
- Quantifying Controversy on Social Media2015
- Query Understanding in the Age of Large Language Models2023
- Real-time News Story Identification2025
- Reasoning Can Hurt the Inductive Abilities of Large Language Models2025
- Reasoning Models Are More Easily Gaslighted Than You Think2025
- Reasoning Strategies in Large Language Models: Can They Follow, Prefer, and Optimize?2025
- Recommendation systems and convergence of online reviews: The type of product network matters!
- Recommender Systems with Social Regularization
- ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs2023
- Representation Engineering: A Top-Down Approach to AI Transparency2023
- Reranking-based Generation for Unbiased Perspective Summarization2025
- Rethinking STS and NLI in Large Language Models
- Rhetoric, Logic, and Dialectic: Advancing Theory-based Argument Quality Assessment in Natural Language Processing
- Rhetorical XAI: Explaining AI’s Benefits as well as its Use via Rhetorical Design2025
- RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems2025
- RLNVR: Reinforcement Learning from Non-Verified Real-World Rewards2025
- SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval2023
- SciTopic: Enhancing Topic Discovery in Scientific Literature through Advanced LLM2025
- Self-critiquing models for assisting human evaluators2022
- Self-reflecting Large Language Models: A Hegelian Dialectical Approach2025
- Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution?2025
- Self-reinforcing cascades: A spreading model for beliefs or products of varying intensity or quality2024
- Semantic Change Characterization with LLMs using Rhetorics2024
- Semantic Parsing for Task Oriented Dialog using Hierarchical Representations
- Semantic Specialization for Knowledge-based Word Sense Disambiguation2023
- Semantic Structure in Large Language Model Embeddings2025
- Sequence Organization in Interaction: A Primer in Conversation Analysis
- Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making2025
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds2023
- Simulacra as conscious exotica2024
- SMILE: Evaluation and Domain Adaptation for Social Media Language Understanding2023
- SocraSynth: Multi-LLM Reasoning with Conditional Statistics2024
- Sources of Hallucination by Large Language Models on Inference Tasks
- SPICE: Self-Play In Corpus Environments Improves Reasoning2025
- Stance Detection on Social Media with Fine-Tuned Large Language Models2024
- StoryScope: Investigating idiosyncrasies in AI fiction2026
- Strategic Reasoning with Language Models2023
- Summaries, Highlights, and Action items: Design, implementation and evaluation of an LLM-powered meeting recap system2023
- Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models2024
- Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs2024
- Task-Oriented Dialogue with In-Context Learning2024
- TaskLAMA: Probing the Complex Task Understanding of Language Models2023
- Teaching Probabilistic Logical Reasoning to Transformers2023
- The Alien Space of Science: Sampling Coherent but Cognitively Unavailable Research Directions2026
- The Argument Reasoning Comprehension Task: Identification and Reconstruction of Implicit Warrants2017
- The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think2025
- The Demon is in Ambiguity: Revisiting Situation Recognition with Single Positive Multi-Label Learning2025
- The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation2023
- The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs2022
- The Hermeneutics of Artificial Text
- The impact of generative artificial intelligence on socioeconomic inequalities and policy making
- The Levers of Political Persuasion with Conversational AI2025
- The Missing Layer of AGI: From Pattern Alchemy to Coordination Physics2025
- The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning2026
- The persuasive effects of political microtargeting in the age of generative artificial intelligence
- The Place of Emotion in Argument
- The social component of the projection behavior of clausal complement contents
- The Thin Line Between Comprehension and Persuasion in LLMs2025
- The Vector Grounding Problem2023
- Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models2025
- Theory of Knowledge Based on the Idea of the Discursive Space
- Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLMs for Countering Hate2025
- Thinking LLMs: General Instruction Following with Thought Generation2024
- Thought Anchors: Which LLM Reasoning Steps Matter?2025
- Toward Conversational Agents with Context and Time Sensitive Long-term Memory2024
- Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design2025
- Transformer-based cynical expression detection in a corpus of Spanish YouTube reviews
- Truth or lie: Exploring the language of deception
- Turiya at DialAM-2024: Inference Anchoring Theory Based LLM Parsers
- Turning large language models into cognitive models
- Uncovering Latent Arguments in Social Media Messaging by Employing LLMs-in-the-Loop Strategy2024
- Understanding Before Reasoning: Enhancing Chain-of-Thought with Iterative Summarization Pre-Prompting2025
- Unlocking Varied Perspectives: A Persona-Based Multi-Agent Framework with Debate-Driven Text Planning for Argument Generation
- Using Computational Models to Test Syntactic Learnability
- Using LLMs to Discover Legal Factors2024
- Using Natural Language for Reward Shaping in Reinforcement Learning2019
- Using Topic Models to Identify Clients’ Functioning Levels and Alliance Ruptures in Psychotherapy
- Verbal lie detection using Large Language Models
- Virtuous Machines: Towards Artificial General Science2025
- Web-Browsing LLMs Can Access Social Media Profiles and Infer User Demographics2025
- We’re Afraid Language Models Aren’t Modeling Ambiguity2023
- What are the Goals of Distributional Semantics?2020
- What does it mean to understand language?2025
- What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity2025
- What is a Discourse Graph?
- What we talk to when we talk to language models
- When Large Language Models are More Persuasive Than Incentivized Humans, and Why2025
- Will I Sound Like Me? Improving Persona Consistency in Dialogues through Pragmatic Self-Consciousness2020
- Word Meanings in Transformer Language Models2025
- Writing-Zero: Bridge the Gap Between Non-verifiable Tasks and Verifiable Rewards2025
- “Understanding AI”: Semantic Grounding in Large Language Models2024
Recommender Systems (107)
- "It doesn't look good for a date": Transforming Critiques into Preferences for Conversational Recommendation Systems2021
- A Contextual-Bandit Approach to Personalized News Article Recommendation2010
- A Conversation is Worth A Thousand Recommendations: A Survey of Holistic Conversational Recommender Systems2023
- A Multi-facet Paradigm to Bridge Large Language Model and Recommendation2023
- A Personalized Recommender System based-on Knowledge Graph Embeddings2023
- A Probabilistic Model for Using Social Networks in Personalized Item Recommendation
- A Survey on Large Language Models for Recommendation2023
- A Unified Multi-task Learning Framework for Multi-goal Conversational Recommender Systems2022
- Advances and Challenges in Conversational Recommender Systems: A Survey2021
- All AI Models are Wrong, but Some are Optimal2025
- Augmenting Netflix Search with In-Session Adapted Recommendations2022
- Backtracing: Retrieving the Cause of the Query2024
- Calibrated Recommendations
- Can AI Explanations Make You Change Your Mind?2025
- Capturing Individual Human Preferences with Reward Features2025
- Choosing the Right Weights: Balancing Value, Strategy, and Noise in Recommender Systems2023
- Collaborative Deep Learning for Recommender Systems2014
- Collaborative Filtering Bandits2015
- Collaborative Filtering for Implicit Feedback Datasets
- Collaborative Filtering with Temporal Dynamics
- CoLLM: Integrating Collaborative Embeddings into Large Language Models for Recommendation2023
- Comparing Apples to Apples: Generating Aspect-Aware Comparative Sentences from User Reviews2023
- Consistent Explainers or Unreliable Narrators? Understanding LLM-generated Group Recommendations2025
- Content-aware Collaborative Music Recommendation Using Pre-trained Neural Networks
- Conversational Recommendation: A Grand AI Challenge2022
- Cumulated Gain-Based Evaluation of IR Techniques
- Curse of “Low” Dimensionality in Recommender Systems2023
- Deep Interest Network for Click-Through Rate Prediction2017
- Deep Neural Networks for YouTube Recommendations
- Dialoging Resonance: How Users Perceive, Reciprocate and React to Chatbot’s Self-Disclosure in Conversational Recommendations2021
- Dynamically Expandable Graph Convolution for Streaming Recommendation2023
- Embarrassingly Shallow Autoencoders for Sparse Data2019
- Enabling Explainable Recommendation in E-commerce with LLM-powered Product Knowledge Graph2024
- Explainable Recommendation with Personalized Review Retrieval and Aspect Learning2023
- Explainable Recommendations via Attentive Multi-Persona Collaborative Filtering2020
- Exploring the Impact of Large Language Models on Recommender Systems: An Extensive Review
- Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model
- Fast and Slow Learning From Reviews
- Generating Query-Relevant Document Summaries via Reinforcement Learning2025
- GenRec: Large Language Model for Generative Recommendation2023
- GHRS: Graph-based Hybrid Recommendation System with Application to Movie Recommendation2021
- Going Beyond Local: Global Graph-Enhanced Personalized News Recommendations2023
- HyperBandit: Contextual Bandit with Hypernetwork for Time-Varying User Preferences in Streaming Recommendation2023
- I like it... I like it not: Evaluating User Ratings Noise in Recommender Systems
- Improving Conversational Recommender Systems via Transformer-based Sequential Modelling
- INSPIRED: Toward Sociable Recommendation Dialog Systems2020
- InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep Recommendation Models2023
- KGAT: Knowledge Graph Attention Network for Recommendation2019
- Knowledge Distillation for Enhancing Walmart E-commerce Search Relevance Using Large Language Models2025
- Large Language Models are Zero-Shot Rankers for Recommender Systems2023
- Large Language Models as Conversational Movie Recommenders: A User Study2024
- Large Language Models as Zero-Shot Conversational Recommenders2023
- Large Scale Product Graph Construction for Recommendation in E-commerce2020
- Learning Distributed Representations from Reviews for Collaborative Filtering2018
- Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries2025
- Learning to Ask Appropriate Questions in Conversational Recommendation2021
- Learning to Ask Critical Questions for Assisting Product Search2024
- Learning to Rank for Recommender Systems
- Learning Vector-Quantized Item Representation for Transferable Sequential Recommenders2022
- Lessons Learnt From Consolidating ML Models in a Large Scale Recommendation System
- Leveraging Large Language Models in Conversational Recommender Systems2023
- LLM-Rec: Personalized Recommendation via Prompting Large Language Models2023
- LLMs as Architects and Critics for Multi-Source Opinion Summarization2025
- Measuring the Value of Social Dynamics in Online Product Ratings Forums
- Methodologies for Improving Modern Industrial Recommender Systems2023
- Monolith: Real Time Recommendation System With Collisionless Embedding Table2022
- Mostly Exploration-Free Algorithms for Contextual Bandits2017
- Multi-Task End-to-End Training Improves Conversational Recommendation2023
- Neural Collaborative Filtering2017
- Neural Collaborative Filtering vs. Matrix Factorization Revisited2020
- On Generative Agents in Recommendation2023
- On Information Distortions in Online Ratings
- OpinionConv: Conversational Product Search with Grounded Opinions2023
- Optimizing Encoder-Only Transformers for Session-Based Recommendation Systems2024
- Posting versus Lurking: Communicating in a Multiple Audience Context
- Preference Discerning with LLM-Enhanced Generative Retrieval2024
- Prompting Large Language Models for Recommender Systems: A Comprehensive Framework and Empirical Analysis2024
- Psychotherapy AI Companion with Reinforcement Learning Recommendations and Interpretable Policy Dynamics2023
- Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning2025
- RecExplainer: Aligning Large Language Models for Recommendation Model Interpretability2023
- Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)2022
- Recommendation systems and convergence of online reviews: The type of product network matters!
- Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations
- Recommender Systems with Social Regularization
- Recommending What Video to Watch Next: A Multitask Ranking System
- Reconciling the accuracy-diversity trade-off in recommendations2023
- RevCore: Review-augmented Conversational Recommendation2021
- Review-LLM: Harnessing Large Language Models for Personalized Review Generation2024
- Revisiting Prompt Engineering: A Comprehensive Evaluation for LLM-based Personalized Recommendation2025
- Scalable Neural Contextual Bandit for Recommender Systems2023
- Self Selection and Information Role of Online Product Reviews
- Situating Recommender Systems in Practice: Towards Inductive Learning and Incremental Updates2022
- The Architectural Implications of Facebook’s DNN-based Personalized Recommendation2019
- The Netflix Recommender System: Algorithms, Business Value, and Innovation
- The persuasive effects of political microtargeting in the age of generative artificial intelligence
- Topic-Guided Conversational Recommender in Multiple Domains2020
- Towards Conversational Recommendation over Multi-Type Dialogs2020
- Towards Question-based Recommender Systems2020
- Tube2Vec: Social and Semantic Embeddings of YouTube Channels2023
- Unified Conversational Recommendation Policy Learning via Graph-based Reinforcement Learning2021
- Unifying Nearest Neighbors Collaborative Filtering
- User-Centric Conversational Recommendation with Multi-Aspect User Modeling2022
- Using Navigation to Improve Recommendations in Real-Time
- Variational Autoencoders for Collaborative Filtering2018
- Why Do People Rate? Theory and Evidence on Online Ratings
- Wide & Deep Learning for Recommender Systems2016
- “What do others think?”: Task-Oriented Conversational Modeling with Subjective Knowledge
Conversational AI and Personalization (221)
- A Comprehensive Review of AI-based Intelligent Tutoring Systems: Applications and Challenges2025
- A Conversation is Worth A Thousand Recommendations: A Survey of Holistic Conversational Recommender Systems2023
- A Little Human Data Goes A Long Way2024
- A Socially-Aware Conversational Recommender System for Personalized Recipe Recommendations
- A Survey on Proactive Dialogue Systems: Problems, Methods, and Prospects2023
- A Survey on Recent Advances in LLM-Based Multi-turn Dialogue Systems2024
- Absolute Zero: Reinforced Self-play Reasoning with Zero Data2025
- Active Listening: Personalized Question Generation in Open-Domain Social Conversation with User Model Based Prompting
- Adaptive Learning Systems: Personalized Curriculum Design Using LLM-Powered Analytics2025
- Adding Chit-Chat to Enhance Task-Oriented Dialogues2020
- Agent A/B: Automated and Scalable A/B Testing on Live Websites with Interactive LLM Agents2025
- Agent-Centric Projection of Prompting Techniques and Implications for Synthetic Training Data for Large Language Models2025
- Agreement Tracking for Multi-Issue Negotiation Dialogues2023
- AInsight: Augmenting Expert Decision-Making with On-the-Fly Insights Grounded in Historical Data2025
- ALIGN: Prompt-based Attribute Alignment for Reliable, Responsible, and Personalized LLM-based Decision-Making2025
- Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning2025
- Alternating Recurrent Dialog Model with Large-scale Pre-trained Language Models2019
- Are LLMs All You Need for Task-Oriented Dialogue?2023
- Are you in a Masquerade? Exploring the Behavior and Impact of Large Language Model Driven Social Bots in Online Social Networks2023
- ARGS: Alignment as Reward-Guided Search2024
- Ask an Expert: Leveraging Language Models to Improve Strategic Reasoning in Goal-Oriented Dialogue Models2023
- Aspect-oriented Opinion Alignment Network for Aspect-Based Sentiment Classification2023
- Assessment of Personality Dimensions Across Situations Using Conversational Speech2025
- Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models2025
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework2023
- Backtracing: Retrieving the Cause of the Query2024
- Better Alignment with Instruction Back-and-Forth Translation2024
- Beyond Answers: How LLMs Can Pursue Strategic Thinking in Education2025
- Beyond Passive Critical Thinking: Fostering Proactive Questioning to Enhance Human-AI Collaboration2025
- Bigger is not always better: The importance of human-scale language modeling for psycholinguistics
- Bridging the gulf of envisioning: Cognitive design challenges in llm interfaces.2023
- Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning2023
- CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society2023
- Can AI Agents Agree?2026
- Can AI Have a Personality? Prompt Engineering for AI Personality Simulation: A Chatbot Case Study in Gender-Affirming Voice Therapy Training2025
- Can LLM be a Personalized Judge?2024
- CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues2024
- Capturing Individual Human Preferences with Reward Features2025
- Challenges of Large Language Models for Mental Health Counseling2023
- Characterizing Online Discussion Using Coarse Discourse Sequences
- Chatbots in Knowledge-Intensive Contexts: Comparing Intent and LLM-Based Systems2024
- Clarifying the Path to User Satisfaction: An Investigation into Clarification Usefulness2024
- Cognitive Architectures for Language Agents2023
- CollabLLM: From Passive Responders to Active Collaborators2025
- Collaborative Reasoner: Self-Improving Social Agents with Synthetic Conversations
- Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning2025
- Conversation Chronicles: Towards Diverse Temporal and Relational Dynamics in Multi-Session Conversations2023
- Conversation Derailment Forecasting with Graph Convolutional Networks2023
- Conversational Alignment with Artificial Intelligence in Context2025
- Conversational DNA: A New Visual Language for Understanding Dialogue Structure in Human and AI2025
- Conversational Graph Grounded Policy Learning for Open-Domain Conversation Generation
- Conversational Semantic Parsing for Dialog State Tracking2020
- Conversations Gone Awry: Detecting Early Signs of Conversational Failure
- CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks2025
- Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate2025
- DAPIE: Interactive Step-by-Step Explanatory Dialogues to Answer Children’s Why and How Questions
- Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models2024
- DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations2022
- Decision-Oriented Dialogue for Human–AI Collaboration2023
- Deep Neural Network Approach for the Dialog State Tracking Challenge
- DeepGesture: A conversational gesture synthesis system based on emotions and semantics2025
- Dialog Inpainting: Turning Documents into Dialogs2022
- Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources2023
- Dialogue State Tracking with a Language Model using Schema-Driven Prompting
- Dialogue Transformers2019
- DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs2025
- DiaSynth: Synthetic Dialogue Generation Framework for Low Resource Dialogue Applications2024
- Diplomat: A Dialogue Dataset for Situated PragMATic Reasoning2023
- Discursive Socratic Questioning: Evaluating the Faithfulness of Language Models’ Understanding of Discourse Relations
- DiscussLLM: Teaching Large Language Models When to Speak2025
- Do Phone-Use Agents Respect Your Privacy?2026
- Do Response Selection Models Really Know What's Next? Utterance Manipulation Strategies for Multi-turn Response Selection2020
- Doing Personal LAPS: LLM-Augmented Dialogue Construction for Personalized Multi-Session Conversational Search2024
- Dynamic Task-Oriented Dialogue: A Comparative Study of Llama-2 and Bert in Slot Value Generation
- Efficient Streaming Language Models with Attention Sinks2023
- Empirical Study of Symmetrical Reasoning in Conversational Chatbots2024
- Empowering Domain-Specific Language Models with Graph-Oriented Databases: A Paradigm Shift in Performance and Model Maintenance2024
- Enhancing Large Language Model Induced Task-Oriented Dialogue Systems Through Look-Forward Motivated Goals2023
- Enhancing personalized multi-turn dialogue with curiosity reward2025
- Enhancing Pipeline-Based Conversational Agents with Large Language Model2023
- Evaluating Emotional Nuances In Dialogue Summarization2023
- Exploring the Potential of Large Language Models in Computational Argumentation2023
- Finding Common Ground: Using Large Language Models to Detect Agreement in Multi-Agent Decision Conferences2025
- Foundation Priors2025
- From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models2024
- From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?2025
- From Persona to Person: Enhancing the Naturalness with Multiple Discourse Relations Graph Learning in Personalized Dialogue Generation2025
- From Text to Emoji: How PEFT-Driven Personality Manipulation Unleashes the Emoji Potential in LLMs2024
- Generative Agents: Interactive Simulacra of Human Behavior2023
- Goal Alignment in LLM-Based User Simulators for Conversational AI2025
- GRASP: Municipal Budget AI Chatbots for Enhancing Civic Engagement2025
- Grounding Gaps in Language Model Generations2023
- H2HTalk: Evaluating Large Language Models as Emotional Companion2025
- Hello Again! LLM-powered Personalized Agent for Long-term Dialogue2024
- HiTKG: Towards Goal-Oriented Conversations via Multi-Hierarchy Learning
- IMBUE: Improving Interpersonal Effectiveness through Simulation and Just-in-time Feedback with Human-Language Model Interaction
- Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems2024
- Insert-expansions For Tool-enabled Conversational Agents2023
- IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems2025
- Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation2026
- Intent-calibrated Self-training for Answer Selection in Open-domain Dialogues2023
- Interacting with Non-Cooperative User: A New Paradigm for Proactive Dialogue Policy2022
- Interaction Dynamics as a Reward Signal for LLMs2025
- Interactions with generative AI chatbots: unveiling dialogic dynamics, students’ perceptions, and practical competencies in creative problem-solving
- KETOD: Knowledge-Enriched Task-Oriented Dialogue2022
- Knowledge-enhanced Mixed-initiative Dialogue System for Emotional Support Conversations2023
- Language Model Personalization via Reward Factorization2025
- Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation
- Learning "Partner-Aware" Collaborators in Multi-Party Collaboration2025
- Learning Retrieval Augmentation for Personalized Dialogue Generation2024
- Learning To Guide Human Experts Via Personalized Large Language Models2023
- Learning to Relate to Previous Turns in Conversational Search2023
- Learning to Select the Relevant History Turns in Conversational Question Answering2023
- Leveraging Large Language Models in Conversational Recommender Systems2023
- Lexical Entrainment for Conversational Systems2023
- Linguistic Alignment in Conversational AI: A Systematic Review of Cognitive-Linguistic Dimensions, Measurements, and User Outcomes (2020–2025)
- LLaMA-Omni: Seamless Speech Interaction with Large Language Models2024
- LLMs Get Lost In Multi-Turn Conversation2025
- Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing2024
- Making Sense of Memory in AI Agents
- MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation2023
- Memorization and Knowledge Injection in Gated LLMs2025
- Memory Sandbox: Transparent and Interactive Memory Management for Conversational Agents2023
- Modeling the Quality of Dialogical Explanations2024
- Multi-Task End-to-End Training Improves Conversational Recommendation2023
- MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs2025
- Neural Approaches to Conversational AI2018
- Neural Assistant: Joint Action Prediction, Response Generation, and Latent Knowledge Reasoning2019
- Neural Conversation Models and How to Rein Them in: A Survey of Failures and Fixes2023
- No that's not what I meant: Handling Third Position Repair in Conversational Question Answering2023
- Octopus v2: On-device language model for super agent2024
- On the Conversational Basis of Some Presuppositions
- OpenDialKG: Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs
- OpinionConv: Conversational Product Search with Grounded Opinions2023
- Orchestrating Synthetic Data with Reasoning
- Overview of DialAM-2024: Argument Mining in Natural Language Dialogues
- PersLLM: A Personified Training Approach for Large Language Models2024
- PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time2025
- Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback2023
- Personalization of Large Language Models: A Survey2024
- Personalized Dialogue Generation with Persona-Adaptive Attention2022
- Personalized Language Modeling from Personalized Human Feedback2024
- Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning2024
- Planning Like Human: A Dual-process Framework for Dialogue Planning2024
- Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents2023
- PolyResponse: A Rank-based Approach to Task-Oriented Dialogue with Application in Restaurant Search and Booking2019
- POMDP-based Statistical Spoken Dialogue Systems: a Review
- Post-training for Efficient Communication via Convention Formation2025
- Predictive Preference Learning from Human Interventions2025
- PRIME: Large Language Model Personalization with Cognitive Memory and Thought Processes2025
- Pro-Active Systems and Influenceable Users: Simulating Pro-Activity in Task-oriented Dialogues
- Proactive behavior in voice assistants: A systematic review and conceptual model
- Proactive Conversational Agents in the Post-ChatGPT World
- Proactive Conversational Agents with Inner Thoughts2024
- Proactive Human-Machine Conversation with Explicit Conversation Goals2019
- Proactive Moderation of Online Discussions: Existing Practices and the Potential for Algorithmic Support2022
- Prompted LLMs as Chatbot Modules for Long Open-domain Conversation2023
- Prompting and Evaluating Large Language Models for Proactive Dialogues: Clarification, Target-guided, and Non-collaboration2023
- ProsocialDialog: A Prosocial Backbone for Conversational Agents2022
- Provable Benefits of In-Tool Learning for Large Language Models2025
- Quantifying Human-AI Synergy
- Quantitative Introspection in Language Models: Tracking Internal States Across Conversation2026
- Real-Time Procedural Learning From Experience for AI Agents2025
- Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning2025
- Recent Trends in Personalized Dialogue Generation: A Review of Datasets, Methodologies, and Evaluations2024
- Reflections and New Directions for Human-Centered Large Language Models2026
- Reinforcement Learning for Optimizing RAG for Domain Chatbots2024
- Rethinking Conversational Agents in the Era of LLMs: Proactivity, Non-collaborativity, and Beyond
- Revisiting Prompt Engineering: A Comprehensive Evaluation for LLM-based Personalized Recommendation2025
- Rhetorical XAI: Explaining AI’s Benefits as well as its Use via Rhetorical Design2025
- RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models2023
- Scalable Language Models with Posterior Inference of Latent Thought Vectors2025
- Scaling Synthetic Data Creation with 1,000,000,000 Personas2024
- SDPO: Segment-Level Direct Preference Optimization for Social Agents2025
- SEAL: Self-Evolving Agentic Learning for Conversational Question Answering over Knowledge Graphs2025
- See you soon again, chatbot? A design taxonomy to characterize user-chatbot relationships with different time horizons
- Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
- Self-Directed Synthetic Dialogues and Revisions Technical Report2024
- Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels2024
- Self-Supervised Models of Speech Infer Universal Articulatory Kinematics2023
- Sequence Organization in Interaction: A Primer in Conversation Analysis
- Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making2025
- Simple Synthetic Data Reduces Sycophancy In Large Language Models2023
- SMILE: Evaluation and Domain Adaptation for Social Media Language Understanding2023
- SOLOIST: Building Task Bots at Scale with Transfer Learning and Machine Teaching
- Spontaneous Persuasion: An Audit of Model Persuasiveness in Everyday Conversations2026
- Supporting Physical Activity Behavior Change with LLM-Based Conversational Agents2024
- Suppressing Pink Elephants with Direct Principle Feedback2024
- Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians2026
- Synthetic Dialogue Dataset Generation using LLM Agents2024
- Tailored Conversations beyond LLMs: A RL-Based Dialogue Manager2025
- Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs2024
- Target-Guided Open-Domain Conversation2019
- Task-Oriented Dialogue as Dataflow Synthesis2020
- Task-Oriented Dialogue with In-Context Learning2024
- TaskLAMA: Probing the Complex Task Understanding of Language Models2023
- The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models2026
- The Challenges in Designing a Prevention Chatbot for Eating Disorders: Observational Study
- The Levers of Political Persuasion with Conversational AI2025
- Thinking Assistants: LLM-Based Conversational Assistants that Help Users Think By Asking rather than Answering2023
- ToolFlow: Boosting LLM Tool-Calling Through Natural and Coherent Dialogue Synthesis2024
- Toward Conversational Agents with Context and Time Sensitive Long-term Memory2024
- Towards Conversational Recommendation over Multi-Type Dialogs2020
- Towards Human-centered Proactive Conversational Agents2024
- Towards Understanding Counseling Conversations: Domain Knowledge and Large Language Models2024
- Training Dialogue Systems by AI Feedback for Improving Overall Dialogue Impression2025
- TREC iKAT 2023: A Test Collection for Evaluating Conversational and Interactive Knowledge Assistants2024
- Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion2024
- Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization2024
- Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models2024
- Understanding the Role of User Profile in the Personalization of Large Language Models2024
- Unintended Impacts of LLM Alignment on Global Representation2024
- Useful Memories Become Faulty When Continuously Updated by LLMs2026
- User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy as a Learning Signal2025
- Virtual Assistance in Any Context
- VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction2025
- Voxtral2025
- WHEN TO ACT, WHEN TO WAIT: Modeling Structural Trajectories for Intent Triggerability in Task-Oriented Dialogue2025
- Will I Sound Like Me? Improving Persona Consistency in Dialogues through Pragmatic Self-Consciousness2020
- “Mama Always Had a Way of Explaining Things So I Could Understand”: A Dialogue Corpus for Learning to Construct Explanations2022
- “What do others think?”: Task-Oriented Conversational Modeling with Subjective Knowledge