TOPIC

NLP and Linguistics

36 synthesis notes · 137 source papers

Why do speakers deliberately use ambiguous language?

Explores whether ambiguity is a linguistic defect or a strategic tool speakers use for efficiency, politeness, and deniability. Matters because it challenges how we train language systems.

Why do clarification requests look different at each communication level?

Explores whether clarifications are unified speech acts or distinct mechanisms grounded in different modalities. Matters because dialogue systems treat clarifications uniformly, missing most of them.

Why do speakers need to actively calibrate shared reference?

Explores whether using the same words guarantees speakers mean the same thing. Investigates how referential grounding differs across people and what collaborative work is needed to establish true understanding.

Are models actually reasoning about constraints or just defaulting conservatively?

Do language models genuinely apply constraints when solving problems, or do they simply prefer harder options by default? Minimal pair testing reveals whether apparent reasoning success masks hidden biases.

Do language models show the same content effects humans do?

Do LLMs reproduce human reasoning biases—like believing conclusions based on familiarity rather than logic—across different logical tasks? This matters because converging patterns across independent tasks suggest a fundamental architectural property rather than a task-specific quirk.

Do harder reasoning tasks trigger more semantic bias?

Does the difficulty of a logical task determine how much semantic content influences reasoning? This matters because it reveals whether we can isolate 'pure' logical reasoning in benchmarks.

Do language models fail reasoning tests that humans pass?

Standard critiques claim LLMs lack real reasoning ability, but do humans actually perform better on content-independent reasoning tasks? Examining whether the cognitive bar differs for artificial versus human intelligence.

Does language understanding happen only in the language system?

Explores whether the brain's core language system alone can produce genuine understanding, or whether deep comprehension requires dispatching information to perception, motor, and memory regions.

What formal languages actually help transformers learn natural language?

Not all formal languages are equally useful for pre-pretraining. This explores which formal languages transfer well to natural language and why—combining structural requirements with what transformers can actually learn.

Why do confident wrong answers hide in standard accuracy metrics?

When AI systems produce fluent but incorrect recommendations in high-stakes domains, standard accuracy evaluation may miss the failures entirely. What structural blind spot allows these errors to remain invisible?

Can language models learn meaning from text patterns alone?

Explores whether training on form alone—predicting the next word from prior words—could ever give language models access to communicative intent and genuine semantic understanding.

What makes linguistic agency impossible for language models?

From an enactive perspective, does linguistic agency require embodied participation and real stakes that LLMs fundamentally lack? This matters because it challenges whether LLMs can truly engage in language or only generate text.

What hidden assumptions drive how we build language models?

Large language models rest on two unstated assumptions about language and data. Understanding what engineers assume—and what enactive linguistics challenges—matters for knowing what LLMs actually can and cannot do.

Why does removing spurious cues sometimes hurt model performance?

Most models improve when spurious features are removed, but some perform worse. This note explores whether that failure represents a fundamentally different problem from traditional shortcut learning.

Why do language models fail to use knowledge they possess?

Large language models contain relevant world knowledge but often fail to activate it without explicit cues. This explores whether the bottleneck lies in knowledge storage or in the inference process that decides what background facts apply.

Can language models adapt implicature to conversational context?

Do large language models flexibly modulate scalar implicatures based on information structure, face-threatening situations, and explicit instructions—as humans do? This tests whether pragmatic computation is truly context-sensitive or merely literal.

Does semantic grounding in language models come in degrees?

Rather than asking whether LLMs truly understand meaning, this explores whether grounding is actually a multi-dimensional spectrum. The question matters because it reframes the sterile understand/don't-understand debate into measurable, distinct capacities.

Can LLMs acquire social grounding through linguistic integration?

Explores whether LLMs gradually develop social grounding as they become embedded in human language practices, analogous to child language acquisition. Tests whether grounding is a fixed property or an outcome of participatory use.

Should we call LLM errors hallucinations or fabrications?

Does the language we use to describe LLM failures shape the technical solutions we build? Examining whether perceptual and psychological frameworks misdiagnose what's actually happening.

Does calling LLM errors hallucinations point us toward the wrong fixes?

Explores whether the metaphor of 'hallucination' for LLM errors misdirects our efforts. The terminology we choose shapes which interventions we prioritize and how we conceptualize the underlying problem.

Can language models actually analyze language structure?

Explores whether LLMs can move beyond pattern matching to perform genuine metalinguistic analysis like syntactic tree construction and phonological reasoning, and what enables this capability.

Can large language models develop genuine world models without direct environmental contact?

Do LLMs extract meaningful world structures from human-generated text despite lacking direct sensory access to reality? This matters for understanding what kind of grounding and knowledge these systems actually possess.

Can language models recognize when text is deliberately ambiguous?

Explores whether LLMs can identify and handle multiple valid interpretations in a single phrase—a core human language skill that appears largely absent in current models despite their fluency on standard tasks.

Do language models learn abstract grammar or cultural speech patterns?

LLMs might learn more than grammar rules—they could be learning who says what to whom and when. This matters because it changes how we understand what biases and persona effects actually represent.

Can language models learn meaning without engaging the world?

Explores whether LLMs prove that meaning emerges from relational structure alone, independent of embodied experience or external reference. Tests structuralist theory empirically.

Do language models actually build shared understanding in conversation?

When LLMs respond fluently to prompts, do they perform the communicative work humans do to establish mutual understanding? Research suggests they skip the grounding acts that make dialogue reliable.

Why do language models fail at communicative optimization?

LLMs excel at learning surface statistical patterns from text but struggle with deeper principles of how language achieves efficient communication. What distinguishes these two types of linguistic knowledge?

Do language models ignore goals when surface cues conflict?

When a task has an obvious surface cue that contradicts an unstated requirement, do LLMs follow the cue or the actual goal? This matters because it reveals whether reasoning failures come from missing knowledge or from how models weight competing signals.

Do standard NLP benchmarks hide LLM ambiguity failures?

When benchmark creators filter out ambiguous examples before testing, do they accidentally make it impossible to measure whether language models can actually handle ambiguity the way humans do?

Can formal language pretraining make language models more efficient?

Does training language models on hierarchical formal languages before natural language improve how efficiently they learn syntax? This explores whether structural inductive biases in training data matter more than raw data volume.

Does preference optimization damage conversational grounding in large language models?

Exploring whether RLHF and preference optimization actively reduce the communicative acts—clarifications, acknowledgments, confirmations—that build shared understanding in dialogue. This matters for high-stakes applications like medical and emotional support.

Source papers 137

The arXiv papers behind this sub-topic. Links may take you off-site to arxiv.org.

(QA)2: Question Answering with Questionable Assumptions
For instance, the question "When did Marie Curie discover Uranium?" cannot be answered as a typical "when" question without addressing the false assumption "Marie Curie discovered Uranium". In this work, we…
A Non-Factoid Question-Answering Taxonomy
Categories: INSTRUCTION, REASON, EVIDENCE-BASED, COMPARISON, EXPERIENCE, DEBATE. INSTRUCTION: You want to understand the procedure/method of doing/achieving something. Instructions/guidelines provided in a step-…
A comprehensive taxonomy of hallucinations in Large Language Models
This report provides a comprehensive taxonomy of LLM hallucinations, beginning with a formal definition and a theoretical framework that posits its inherent inevitability in computable LLMs, irrespect…
A meta-analysis of the persuasive power of large language models
Large language models (LLMs) are increasingly used for persuasion, such as in political communication and marketing, where they affect how people think, choose, and act. Yet, empirical findings on the…
A recipe for annotating grounded clarifications
In order to interpret the communicative intents of an utterance, it needs to be grounded in something that is outside of language; that is, grounded in world modalities. In this paper we argue that di…
ACE: Abstractions for Communicating Efficiently
A central but unresolved aspect of problem-solving in AI is the capability to introduce and use abstractions, something humans excel at. Work in cognitive science has demonstrated that humans tend tow…
AI Argues Differently: Distinct Argumentative and Linguistic Patterns of LLMs in Persuasive Contexts
Distinguishing LLM-generated text from human-written is a key challenge for safe and ethical NLP, particularly in high-stake settings such as persuasive online discourse. While recent work focuses on …
ANAPHORA RESOLUTION: THE STATE OF THE ART
The paper is an introduction to anaphora resolution offering a brief survey of the major works in the field. Introduction. Anaphora resolution is a complicated problem in Natural Language Processing …
Adam's Law: Textual Frequency Law on Large Language Models
While textual frequency has been validated as relevant to human cognition in reading speed, its relatedness to Large Language Models (LLMs) is seldom studied. We propose a novel research direction in …
Argument Quality Assessment in the Age of Instruction-Following Large Language Models
Rather than just fine-tuning LLMs towards leaderboard chasing on assessment tasks, they need to be instructed systematically with argumentation theories and scenarios as well as with ways to solve arg…
Assessment of Personality Dimensions Across Situations Using Conversational Speech
Prior research indicates that users prefer assistive technologies whose personalities align with their own. This has sparked interest in automatic personality perception (APP), which aims to …
Attention, Intentions, And The Structure Of Discourse
In this paper we explore a new theory of discourse structure that stresses the role of purpose and processing in discourse. In this theory, discourse structure is composed of three separate but interr…
Automatic Extraction of Metaphoric Analogies from Literary Texts: Task Formulation, Dataset Construction, and Evaluation
Extracting metaphors and analogies from free text requires high-level reasoning abilities such as abstraction and language understanding. Our study focuses on the extraction of the concepts that form …
Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
Pretraining language models on formal language can improve their acquisition of natural language. Which features of the formal language impart an inductive bias that leads to effective transfer? Drawi…
Beyond Passive Critical Thinking: Fostering Proactive Questioning to Enhance Human-AI Collaboration
Critical thinking is essential for building robust AI systems, preventing them from blindly accepting flawed data or biased reasoning. However, prior work has primarily focused on passive critical thi…
Bigger is not always better: The importance of human-scale language modeling for psycholinguistics
scaling has several downsides for both computational psycholinguistics and natural language processing research. We discuss the scientific challenges presented by the scaling paradigm, as well as the …
CDW-CoT: Clustered Distance-Weighted Chain-of-Thoughts Reasoning
Large Language Models (LLMs) have recently achieved impressive results in complex reasoning tasks through Chain of Thought (CoT) prompting. However, most existing CoT methods rely on using the same pr…
Can AI Explanations Make You Change Your Mind?
In the context of AI-based decision support systems, explanations can help users to judge when to trust the AI’s suggestion, and when to question it. In this way, human oversight can prevent AI errors…
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
Recent advancements in large language models (LLMs) have sparked optimism about their potential to accelerate scientific discovery, with a growing number of works proposing research agents that autono…
Can LLMs Ground when they (Don't) Know: A Study on Direct and Loaded Political Questions
Communication among humans relies on conversational grounding, allowing interlocutors to reach mutual understanding even when they do not have perfect knowledge and must resolve discrepancies in each …
Can LLMs assist with Ambiguity? A Quantitative Evaluation of various Large Language Models on Word Sense Disambiguation
Ambiguous words are often found in modern digital communications. Lexical ambiguity challenges traditional Word Sense Disambiguation (WSD) methods, due to limited data. Consequently, the efficiency of…
Can Large Language Models Understand Argument Schemes?
Argument schemes represent stereotypical patterns of reasoning that occur in everyday arguments. However, despite their usefulness, argument scheme classification — that is, classifying natural langua…
Can Large Language Models Understand Context?
Understanding context is key to understanding human language, an ability which Large Language Models (LLMs) have been increasingly seen to demonstrate to an impressive extent. However, though the eval…
Can Large Language Models perform Relation-based Argument Mining?
The general AM problem can be split into three main tasks: 1) argument identification, involving segmenting text into units and determining which are argumentative; 2) identification of argumentative …
Causal Sufficiency and Necessity Improves Chain-of-Thought Reasoning
Chain-of-Thought (CoT) prompting plays an indispensable role in endowing large language models (LLMs) with complex reasoning capabilities. However, CoT currently faces two fundamental challenges: (1) …
Chain of Stance: Stance Detection with Large Language Models
Stance detection is an active task in natural language processing (NLP) that aims to identify the author’s stance towards a particular target within a text. Given the remarkable language understanding…
ChatGPT Reads Your Tone and Responds Accordingly -- Until It Does Not -- Emotional Framing Induces Bias in LLM Outputs
Background: Large Language Models (LLMs) like GPT-4 tailor their responses not just to the content but also to the tone of user prompts. Prior work has hinted that emotional phrasing – whether optimis…
ChatGPT: deconstructing the debate and moving it forward
Large language models such as ChatGPT enable users to automatically produce text but also raise ethical concerns, for example about authorship and deception. This paper analyses and discusses…
Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data
We argue that the language modeling task, because it only uses form as training data, cannot in principle lead to learning of meaning. We take the term language model to refer to any system trained on…
Clustering-based Sampling for Few-Shot Cross-Domain Keyphrase Extraction
Keyphrase extraction is the task of identifying a set of keyphrases present in a document that captures its most salient topics. Scientific domain-specific pre-training has led to achieving state-of-t…
Collaborative Rational Speech Act: Pragmatic Reasoning for Multi-Turn Dialog
In this paper, we introduce Collaborative Rational Speech Act (CRSA), an information-theoretic (IT) extension of RSA that models multi-turn dialog by optimizing a gain function adapted from rate-disto…
Comparing Apples to Apples: Generating Aspect-Aware Comparative Sentences from User Reviews
Deciding on a product to purchase can be a time-consuming process. Every user has specific quality preferences, budget restrictions, or enjoys different item features. To distill important informatio…
Complex Logical Instruction Generation
Instruction following has catalyzed the recent era of Large Language Models (LLMs) and is the foundational skill underpinning more advanced capabilities such as reasoning and agentic behaviors. As tas…
Computational Modelling of Undercuts in Real-world Arguments
Argument Mining (AM) is the task of automatically analysing arguments, such that the unstructured information contained in them is converted into structured representations. Undercut is a unique struc…
Computational structuralism: Toward a formal theory of meaning in the age of digital intelligence
The discovery that “next-token predictor” language models can fluently produce text has important but underappreciated theoretical implications. Most notably, their success demonstrates that fully rel…
Conversation Chronicles: Towards Diverse Temporal and Relational Dynamics in Multi-Session Conversations
In the field of natural language processing, open-domain chatbots have emerged as an important research topic. However, a major limitation of existing open-domain chatbot research is its singular focu…
Conversational DNA: A New Visual Language for Understanding Dialogue Structure in Human and AI
What if the patterns hidden within dialogue reveal more about communication than the words themselves? We introduce Conversational DNA, a novel visual language that treats any dialogue – whether betwe…
Conversational Semantic Parsing for Dialog State Tracking
We consider a new perspective on dialog state tracking (DST), the task of estimating a user’s goal through the course of a dialog. By formulating DST as a semantic parsing task over hierarchical repre…
DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations
Those models take a contrastive learning approach, where they build binary classifiers to differentiate positive, or coherent examples from negative, or incoherent dialogues. Those classifiers are usu…
Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models
Effective interlocutors account for the uncertain goals, beliefs, and emotions of others. But even the best human conversationalist cannot perfectly anticipate the trajectory of a dialogue. How well c…
Detecting Cognitive Distortions from Patient-Therapist Interactions
An important part of Cognitive Behavioral Therapy (CBT) is to recognize and restructure certain negative thinking patterns that are also known as cognitive distortions. This project aims to detect the…
Detecting hallucinations in large language models using semantic entropy
Here we develop new methods grounded in statistics, proposing entropy-based uncertainty estimators for LLMs to detect a subset of hallucinations—confabulations—which are arbitrary and incorrect genera…
Development and validation of large language model rating scales for automatically transcribed psychological therapy sessions
Rating scales have shaped psychological research, but are resource-intensive and can burden participants. Large Language Models (LLMs) offer a tool to assess latent constructs in text. This study intr…
Diplomat: A Dialogue Dataset for Situated PragMATic Reasoning
We introduce a new benchmark, Diplomat, aiming at a unified paradigm for pragmatic reasoning and situated conversational understanding. Compared with previous works that treat different figurative ex…
Discourse Structure and Dialogue Acts in Multiparty Dialogue: the STAC Corpus
This paper describes the STAC resource, a corpus of multi-party chats annotated for discourse structure in the style of SDRT (Asher and Lascarides, 2003; Lascarides and Asher, 2009). The main…
Discovering Latent Concepts Learned in BERT
A large number of studies that analyze deep neural network models and their ability to encode various linguistic and non-linguistic concepts provide an interpretation of the inner mechanics of these m…
Discursive Socratic Questioning: Evaluating the Faithfulness of Language Models’ Understanding of Discourse Relations
While large language models have significantly enhanced the effectiveness of discourse relation classifications, it remains unclear whether their comprehension is faithful and reliable. We provide DIS…
Do LLMs produce texts with "human-like" lexical diversity?
The degree to which LLMs produce writing that is truly human-like remains unclear despite the extensive empirical attention that this question has received. The present study addresses this question f…
Do Large Language Models Understand Conversational Implicature -- A case study with a Chinese sitcom
Understanding the non-literal meaning of an utterance is critical for large language models (LLMs) to become human-like social communicators. In this work, we introduce SwordsmanImp, the first Chinese…
Do large language models resemble humans in language use?
regularities in language range from phonology to pragmatics. For example, people associate different sounds with different referents (e.g., Köhler, 1929), automatically reinterpret implausible sentenc…
Eliciting Reasoning in Language Models with Cognitive Tools
The recent advent of reasoning models like OpenAI’s o1 was met with excited speculation by the AI community about the mechanisms underlying these capabilities in closed models, followed by a rush of r…
Empirical Study of Symmetrical Reasoning in Conversational Chatbots
This work explores the capability of conversational chatbots powered by large language models (LLMs) to understand and characterize predicate symmetry, a cognitive linguistic function tradi…
Evaluating Emotional Nuances In Dialogue Summarization
Automatic dialogue summarization is a well-established task that aims to identify the most important content from human conversations to create a short textual summary. Despite recent progress in the …
Event-Aware Sentiment Factors from LLM-Augmented Financial Tweets: A Transparent Framework for Interpretable Quant Trading
In this study, we wish to showcase the unique utility of large language models (LLMs) in financial semantic annotation and alpha signal discovery. Leveraging a corpus of company-related tweets, we use…
Explicit Inductive Inference using Large Language Models
However, McKenna et al. (2023a) have recently pointed out that LLMs are severely affected by an attestation bias when performing inference tasks. Given the question of whether premise P entails hypothe…
Exploiting Dialogue Acts and Context to Identify Argumentative Relations in Online Debates
Argumentative Relation Classification is the task of determining the relationship between two contributions in the context of an argumentative dialogue. Existing models in the literature rely on a com…
Exploring the Potential of ChatGPT on Sentence Level Relations: A Focus on Temporal, Causal, and Discourse Relations
This paper aims to quantitatively evaluate the performance of ChatGPT, an interactive large language model, on inter-sentential relations such as temporal relations, causal relations, and discourse re…
Finding Common Ground: Using Large Language Models to Detect Agreement in Multi-Agent Decision Conferences
Decision conferences are structured, collaborative meetings that bring together experts from various fields to address complex issues and reach a consensus on recommendations for future actions or pol…
Fine-tuning Pre-trained Language Models for Dialogical Argument Mining with Inference Anchoring Theory
In this paper, we present our framework for DialAM-2024 Task A: Identification of Propositional Relations and Task B: Identification of Illocutionary Relations. The goal of Task A is to detect argumen…
From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers
We prompted various LLMs with Big Five Personality Scale responses from 816 human individuals to role-play their responses on nine other psychological scales. LLMs demonstrated remarkable accuracy in …
From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning
Humans organize knowledge into compact categories through semantic compression by mapping diverse instances to abstract representations while preserving meaning (e.g., robin and blue jay are both bird…
Grounding Gaps in Language Model Generations
However, it is unclear whether large language models (LLMs) generate text that reflects human grounding. To this end, we curate a set of grounding acts and propose corresponding metrics that quantify …
Grounding ‘Grounding’ in NLP
In contrast, Cognitive Science more formally defines “grounding” as the process of establishing what mutual information is required for successful communication between two interlocutors – a definitio…
Hierarchical Concept Geometry in Language Models Emerges from Word Co-occurrence
We propose a distributional theory of how hypernymy—the “is-a” relation between general and specific concepts—is encoded geometrically in language representations. Starting from the empirically verifi…
How Projective is Projective Content? Gradience in Projectivity and At-issueness
Projective content is utterance content that a speaker may be taken to be committed to even when the expression associated with the content occurs embedded under an entailment-canceling operator (e.g.…
Identification of Propositional and Illocutionary Relations
In this paper we tackle the shared task DialAM-2024, aiming to annotate dialogue based on the inference anchoring theory (IAT). The task can be split into two parts, identification of propositional re…
Inspecting and Editing Knowledge Representations in Language Models
Neural language models (LMs) represent facts about the world described by text. Sometimes these facts derive from training data (in most LMs, a representation of the word banana encodes the fact that …
Interpretation modeling: Social grounding of sentences by reasoning over their implicit moral judgments
The social and implicit nature of h…
Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?
However, human sarcasm understanding is often considered an intuitive and holistic cognitive process, in which various linguistic, contextual, and emotional cues are integrated to form a comprehensive…
LLMs Struggle to Reject False Presuppositions when Misinformation Stakes are High
These implicit assumptions, known as presuppositions, refer to background knowledge or shared beliefs assumed to be part of the common ground between interlocutors (Stalnaker, 1973). Presuppositions a…
LLMs are Frequency Pattern Learners in Natural Language Inference
While fine-tuning LLMs on NLI corpora improves their inferential performance, the underlying mechanisms driving this improvement remain largely opaque. In this work, we conduct a series of experiments…
Language Models’ Hall of Mirrors Problem: Why AI Alignment Requires Peircean Semiosis
This paper examines some limitations of large language models (LLMs) through the framework of Peircean semiotics. We argue that basic LLMs exist within a "hall of mirrors," manipulating symbols withou…
Language models show human-like content effects on reasoning tasks
Abstract reasoning is a key ability for an intelligent system. Large language models (LMs) achieve above-chance performance on abstract reasoning tasks, but exhibit many imperfections. However, human …
Large Linguistic Models: Investigating LLMs' metalinguistic abilities
The performance of large language models (LLMs) has recently improved to the point where models can perform well on many language tasks. We show here that, for the first time, the models can al…
Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency
Languaging is not the kind of thing that can admit of a complete or comprehensive modelling. From an enactive perspective we identify three key characteristics of enacted language: embodiment, partici…
Learning to Map Context-Dependent Sentences to Executable Formal Queries
We propose a context-dependent model to map utterances within an interaction to executable formal queries. To incorporate interaction history, the model maintains an interaction-level encoder that upd…
Lil-Bevo: Explorations of Strategies for Training Language Models in More Humanlike Ways
We present Lil-Bevo, our submission to the BabyLM Challenge. We pretrained our masked language models with three ingredients: an initial pretraining with music data, training on shorter sequences befo…
Linguistic Alignment in Conversational AI: A Systematic Review of Cognitive-Linguistic Dimensions, Measurements, and User Outcomes (2020–2025)
Conversational Artificial Intelligence systems frequently adapt to or mirror the user’s linguistic style, an emergent dynamic that shapes whether the AI is perceived as a tool, a partner, or a hybrid …
Linguistic Blind Spots of Large Language Models
Large language models (LLMs) are the foundation of many AI applications today. However, despite their remarkable proficiency in generating coherent text, questions linger regarding their ability to pe…
Linguistic markers of inherently false AI communication and intentionally false human communication: Evidence from hotel reviews
To the human eye, AI-generated outputs of large language models have increasingly become indistinguishable from human-generated outputs. Therefore, to determine the linguistic properties that separate…
Lost in Inference: Rediscovering the Role of Natural Language Inference for Large Language Models
In the recent past, a popular way of evaluating natural language understanding (NLU) was to consider a model’s ability to perform natural language inference (NLI) tasks. In this paper, we investigate…
Man vs machine – Detecting deception in online reviews
This study focused on three main research objectives: analyzing the methods used to identify deceptive online consumer reviews, evaluating insights provided by multi-method automated approaches based …
Meanings are like Onions: a Layered Approach to Metaphor Processing
Metaphorical meaning is not a flat mapping between concepts, but a complex cognitive phenomenon that integrates multiple levels of interpretation. In this paper, we propose a stratified mode…
Metadiscursive nouns in academic argument: ChatGPT vs student practices
The ability of ChatGPT to create grammatically accurate and coherent texts has generated considerable anxiety among those concerned that students might use such large language models (LLMs) to write t…
Minds versus Machines: Rethinking Entailment Verification with Language Models
Leveraging a comprehensively curated entailment verification benchmark, we evaluate both human and LLM performance across various reasoning categories. Our benchmark includes datasets from three categ…
Modeling Interpersonal Linguistic Coordination in Conversations using Word Mover's Distance
Linguistic coordination is a well-established phenomenon in spoken conversations and often associated with positive social behaviors and outcomes. While there have been many attempts to measure lexica…
Modeling the Quality of Dialogical Explanations
Explanations are pervasive in our lives. Mostly, they occur in dialogical form where an explainer discusses a concept or phenomenon of interest with an explainee. Leaving the explainee with a…
Neural Conversation Models and How to Rein Them in: A Survey of Failures and Fixes
In this paper, we attempt to systematise the literature about the attested problems of neural conversation models (conditional language models realised with neural networks) used as chat-partner simu…
Neutralizing Bias in LLM Reasoning using Entailment Graphs
However, recent works show that LLMs still suffer from hallucinations in NLI due to attestation bias, where LLMs overly rely on propositional memory to build shortcuts. To solve the issue, we design a…
No that's not what I meant: Handling Third Position Repair in Conversational Question Answering
The ability to handle miscommunication is crucial to robust and faithful conversational AI. People usually deal with miscommunication immediately as they detect it, using highly systematic interaction…
On the Binding Problem in Artificial Neural Networks
In this work, we argue that this underlying cause is the binding problem: The inability of existing neural networks to dynamically and flexibly bind information that is distributed throughout the netw…
On the Conversational Basis of Some Presuppositions
The current literature on presupposition focuses almost exclusively on the projection problem: the question of how and why the presuppositions of atomic clauses are projected to complex sentences whic…
On the Relationship between Sentence Analogy Identification and Sentence Structure Encoding in Large Language Models
The ability of Large Language Models (LLMs) to encode syntactic and semantic structures of language is well examined in NLP. Additionally, analogy identification, in the form of word analogies are ext…
Opportunities for large language models and discourse in engineering design
In this paper, we argue that foundation models such as LLMs can be used for creative reasoning tasks in the engineering design process, complementing and integrating existing computational methods suc…
Overview of DialAM-2024: Argument Mining in Natural Language Dialogues
Argumentation is the process by which humans rationally elaborate their thoughts and opinions in written (e.g., essays) or spoken (e.g., debates) contexts. Argument Mining research, however, has been …
Position: LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks
Large Language Models (LLMs), essentially n-gram models on steroids which have been pre-trained on web-scale language corpora (or, effectively, our collective consciousness), have caught the imaginati…
Post-training for Efficient Communication via Convention Formation
Humans communicate with increasing efficiency in multi-turn interactions, by adapting their language and forming ad-hoc conventions. In contrast, prior work shows that LLMs do not naturally show this …
Pragmatic Implicature Processing in ChatGPT
Recent large language models (LLMs) and LLM-driven chatbots, such as ChatGPT, have sparked debate regarding whether these artificial systems can develop human-like linguistic capacities. We examined t…
Presuppositions are more persuasive than assertions if addressees accommodate them: Experimental evidence for philosophical reasoning
Best practice and descriptive research claim that presuppositions, such as the “too” in “,” increase the persuasiveness of arguments. Surprisingly, there is hardly any causal evidence for this claim. …
Pretrained Language Models as Containers of the Discursive Knowledge
Discourses can be treated as instances of knowledge. The dynamic space in which the trajectories of these discourses are described can be regarded as a model of knowledge. Such a space is ca…
Probing Structured Semantics Understanding and Generation of Language Models via Question Answering
As John McCarthy (McCarthy, 1990, 1959) points out, for a better understanding of natural language, it is necessary for an intelligence system to understand the “deep structure” (Chomsky, 2011…
Real-time News Story Identification
To improve the reading experience, many news sites organize news into topical collections, called stories. In this work, we present an approach for implementing real-time story identification for a ne…
Rhetorical XAI: Explaining AI’s Benefits as well as its Use via Rhetorical Design
Modern AI systems are notoriously opaque, limiting efforts to understand or audit their behaviors [42, 188]. In response, Explainable Artificial Intelligence (XAI) aims to foster trust and accountabil…
SciTopic: Enhancing Topic Discovery in Scientific Literature through Advanced LLM
Topic discovery in scientific literature provides valuable insights for researchers to identify emerging trends and explore new avenues for investigation, facilitating easier scientific infor…
Semantic Change Characterization with LLMs using Rhetorics
Languages continually evolve in response to societal events, resulting in new terms and shifts in meanings. These changes have significant implications for computer applications, including automatic t…
Semantic Parsing for Task Oriented Dialog using Hierarchical Representations
![A diagram of an event](/assets/paper-images/SemanticParsingForTaskOrientedDialog.png) Task oriented dialog systems typically first parse user utterances to semantic frames comprised of intents and s…
Semantic Structure in Large Language Model Embeddings
Psychological research consistently finds that human ratings of words across diverse semantic scales can be reduced to a low-dimensional form with relatively little information loss. We find that the …
Sequence Organization in Interaction: A Primer in Conversation Analysis
Much of our daily lives is spent talking to one another, in both ordinary conversation and more specialized settings such as meetings, interviews, classrooms, and courtrooms. It is largely through con…
Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds
We evaluate LLMs’ language understanding capacities on simple inference tasks that most humans find trivial. Specifically, we target (i) grammatically-specified entailments, (ii) premises with evident…
Simulacra as conscious exotica
The advent of conversational agents with increasingly human-like behaviour throws old philosophical questions into new light. Does it, or could it, ever make sense to speak of AI agents built out of g…
Sources of Hallucination by Large Language Models on Inference Tasks
We establish two biases originating from pretraining which predict much of their behavior, and show that these are major sources of hallucination in generative LLMs. First, memorization at the level o…
StoryScope: Investigating idiosyncrasies in AI fiction
As AI-generated fiction becomes increasingly prevalent, questions of authorship and originality are becoming central to how written work is evaluated. While most existing work in this space focuses on…
Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs
Humans spontaneously use increasingly efficient language as interactions progress, by adapting and forming ad-hoc conventions. This phenomenon has been studied extensively using reference games, showi…
Task-Oriented Dialogue with In-Context Learning
We describe a system for building task oriented dialogue systems combining the in context learning abilities of large language models (LLMs) with the deterministic execution of business logic. LLMs ar…
TaskLAMA: Probing the Complex Task Understanding of Language Models
Structured Complex Task Decomposition (SCTD) is the problem of breaking down a complex real-world task (such as planning a wedding) into a directed acyclic graph over individual steps that contribute…
Teaching Probabilistic Logical Reasoning to Transformers
We propose a novel end-to-end fine-tuning approach, Probabilistic Constraint Training (PCT), that utilizes probabilistic logical rules as constraints in the fine-tuning phase without relying on these …
The Argument Reasoning Comprehension Task: Identification and Reconstruction of Implicit Warrants
Reasoning is a crucial part of natural language argumentation. To comprehend an argument, one must analyze its warrant, which explains why its claim follows from its premises. As arguments are highly …
The Demon is in Ambiguity: Revisiting Situation Recognition with Single Positive Multi-Label Learning
Situation recognition (SR) is a fundamental task in computer vision that aims to extract structured semantic summaries from images by identifying key events and their associated entities. Speci…
The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs
Despite widespread use of LLMs as conversational agents, evaluations of performance fail to capture a crucial aspect of communication: interpreting language in context—incorporating its pragmatics. Hu…
The Hermeneutics of Artificial Text
The paper justifies the necessity of using the research background of hermeneutics to study artificial texts and also proposes the first conclusions about these texts in the context of this background…
The Levers of Political Persuasion with Conversational AI
There are widespread fears that conversational AI could soon exert unprecedented influence over human beliefs. Here, in three large-scale experiments (N=76,977), we deployed 19 LLMs—including some pos…
The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning
Large language models systematically fail when a salient surface cue conflicts with an unstated feasibility constraint. We study this through a diagnose–measure–bridge–treat framework. Causal-behavior…
The Vector Grounding Problem
Confusingly, the notion of grounding is also used in relation to another aspect of language, which has to do with communication (Clark & Brennan 1991, Traum 1994). In this context, the ‘grounding prob…
The social component of the projection behavior of clausal complement contents
Some accounts of presupposition projection predict that content’s consistency with the Common Ground influences whether it projects (e.g., Heim 1983; Gazdar 1979a,b). I conducted an experime…
Theory of Knowledge Based on the Idea of the Discursive Space
This paper discusses the theory of knowledge based on the idea of dynamical space. The goal of this effort is to comprehend the knowledge that remains beyond the human domain, e.g., of the artificial …
Toward Conversational Agents with Context and Time Sensitive Long-term Memory
There has recently been growing interest in conversational agents with long-term memory which has led to the rapid development of language models that use retrieval-augmented generation (RAG). Until r…
Transformer-based cynical expression detection in a corpus of Spanish YouTube reviews
Consumers of services and products exhibit a wide range of behaviors on social networks when they are dissatisfied. In this paper, we consider three types of cynical expressions – negative feelings, s…
Truth or lie: Exploring the language of deception
Lying appears in everyday oral and written communication. As a consequence, detecting it on the basis of linguistic analysis is particularly important. Our study aimed to verify whether the difference…
Turning large language models into cognitive models
We ask whether large language models can be turned into cognitive models. We find that – after finetuning them on data from psychological experiments – these models offer accurate representations of huma…
Uncovering Latent Arguments in Social Media Messaging by Employing LLMs-in-the-Loop Strategy
The widespread use of social media has led to a surge in popularity for automated methods of analyzing public opinion. Supervised methods are adept at text categorization, yet the dynamic nature of so…
Using Natural Language for Reward Shaping in Reinforcement Learning
Using arbitrary natural language statements within reinforcement learning presents several challenges. First, a mapping between language and objects/actions must implicitly or explicitly be learned, a…
We’re Afraid Language Models Aren’t Modeling Ambiguity
Ambiguity is an intrinsic feature of natural language. Managing ambiguity is a key part of human language understanding, allowing us to anticipate misunderstanding as communicators and revise our inte…
What are the Goals of Distributional Semantics?
Distributional semantic models have become a mainstay in NLP, providing useful features for downstream tasks. However, assessing long-term progress requires explicit long-term goals. In this paper, I …
What does it mean to understand language?
Language understanding entails not just extracting the surface-level meaning of the linguistic input, but constructing rich mental models of the situation it describes. Here we propose that because pr…
What we talk to when we talk to language models
David Chalmers. Quasi-interpretivism does not say anything about whether LLMs have beliefs and desires. But it does make it plausib…
Word Meanings in Transformer Language Models
We investigate how word meanings are represented in the transformer language models. Specifically, we focus on whether transformer models employ something analogous to a lexical store - where each wor…
“Understanding AI”: Semantic Grounding in Large Language Models
This motivates another method: looking under the hood of systems and exploring their internal mechanisms and functions. But in the case of deep learning neural networks, the notorious black box proble…