Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models
Large Language Models (LLMs) have generated considerable interest and debate regarding the potential emergence of Theory of Mind (ToM). Several recent inquiries reveal a lack of robust ToM in these models and pose a pressing demand to develop new benchmarks, as current ones primarily focus on different aspects of ToM and are prone to shortcuts and data leakage. In this position paper, we seek to answer two road-blocking questions: (1) How can we taxonomize a holistic landscape of machine ToM? (2) What is a more effective evaluation protocol for machine ToM? Following psychological studies, we taxonomize machine ToM into 7 mental state categories and delineate existing benchmarks to identify under-explored aspects of ToM. We argue for a holistic and situated evaluation of ToM that breaks ToM into individual components and treats LLMs as agents that are physically situated in environments and socially situated in interactions with humans. Such situated evaluation provides a more comprehensive assessment of mental states and potentially mitigates the risk of shortcuts and data leakage. We further present a pilot study in a grid world setup as a proof of concept.
Introduction. The term theory of mind (ToM, sometimes also referred to as mentalization or mindreading) was first introduced by Premack and Woodruff (1978) as agents’ ability to impute mental states to themselves and others. Many aspects of human cognition and social reasoning rely on ToM modeling of others’ mental states (Gopnik and Wellman, 1992; Baron-Cohen, 1997; Gunning, 2018). This is crucial for understanding and predicting others’ actions (Dennett, 1988), planning over others’ beliefs and next actions (Ho et al., 2022), and various forms of reasoning and decision-making (Pereira et al., 2016; Rusch et al., 2020). Inspired by human ToM, AI researchers have made explicit and implicit efforts to develop a machine ToM for social intelligence: AI agents that engage in social interactions with humans (Krämer et al., 2012; Kennington, 2022) and other agents (Albrecht and Stone, 2018).
Discussion / Conclusion. Consider a mutual and symmetric ToM. ToM is symmetric and mutual in nature, as it was originally defined as imputing mental states to both self and others. Prior research is largely limited to passive observer roles (Grant et al., 2017; Nematzadeh et al., 2018; Le et al., 2019; Rabinowitz et al., 2018) or to the speaker role in a speaker-listener relationship (Zhu et al., 2021b; Zhou et al., 2023b). We encourage more studies on how humans and agents build and maintain common ground with a human ToM and a machine ToM through situated communication (Bara et al., 2021; Sclar et al., 2022). In addition, more research is needed to understand whether LLMs possess early forms of intrinsic mental states given observation cues of the world. While we need to develop machines that impute the mental states of humans, humans should also develop a theory of AI’s mind (ToAIM) (Chandrasekaran et al., 2017) by understanding the strengths, weaknesses, beliefs, and quirks of these black-box language models. Both psychological studies (Bloom, 2002; Tomasello, 2005) and computational simulations (Liu et al., 2023) have demonstrated the effectiveness of ToM, especially intention, in language acquisition.