Cognitive Architectures for Language Agents

Paper · arXiv 2309.02427 · Published September 5, 2023

Recent efforts have combined large language models (LLMs) with external resources (e.g., the Internet) or internal control flows (e.g., prompt chaining) for tasks requiring grounding or reasoning. However, these efforts have largely been piecemeal, lacking a systematic framework for constructing a fully fledged language agent. To address this challenge, we draw on the rich history of agent design in symbolic artificial intelligence to develop a blueprint for a new wave of cognitive language agents. We first show that LLMs have many of the same properties as production systems, and that recent efforts to improve their grounding or reasoning mirror the development of cognitive architectures built around production systems. We then propose Cognitive Architectures for Language Agents (CoALA), a conceptual framework that systematizes diverse methods for LLM-based reasoning, grounding, learning, and decision making as instantiations of language agents in the framework. Finally, we use the CoALA framework to highlight gaps and propose actionable directions toward more capable language agents in the future.

Introduction. Despite revolutionizing natural language processing (NLP), large language models (LLMs; Vaswani et al., 2017; Devlin et al., 2019; Brown et al., 2020; OpenAI, 2023) have limited world knowledge and are not grounded to external environments. To address these shortcomings, recent methods have augmented LLMs (Mialon et al., 2023) with external resources such as memory stores (Guu et al., 2020) or with pre-defined sequences of prompts that structure reasoning (Creswell et al., 2023; Wu et al., 2022b). Parallel work has grounded LLMs by placing them in a feedback loop with the environment, leading to an emerging field of language agents: interactive systems that use LLMs for sequential decision-making (Yang et al., 2023b; Wang et al., 2023b). While the earliest agents used the LLM to directly select actions (Ahn et al., 2022; Huang et al., 2022b), the latest generation uses a series of LLM calls to reason (Yao et al., 2022b) or to read from and write to internal memory (Park et al., 2023) in order to improve decision making.
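The feedback loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any cited paper: the `Memory` class, `run_agent`, and the stub functions are hypothetical names, the "LLM" is a placeholder that returns canned text, and retrieval is reduced to "most recent notes".

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Memory:
    """Hypothetical internal memory: a plain list of text notes."""
    notes: List[str] = field(default_factory=list)

    def read(self, k: int = 3) -> List[str]:
        # Return the k most recent notes (a stand-in for real retrieval).
        return self.notes[-k:]

    def write(self, note: str) -> None:
        self.notes.append(note)


def run_agent(
    llm: Callable[[str], str],
    env_step: Callable[[str], str],
    first_observation: str,
    max_steps: int = 3,
) -> List[str]:
    """Run a reason-then-act loop and return the actions taken."""
    memory = Memory()
    observation = first_observation
    actions: List[str] = []
    for _ in range(max_steps):
        context = "\n".join(memory.read())
        # First LLM call: reason about the observation given memory.
        thought = llm(f"Memory:\n{context}\nObservation: {observation}\nThought:")
        # Second LLM call: select an action conditioned on the thought.
        action = llm(f"Thought: {thought}\nAction:")
        # Write the step to internal memory for later cycles.
        memory.write(f"obs={observation} | thought={thought} | action={action}")
        actions.append(action)
        # Grounding: the environment returns the next observation.
        observation = env_step(action)
    return actions


def stub_llm(prompt: str) -> str:
    # Placeholder for a real model call (assumption): canned responses.
    return "look around" if prompt.endswith("Action:") else "I should explore."


def stub_env(action: str) -> str:
    # Placeholder environment (assumption): echoes the action back.
    return f"result of '{action}'"


if __name__ == "__main__":
    print(run_agent(stub_llm, stub_env, "you are in a room", max_steps=2))
```

Swapping `stub_llm` for a real model call and `stub_env` for a real environment would leave the loop structure unchanged, which is the point of the sketch: reasoning and memory access are extra LLM calls that sit between observation and action.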

Discussion / Conclusion. Planning vs. execution: how much should agents plan? Making a call to an LLM is both slow and computationally intensive. Relying on LLMs for decision-making thus requires balancing the cost of planning against the utility of the resulting improved plan. This is analogous to the “value of computation” studied in human metareasoning (Lieder and Griffiths, 2020; Callaway et al., 2022; Gershman et al., 2015). Current work fixes a search budget by specifying a depth of reasoning (Yao et al., 2023), but studies of humans suggest that they allocate computation adaptively (Russek et al., 2022). Future work should develop mechanisms to estimate the utility of planning and modify the decision procedure accordingly, i.e., learning to update decision-making procedures in the CoALA framework.

Learning vs. acting: how should agents continuously and autonomously learn? In the CoALA framework, learning is an action that results from a decision-making cycle, just like grounding: the agent deliberately chooses to commit information to long-term memory.
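To make the planning-versus-execution trade-off concrete, the following sketch chooses a planning depth adaptively instead of fixing it in advance. It is an illustrative assumption rather than a mechanism from the paper: `choose_planning_depth` is a hypothetical helper that applies a simple greedy stopping rule (plan one step deeper only while the estimated marginal utility exceeds the per-step cost), and `diminishing_returns` is an invented utility estimate in which each extra step helps half as much as the previous one.

```python
from typing import Callable


def choose_planning_depth(
    estimate_utility: Callable[[int], float],
    cost_per_step: float,
    max_depth: int,
) -> int:
    """Deepen the plan while the marginal gain exceeds the marginal cost."""
    depth = 0
    while depth < max_depth:
        # Estimated benefit of planning one step deeper.
        gain = estimate_utility(depth + 1) - estimate_utility(depth)
        if gain <= cost_per_step:
            break  # Further planning is not worth another LLM call.
        depth += 1
    return depth


def diminishing_returns(depth: int) -> float:
    # Hypothetical utility estimate (assumption): 0, 0.5, 0.75, 0.875, ...
    return 1.0 - 0.5 ** depth


if __name__ == "__main__":
    # Expensive calls lead to shallow plans; cheap calls allow deeper ones.
    print(choose_planning_depth(diminishing_returns, cost_per_step=0.2, max_depth=10))
    print(choose_planning_depth(diminishing_returns, cost_per_step=0.05, max_depth=3))
```

With a per-step cost of 0.2 the rule stops at depth 2, because the third step would add only 0.125 in estimated utility. In a real agent the hard part is the utility estimate itself, which is exactly what the passage above identifies as an open problem.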
