Cultural Evolution of Cooperation among LLM Agents
Large language models (LLMs) provide a compelling foundation for building generally capable AI agents. These agents may soon be deployed at scale in the real world, representing the interests of individual humans (e.g., AI assistants) or groups of humans (e.g., AI-accelerated corporations). At present, relatively little is known about the dynamics of multiple LLM agents interacting over many generations of iterative deployment. In this paper, we examine whether a “society” of LLM agents can learn mutually beneficial social norms in the face of incentives to defect, a distinctive feature of human sociality that is arguably crucial to the success of civilization. In particular, we study the evolution of indirect reciprocity across generations of LLM agents playing a classic iterated Donor Game in which agents can observe the recent behavior of their peers. We find that the evolution of cooperation differs markedly across base models, with societies of Claude 3.5 Sonnet agents achieving significantly higher average scores than Gemini 1.5 Flash, which, in turn, outperforms GPT-4o. Further, Claude 3.5 Sonnet can make use of an additional mechanism for costly punishment to achieve yet higher scores, while Gemini 1.5 Flash and GPT-4o fail to do so.
Introduction. LLMs are increasingly able to match or exceed human performance across a wide range of language tasks. Models with improved reasoning and tool-use capabilities (OpenAI, 2024) may naturally form a basis for general-purpose agent-based applications. In the near future, we expect there to be many LLM agents interacting autonomously to accomplish tasks on behalf of various individuals and organizations. These interactions could take many forms, including competition, cooperation, negotiation, coordination, and information sharing. Certainly these interactions will introduce new social dynamics, yielding emergent outcomes for society that are hard to predict from purely theoretical considerations (Gabriel et al., 2024). However, current LLM safety evaluations are rooted mainly in single-turn interactions between one model and one human. For instance, none of LMSys Chatbot Arena (Chiang et al., 2024), METR (METR, 2024), or AISI (AISI, 2024) targets multi-agent interactions. A particularly important class of multi-agent interactions is cooperative interactions.
Discussion / Conclusion. In this paper we have set out a method for assessing the cultural evolution of cooperation among LLM agents. We focus on the well-known Donor Game, a “Petri dish” in which to study the emergence of indirect reciprocity. Over the course of 10 generations we find striking differences in the emergence of cooperation depending on the base model for the LLM agent. Claude 3.5 Sonnet reliably generates cooperative communities, especially when provided with an additional costly punishment mechanism. Meanwhile, generations of GPT-4o agents converge to mutual defection, while Gemini 1.5 Flash achieves only weak increases in cooperation. We analyse the cultural evolutionary dynamics, revealing that some populations have the ability to accumulate increasingly complex strategies at the individual level, and to generate norms that select for cooperators at the group level.
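To make the setup concrete, the following is a minimal sketch of a generational Donor Game loop. It is an illustration under stated assumptions, not the experimental code used in this paper. LLM-written strategies are replaced by a single hypothetical number, `generosity`, and the donation multiplier, starting endowment, population size, number of rounds, and top-half survival rule are all assumed values chosen for the example. Only the number of generations (10) and the ability of agents to observe recent peer behavior come from the text.

```python
import random
from dataclasses import dataclass

MULTIPLIER = 2.0   # assumption: recipient gains twice what the donor gives up
ENDOWMENT = 10.0   # assumption: resources each agent starts a generation with


@dataclass
class Agent:
    generosity: float            # hypothetical stand-in for an LLM-written strategy
    resources: float = ENDOWMENT
    last_fraction: float = 1.0   # most recent donation fraction, visible to peers


def donate(donor: Agent, recipient: Agent) -> None:
    """One Donor Game move: the donor pays a cost, the recipient gains more."""
    # Indirect reciprocity: give less to recipients who recently gave little.
    fraction = donor.generosity * recipient.last_fraction
    amount = fraction * donor.resources
    donor.resources -= amount
    recipient.resources += MULTIPLIER * amount
    donor.last_fraction = fraction


def play_generation(agents: list[Agent], rounds: int, rng: random.Random) -> None:
    """Play several rounds, re-pairing donors and recipients at random."""
    for _ in range(rounds):
        order = agents[:]
        rng.shuffle(order)
        for donor, recipient in zip(order[::2], order[1::2]):
            donate(donor, recipient)


def evolve(population_size: int = 12, generations: int = 10,
           rounds: int = 12, seed: int = 0) -> list[float]:
    """Return the mean final resources of the population in each generation."""
    rng = random.Random(seed)
    agents = [Agent(rng.random()) for _ in range(population_size)]
    history = []
    for _ in range(generations):
        for agent in agents:  # every generation starts from the same endowment
            agent.resources = ENDOWMENT
            agent.last_fraction = 1.0
        play_generation(agents, rounds, rng)
        history.append(sum(a.resources for a in agents) / len(agents))
        # Assumed selection rule: the top half survives, and each survivor is
        # copied with a small mutation (a crude proxy for cultural transmission).
        ranked = sorted(agents, key=lambda a: a.resources, reverse=True)
        survivors = ranked[: population_size // 2]
        children = [
            Agent(min(1.0, max(0.0, s.generosity + rng.gauss(0.0, 0.05))))
            for s in survivors
        ]
        agents = survivors + children
    return history


if __name__ == "__main__":
    for generation, score in enumerate(evolve(), start=1):
        print(f"generation {generation}: mean resources {score:.2f}")
```

Because every donation creates surplus, mean resources never fall below the endowment, and higher values indicate more cooperation. The sketch makes no claim to reproduce the model-specific results reported here. It only shows the structure of the game and of the generational loop.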