TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation

Paper · arXiv 2402.10178 · Published February 15, 2024
Task Planning

The emergence of Large Language Models (LLMs) like ChatGPT has inspired the development of LLM-based agents capable of addressing complex, real-world tasks. However, these agents often struggle during task execution due to methodological constraints, such as error propagation and limited adaptability. To address this issue, we propose a multi-agent framework based on dynamic Task Decomposition and Agent Generation (TDAG). This framework dynamically decomposes complex tasks into smaller subtasks and assigns each to a specifically generated subagent, thereby enhancing adaptability in diverse and unpredictable real-world tasks. Separately, existing benchmarks often lack the granularity needed to evaluate incremental progress in complex, multi-step tasks. In response, we introduce ItineraryBench in the context of travel planning, featuring interconnected, progressively complex tasks with a fine-grained evaluation system. ItineraryBench is designed to assess agents’ abilities in memory, planning, and tool usage across tasks of varying complexity. Our experimental results reveal that TDAG significantly outperforms established baselines, showcasing its superior adaptability and context awareness in complex task scenarios.
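As a rough illustration of the decompose-then-generate loop described above, the following Python sketch re-plans after every completed subtask and creates a fresh subagent for each one. It is not the authors' implementation: `SubAgent`, `generate_subagent`, `solve`, and `toy_decompose` are hypothetical names, and every LLM call is replaced by a simple stub.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SubAgent:
    """Hypothetical subagent specialised for a single subtask."""
    name: str
    instructions: str  # prompt tailored to this subtask (assumption)

    def run(self, subtask: str, context: List[str]) -> str:
        # Stub standing in for an LLM call that would use `instructions`
        # and the results gathered so far (`context`).
        return f"[{self.name}] done: {subtask}"


def generate_subagent(subtask: str, index: int) -> SubAgent:
    # Stub for agent generation: a real system would ask an LLM to write
    # subtask-specific instructions instead of using a fixed template.
    return SubAgent(name=f"agent_{index}",
                    instructions=f"You only handle: {subtask}")


def solve(task: str,
          decompose: Callable[[str, List[str]], List[str]],
          max_steps: int = 20) -> List[str]:
    """Run the loop: re-plan, pick the next subtask, generate an agent, execute."""
    results: List[str] = []
    for step in range(max_steps):
        # Dynamic part: the remaining plan is recomputed from the results so far.
        remaining = decompose(task, results)
        if not remaining:
            break
        subtask = remaining[0]
        agent = generate_subagent(subtask, step)
        results.append(agent.run(subtask, results))
    return results


def toy_decompose(task: str, results: List[str]) -> List[str]:
    # Toy planner with a fixed plan; a real decomposer could revise later
    # subtasks when an earlier one fails or returns unexpected output.
    plan = ["book train", "reserve hotel", "plan sightseeing"]
    return plan[len(results):]


for line in solve("three-day trip", toy_decompose):
    print(line)
```

Recomputing the remaining plan inside the loop, rather than fixing it once up front, is what lets a later subtask be changed in response to an earlier result; the `max_steps` guard is an added safety limit for the sketch.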

Introduction. The emergence of Large Language Models (LLMs) (Zhang et al., 2022; OpenAI, 2023), such as ChatGPT, represents a noteworthy milestone in artificial intelligence, laying the groundwork for the creation of LLM-based agents (Significant-Gravitas, 2023; Nakajima, 2023) with the capacity to automate tasks on behalf of humans. Despite these advancements, LLM-based agents face substantial challenges in real-world tasks that demand complex planning, multi-stage reasoning, and tool utilization (Crispino et al., 2023; Qin et al., 2023). For instance, in a recent agent benchmark (Mialon et al., 2023), even GPT-4 falls short with a success rate of 14%, while humans effortlessly exceed 92%. Enabling agents to address real-world tasks effectively is difficult for both methodological and benchmark-related reasons. From a methodological perspective, existing approaches (Sun et al., 2023; Prasad et al., 2023; Wang et al., 2023b) aim to break down a complex request into a sequence of subtasks to reduce complexity and address the challenge of excessively long input.

Discussion / Conclusion. In this paper, we first present ItineraryBench, a benchmark for evaluating LLM-based agents in complex, real-world tasks, particularly in travel planning. It stands out with its interconnected, progressively challenging tasks and a nuanced evaluation system that goes beyond binary scoring. This approach allows for a more precise assessment of an agent’s capabilities, especially when a task is only partially completed. Additionally, we introduce the TDAG framework, which enhances adaptability and success in diverse tasks by dynamically decomposing complex tasks into manageable subtasks, each handled by a custom-generated subagent. Our experimental results show that TDAG significantly outperforms existing baselines across several benchmarks.
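To make the contrast between binary and fine-grained scoring concrete, here is a minimal Python sketch. It assumes a hypothetical weighted checklist of constraints; the check names, the weights, and the formula are illustrative only and are not the scoring scheme defined by ItineraryBench.

```python
from typing import Dict


def binary_score(checks: Dict[str, bool]) -> float:
    # All-or-nothing: any failed check gives zero credit.
    return 1.0 if all(checks.values()) else 0.0


def partial_score(checks: Dict[str, bool], weights: Dict[str, float]) -> float:
    # Weighted fraction of satisfied checks (hypothetical formula).
    total = sum(weights[name] for name in checks)
    earned = sum(weights[name] for name, passed in checks.items() if passed)
    return earned / total if total else 0.0


# Hypothetical result for one travel plan: two constraints met, one violated.
checks = {
    "visits_all_cities": True,
    "within_budget": True,
    "respects_opening_hours": False,
}
weights = {
    "visits_all_cities": 2.0,
    "within_budget": 1.0,
    "respects_opening_hours": 1.0,
}

print(binary_score(checks))            # 0.0
print(partial_score(checks, weights))  # 0.75
```

Under binary scoring this plan is indistinguishable from one that satisfies nothing, whereas the graded score records that most of the task was completed.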