MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization
The basic question-answering format of large language models involves inputting a prompt and receiving a response, and the quality of the prompt directly impacts the effectiveness of the response. Automated Prompt Optimization (APO) aims to break free from the cognitive biases of manually designed prompts and to explore a broader prompt design space. However, existing APO methods suffer from two key issues: the limited flexibility of fixed templates and inefficient search in prompt spaces. To this end, we propose a Multi-Agent framework IncorpoRating Socratic guidance (MARS), which utilizes multi-agent fusion technology for automatic planning, with gradual, continuous optimization and evaluation. Specifically, MARS comprises seven agents, each with distinct functionalities; a Planner agent autonomously devises an optimization path, which ensures flexibility. Additionally, MARS employs a Teacher-Critic-Student Socratic dialogue pattern to iteratively optimize the prompts while conducting an effective search. We conduct extensive experiments on various datasets to validate the effectiveness of our method, and perform additional analytical experiments to assess the model’s advancement as well as its interpretability.
Introduction. Large language models (LLMs) such as GPT-4 (Achiam et al., 2023) and DeepSeek-R1 (Guo et al., 2025) provide robust support for thousands of natural language processing tasks. Given a natural language prompt that includes instructions and a task description, LLMs can quickly adapt and respond (Lin et al., 2025). Consequently, the quality of the prompt is of critical importance, which has led to wide interest in Automated Prompt Optimization (APO) (Pryzant et al., 2023). As shown in Figure 1, we provide LLMs with three different inputs for the word sorting task: a zero-shot prompt, a Chain of Thought (CoT) prompt, and our optimized prompt. The responses differ markedly. Specifically, with the zero-shot prompt, the LLM mistakes the word "alterate" for the more common word "alternate". However, the task requires faithfully preserving the given words rather than correcting them. With the CoT prompt, the sorting remains incorrect because the LLM does not fully grasp the sorting task and the word sequence.
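For reference, the behavior the word sorting task asks for can be written as a short deterministic function. This is a minimal sketch that assumes the task means alphabetical ordering of whitespace-separated words; the function name `sort_words` and the input word list are hypothetical and are not taken from Figure 1.

```python
def sort_words(text: str) -> str:
    """Sort whitespace-separated words alphabetically, keeping each word exactly as given."""
    # Assumption: "sorting" means plain alphabetical order. No spelling correction
    # is applied, so an unusual word such as "alterate" passes through unchanged.
    return " ".join(sorted(text.split()))


# Hypothetical input, used only for illustration.
print(sort_words("syndrome alterate therefrom"))  # prints "alterate syndrome therefrom"
```

A prompt that leads the LLM to rewrite "alterate" as "alternate" fails this reference even when the resulting order looks plausible.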
Discussion / Conclusion. This study introduces MARS, a multi-agent framework incorporating Socratic guidance that includes a Planner agent and six task-specific agents for APO. First, to tackle the limited flexibility of fixed templates, we utilize the Planner agent to autonomously design optimization paths for various tasks, ensuring that each task follows its own optimization trajectory. Next, we employ a Teacher-Critic-Student Socratic guidance dialogue pattern to iteratively refine the prompts while addressing the issue of inefficient search in prompt spaces. Finally, we validate the optimized prompts with the Target agent and iterate until the optimal prompt is identified. We conduct extensive experiments across a range of general tasks and domain-specific datasets to assess the effectiveness of MARS and to explore the interpretability of the optimization process and results.
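The control flow summarized above (plan, Socratic refinement, validation) can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: the function `optimize_prompt`, the role-tagged prompt strings, the `llm` and `score` callables, and the `max_rounds` parameter are all hypothetical. The Critic's exact duty is not specified in this text; here it is assumed to review the Teacher's question. Only five of the seven agents are modeled, with `score` standing in for the Target agent.

```python
from typing import Callable, Tuple

# Hypothetical interfaces: `llm` maps a prompt string to a reply string,
# and `score` plays the part of the Target agent by rating a candidate prompt.
LLM = Callable[[str], str]
Scorer = Callable[[str], float]


def optimize_prompt(task: str, initial_prompt: str, llm: LLM,
                    score: Scorer, max_rounds: int = 3) -> Tuple[str, float]:
    """Return the best-scoring prompt found and its score."""
    # Planner: request a task-specific optimization path, one step per line.
    plan_reply = llm(f"PLANNER\nTask: {task}\nList the optimization steps, one per line.")
    plan = [step.strip() for step in plan_reply.splitlines() if step.strip()]

    prompt = initial_prompt
    best_prompt, best_score = initial_prompt, score(initial_prompt)

    for _ in range(max_rounds):
        for step in plan:
            # Teacher: pose a guiding question for the current step.
            question = llm(f"TEACHER\nStep: {step}\nPrompt: {prompt}\nAsk one guiding question.")
            # Critic (assumed role): review and, if needed, rewrite the question.
            reviewed = llm(f"CRITIC\nQuestion: {question}\nReturn an improved question.")
            # Student: revise the prompt in response to the reviewed question.
            prompt = llm(f"STUDENT\nQuestion: {reviewed}\nPrompt: {prompt}\nReturn the revised prompt.")
        # Target (stand-in): validate the revised prompt and keep the best one seen.
        current = score(prompt)
        if current > best_score:
            best_prompt, best_score = prompt, current

    return best_prompt, best_score
```

In practice `llm` would wrap a call to a model and `score` would run the candidate prompt on held-out task examples; both are left abstract here so the sketch stays self-contained. The loop stops after a fixed number of rounds, which is a simplification of "iterate until the optimal prompt is identified".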