MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving

Paper · arXiv 2503.16905 · Published March 21, 2025
Multi-Agent Systems · Multimodal Models

Multimodal scientific problems (MSPs) involve complex issues that require integrating multiple modalities, such as text and diagrams, and present a significant challenge for artificial intelligence. While progress has been made on traditional scientific problems, MSPs still face two primary issues: the difficulty of multimodal comprehensive reasoning in scientific problem-solving, and the lack of reflective and rethinking capabilities. To address these issues, we introduce a Multi-Agent framework based on the Big Seven Personality and Socratic guidance (MAPS). This framework employs seven distinct agents that leverage feedback mechanisms and the Socratic method to guide the resolution of MSPs. To tackle the first issue, we propose a progressive four-agent solving strategy, in which each agent focuses on a specific stage of the problem-solving process. For the second issue, we introduce a Critic agent, inspired by Socratic questioning, which prompts critical thinking and stimulates autonomous learning. We conduct extensive experiments on the EMMA, Olympiad, and MathVista datasets, where MAPS outperforms the current SOTA model by 15.84% across all tasks.
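The progressive four-agent strategy can be pictured as a chain in which each stage reads everything the earlier stages produced. The following is a minimal Python sketch under loud assumptions: the `ProblemState` record, the stage names (`describe_diagram`, `align_modalities`, `recall_knowledge`, `solve`), and their string-returning bodies are hypothetical stand-ins for calls to a multimodal language model, not the paper's agents, roles, or prompts.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ProblemState:
    """Shared record that every stage reads from and appends to (hypothetical)."""
    text: str
    diagram: str  # stand-in for an image; a real system would pass pixels to a model
    notes: List[str] = field(default_factory=list)


Stage = Callable[[ProblemState], str]


# Hypothetical stages: each would normally prompt an LLM with the current state.
def describe_diagram(state: ProblemState) -> str:
    return f"diagram description: {state.diagram}"


def align_modalities(state: ProblemState) -> str:
    # Combine the question text with the previous stage's diagram description.
    return f"aligned: {state.text} + {state.notes[-1]}"


def recall_knowledge(state: ProblemState) -> str:
    return "relevant principle: net torque about the pivot is zero"


def solve(state: ProblemState) -> str:
    return "answer derived from: " + "; ".join(state.notes)


def run_pipeline(state: ProblemState, stages: List[Stage]) -> ProblemState:
    # Progressive strategy: stages run in order, and each one sees all earlier notes.
    for stage in stages:
        state.notes.append(stage(state))
    return state


result = run_pipeline(
    ProblemState(text="lever in static equilibrium", diagram="masses hung L/4 from each end"),
    [describe_diagram, align_modalities, recall_knowledge, solve],
)
for note in result.notes:
    print(note)
```

Keeping the stages as plain functions over one shared state makes the ordering explicit: swapping, removing, or inserting a stage changes only the list passed to `run_pipeline`.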

Introduction. Multimodal scientific problems (MSPs) cover scientific scenarios that involve multiple modalities [5, 16, 22], such as text and vision. These problems typically span fields such as mathematics, physics, and chemistry, and require rigorous logical reasoning and solid domain expertise [1, 3, 13, 36, 41]. Effectively addressing these cross-modal, multi-domain challenges remains both important and difficult for artificial intelligence [12, 20]. Figure 1 illustrates a typical problem scenario that includes the context, the question statement, and an illustrative diagram. The diagram shows a lever divided into left and right sections: an unknown mass is hung on the left and a known mass on the right, each positioned one-quarter of the total length from its respective end, with a groove at the center. The accompanying description states that when the unknown mass is hung on the left, the system remains in static equilibrium.
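To show why such a problem needs both modalities, here is a worked torque balance under assumptions the summary does not state: the groove at the center acts as the pivot, the lever is uniform so its own weight acts at the pivot and contributes no torque, $L$ is the total length, $m_1$ is the unknown mass, and $m_2$ is the known mass. Each mass then hangs $L/2 - L/4 = L/4$ from the pivot.

```latex
m_1 g \left(\frac{L}{2} - \frac{L}{4}\right) = m_2 g \left(\frac{L}{2} - \frac{L}{4}\right)
\quad\Longrightarrow\quad m_1 = m_2
```

The lever-arm lengths come from the diagram and the equilibrium condition comes from the text; neither source alone fixes the relation between the two masses.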

Discussion / Conclusion. This study introduces MAPS, a multi-agent framework based on the Big Seven Personality theory and Socratic guidance, to tackle the challenges of multimodal comprehensive reasoning and the lack of reflective capabilities. The framework involves seven agents, each specializing in a distinct aspect of problem-solving. To address the first challenge, a four-agent strategy is proposed in which each agent focuses on a specific stage of the reasoning process. The Critic agent addresses the second challenge through Socratic reflection and critical feedback. Extensive experiments on the EMMA, Olympiad, and MathVista datasets validate MAPS's effectiveness in overcoming these issues and improving performance across various reasoning tasks. We also conduct additional analytical experiments to assess the framework's improvements and its generalization.
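The Critic's role can be sketched as a reflection loop: a solver proposes an answer, a critic either accepts it or replies with a probing question, and the solver retries with the accumulated questions as extra context. This is a minimal Python illustration only; `solve_with_reflection`, the `max_rounds` cap, and the toy solver and critic are hypothetical stand-ins, not the paper's Critic agent or its prompts.

```python
from typing import Callable, List, Optional, Tuple

# A critic returns a Socratic-style question when it doubts an answer, or None to accept it.
Critic = Callable[[str], Optional[str]]
# A solver sees the problem plus every question the critic has asked so far.
Solver = Callable[[str, List[str]], str]


def solve_with_reflection(
    problem: str, solver: Solver, critic: Critic, max_rounds: int = 3
) -> Tuple[str, List[str]]:
    questions: List[str] = []
    answer = solver(problem, questions)
    for _ in range(max_rounds):
        question = critic(answer)
        if question is None:
            break  # the critic accepts the current answer
        questions.append(question)
        answer = solver(problem, questions)  # retry with the critic's questions as context
    return answer, questions


# Toy stand-ins (hypothetical): the first attempt evaluates strictly left to right.
def toy_solver(problem: str, questions: List[str]) -> str:
    return "14" if questions else "20"


def toy_critic(answer: str) -> Optional[str]:
    if answer == "20":
        return "In '2 + 3 * 4', which operation should be applied first?"
    return None


answer, asked = solve_with_reflection("2 + 3 * 4", toy_solver, toy_critic)
print(answer, asked)
```

The critic answers with a question rather than a correction, so the fix still has to come from the solver; the `max_rounds` cap keeps a critic that is never satisfied from looping forever.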