WISE: Weighted Iterative Society-of-Experts for Robust Multimodal Multi-Agent Debate
- URL: http://arxiv.org/abs/2512.02405v1
- Date: Tue, 02 Dec 2025 04:31:52 GMT
- Title: WISE: Weighted Iterative Society-of-Experts for Robust Multimodal Multi-Agent Debate
- Authors: Anoop Cherian, River Doyle, Eyal Ben-Dov, Suhas Lohit, Kuan-Chuan Peng,
- Abstract summary: Multi-agent debate (MAD) has emerged as a popular way to leverage the complementary strengths of LLMs for robust reasoning. Our setup generalizes the debate protocol to heterogeneous experts that possess single- and multi-modal capabilities. We show that WISE consistently improves accuracy by 2-7% over state-of-the-art MAD setups and aggregation methods.
- Score: 31.549907845278327
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent large language models (LLMs) are trained on diverse corpora and tasks, leading them to develop complementary strengths. Multi-agent debate (MAD) has emerged as a popular way to leverage these strengths for robust reasoning, though it has mostly been applied to language-only tasks, leaving its efficacy on multimodal problems underexplored. In this paper, we study MAD for solving vision-and-language reasoning problems. Our setup enables generalizing the debate protocol with heterogeneous experts that possess single- and multi-modal capabilities. To this end, we present Weighted Iterative Society-of-Experts (WISE), a generalized and modular MAD framework that partitions the agents into Solvers, that generate solutions, and Reflectors, that verify correctness, assign weights, and provide natural language feedback. To aggregate the agents' solutions across debate rounds, while accounting for variance in their responses and the feedback weights, we present a modified Dawid-Skene algorithm for post-processing that integrates our two-stage debate model. We evaluate WISE on SMART-840, VisualPuzzles, EvoChart-QA, and a new SMART-840++ dataset with programmatically generated problem instances of controlled difficulty. Our results show that WISE consistently improves accuracy by 2-7% over the state-of-the-art MAD setups and aggregation methods across diverse multimodal tasks and LLM configurations.
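The abstract's aggregation step, combining agents' answers across debate rounds while weighting their reliability, can be illustrated with a simplified "one-coin" Dawid-Skene-style EM procedure. This is only a minimal sketch of the classical technique, not the paper's modified two-stage variant: the function name `aggregate`, the `(question, answer)` vote encoding, and the reflector-derived `weights` parameter are all illustrative assumptions.

```python
import math

def aggregate(votes, num_options, weights=None, iters=20):
    """Estimate consensus answers from multiple agents' votes.

    votes: dict mapping agent name -> list of (question_id, answer_index).
    weights: optional per-agent weights (e.g. from reflector feedback).
    Uses a one-coin Dawid-Skene EM: each agent is modeled by a single
    accuracy parameter rather than a full confusion matrix.
    """
    agents = list(votes)
    if weights is None:
        weights = {a: 1.0 for a in agents}
    questions = sorted({q for a in agents for q, _ in votes[a]})

    # Initialize answer posteriors with weighted vote counts.
    post = {q: [0.0] * num_options for q in questions}
    for a in agents:
        for q, ans in votes[a]:
            post[q][ans] += weights[a]
    for q in questions:
        total = sum(post[q])
        post[q] = [p / total for p in post[q]]

    for _ in range(iters):
        # M-step: estimate each agent's accuracy against current posteriors.
        acc = {}
        for a in agents:
            hit = sum(post[q][ans] for q, ans in votes[a])
            acc[a] = min(max(hit / len(votes[a]), 0.01), 0.99)  # clamp to avoid log(0)
        # E-step: recompute answer posteriors from weighted log-likelihoods.
        for q in questions:
            logits = [0.0] * num_options
            for a in agents:
                for qq, ans in votes[a]:
                    if qq != q:
                        continue
                    for k in range(num_options):
                        p = acc[a] if k == ans else (1 - acc[a]) / (num_options - 1)
                        logits[k] += weights[a] * math.log(p)
            m = max(logits)
            exps = [math.exp(v - m) for v in logits]
            total = sum(exps)
            post[q] = [e / total for e in exps]

    return {q: max(range(num_options), key=lambda k: post[q][k]) for q in questions}

# Hypothetical usage: three solvers vote on two binary questions;
# the third solver gets a lower reflector weight.
votes = {
    "solver_a": [(0, 0), (1, 1)],
    "solver_b": [(0, 0), (1, 1)],
    "solver_c": [(0, 1), (1, 1)],
}
result = aggregate(votes, num_options=2,
                   weights={"solver_a": 1.0, "solver_b": 1.0, "solver_c": 0.5})
```

Here `result` maps each question to the consensus answer index; downweighting an unreliable agent shrinks its influence on both the posterior and its estimated accuracy.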
Related papers
- MMhops-R1: Multimodal Multi-hop Reasoning [89.68086555694084]
We introduce MMhops, a novel benchmark designed to evaluate and foster multi-modal multi-hop reasoning. The MMhops dataset comprises two challenging task formats, Bridging and Comparison. We propose MMhops-R1, a novel multi-modal Retrieval-Augmented Generation framework for dynamic reasoning.
arXiv Detail & Related papers (2025-12-15T17:29:02Z) - MADIAVE: Multi-Agent Debate for Implicit Attribute Value Extraction [52.89860691282002]
Implicit Attribute Value Extraction (AVE) is essential for accurately representing products in e-commerce. Despite advances in multimodal large language models (MLLMs), implicit AVE remains challenging due to the complexity of multidimensional data. We introduce MADIAVE, a multi-agent debate framework that employs multiple MLLM agents to iteratively refine inferences.
arXiv Detail & Related papers (2025-10-07T06:27:42Z) - Rethinking Information Synthesis in Multimodal Question Answering A Multi-Agent Perspective [42.832839189236694]
We propose MAMMQA, a multi-agent QA framework for multimodal inputs spanning text, tables, and images. Our system includes two Visual Language Model (VLM) agents and one text-based Large Language Model (LLM) agent. Experiments on diverse multimodal QA benchmarks demonstrate that our cooperative, multi-agent framework consistently outperforms existing baselines in both accuracy and robustness.
arXiv Detail & Related papers (2025-05-27T07:23:38Z) - Is Multi-Agent Debate (MAD) the Silver Bullet? An Empirical Analysis of MAD in Code Summarization and Translation [10.038721196640864]
Multi-Agent Debate (MAD) systems enable structured debates among Large Language Models (LLMs). MAD promotes divergent thinking through role-specific agents, dynamic interactions, and structured decision-making. This study investigates MAD's effectiveness on two Software Engineering (SE) tasks.
arXiv Detail & Related papers (2025-03-15T07:30:37Z) - Multidimensional Consistency Improves Reasoning in Language Models [21.989335720239467]
We introduce a framework for testing models for answer consistency across multiple input variations. We induce variations in (i) the order of shots in the prompt, (ii) problem phrasing, and (iii) the languages used. Our framework consistently enhances mathematical reasoning performance on both the monolingual GSM8K dataset and the multilingual MGSM dataset, especially for smaller models.
arXiv Detail & Related papers (2025-03-04T14:41:05Z) - Multi-LLM Collaborative Search for Complex Problem Solving [54.194370845153784]
We propose the Mixture-of-Search-Agents (MoSA) paradigm to enhance search-based reasoning. MoSA integrates diverse reasoning pathways by combining independent exploration with iterative refinement among LLMs. Using Monte Carlo Tree Search (MCTS) as a backbone, MoSA enables multiple agents to propose and aggregate reasoning steps, resulting in improved accuracy.
arXiv Detail & Related papers (2025-02-26T06:31:04Z) - Progressive Multimodal Reasoning via Active Retrieval [64.74746997923967]
Multi-step multimodal reasoning tasks pose significant challenges for multimodal large language models (MLLMs). We propose AR-MCTS, a universal framework designed to progressively improve the reasoning capabilities of MLLMs. We show that AR-MCTS can optimize sampling diversity and accuracy, yielding reliable multimodal reasoning.
arXiv Detail & Related papers (2024-12-19T13:25:39Z) - Visual Reasoning and Multi-Agent Approach in Multimodal Large Language Models (MLLMs): Solving TSP and mTSP Combinatorial Challenges [5.934258790280767]
Multimodal Large Language Models (MLLMs) harness comprehensive knowledge spanning text, images, and audio to adeptly tackle complex problems.
This study explores the ability of MLLMs to visually solve the Traveling Salesman Problem (TSP) and the Multiple Traveling Salesman Problem (mTSP).
We introduce a novel approach employing multiple specialized agents within the MLLM framework, each dedicated to optimizing solutions for these challenges.
arXiv Detail & Related papers (2024-06-26T07:12:06Z) - Large Multimodal Agents: A Survey [78.81459893884737]
Large language models (LLMs) have achieved superior performance in powering text-based AI agents.
There is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain.
This review aims to provide valuable insights and guidelines for future research in this rapidly evolving field.
arXiv Detail & Related papers (2024-02-23T06:04:23Z) - Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate [85.3444184685235]
We propose a Multi-Agent Debate (MAD) framework in which multiple agents express their arguments in a "tit for tat" fashion and a judge manages the debate process to obtain a final solution.
Our framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation.
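The debate-with-judge protocol this entry describes can be sketched as a simple loop: agents take turns responding to the shared transcript, and a judge produces the final solution. This is a minimal, hypothetical skeleton, not the paper's implementation; `ask` is a placeholder for a real LLM call, and the agent names are illustrative.

```python
def ask(agent, prompt):
    # Placeholder for a real model call; returns a canned reply here.
    return f"{agent}: answer to '{prompt[:30]}...'"

def debate(question, agents=("affirmative", "negative"), rounds=2):
    """Run a MAD-style debate: agents alternate turns, then a judge decides."""
    transcript = []
    for _ in range(rounds):
        for agent in agents:
            context = "\n".join(transcript)
            reply = ask(agent, f"{question}\nDebate so far:\n{context}\nYour argument:")
            transcript.append(reply)
    # The judge reads the full transcript and issues the final solution.
    verdict = ask("judge", f"{question}\nDebate:\n" + "\n".join(transcript) + "\nFinal answer:")
    return verdict

verdict = debate("Is 17 a prime number?")
```

Swapping `ask` for an actual model API, and varying the agent roles or round count, recovers the usual variants of this protocol.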
arXiv Detail & Related papers (2023-05-30T15:25:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.