DeALOG: Decentralized Multi-Agents Log-Mediated Reasoning Framework
- URL: http://arxiv.org/abs/2602.00996v1
- Date: Sun, 01 Feb 2026 03:26:52 GMT
- Title: DeALOG: Decentralized Multi-Agents Log-Mediated Reasoning Framework
- Authors: Abhijit Chakraborty, Ashish Raj Shekhar, Shiven Agarwal, Vivek Gupta,
- Abstract summary: DeALOG is a decentralized multi-agent framework for multimodal question answering.<n>It uses specialized agents: Table, Context, Visual, Summarizing and Verification.
- Score: 7.772295511115406
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Complex question answering across text, tables and images requires integrating diverse information sources. A framework supporting specialized processing with coordination and interpretability is needed. We introduce DeALOG, a decentralized multi-agent framework for multimodal question answering. It uses specialized agents: Table, Context, Visual, Summarizing and Verification, that communicate through a shared natural-language log as persistent memory. This log-based approach enables collaborative error detection and verification without central control, improving robustness. Evaluations on FinQA, TAT-QA, CRT-QA, WikiTableQuestions, FeTaQA, and MultiModalQA show competitive performance. Analysis confirms the importance of the shared log, agent specialization, and verification for accuracy. DeALOG, provides a scalable approach through modular components using natural-language communication.
Related papers
- DyTopo: Dynamic Topology Routing for Multi-Agent Reasoning via Semantic Matching [15.07152520738373]
We introduce DyTopo, a manager-guided multi-agent framework that reconstructs a sparse directed communication graph at each round.<n>Conditioned on the manager's round goal, each agent outputs lightweight natural-language query (need) and key (offer) descriptors.<n>DyTopo embeds these descriptors and performs semantic matching, routing private messages only along the induced edges.
arXiv Detail & Related papers (2026-02-05T18:59:51Z) - Agent-Omni: Test-Time Multimodal Reasoning via Model Coordination for Understanding Anything [12.274140974616747]
Multimodal large language models (MLLMs) have shown strong capabilities but remain limited to fixed modality pairs.<n>We propose an Agent- Omni framework that coordinates existing foundation models through a master-agent system.
arXiv Detail & Related papers (2025-11-04T18:59:09Z) - Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding [61.36285696607487]
Document understanding is critical for applications from financial analysis to scientific discovery.<n>Current approaches, whether OCR-based pipelines feeding Large Language Models (LLMs) or native Multimodal LLMs (MLLMs) face key limitations.<n>Retrieval-Augmented Generation (RAG) helps ground models in external data, but documents' multimodal nature, combining text, tables, charts, and layout, demands a more advanced paradigm: Multimodal RAG.
arXiv Detail & Related papers (2025-10-17T02:33:16Z) - AgentRouter: A Knowledge-Graph-Guided LLM Router for Collaborative Multi-Agent Question Answering [51.07491603393163]
tAgent is a framework that formulates multi-agent QA as a knowledge-graph-guided routing problem supervised by empirical performance signals.<n>By leveraging soft supervision and weighted aggregation of agent outputs, Agent learns principled collaboration schemes that capture the complementary strengths of diverse agents.
arXiv Detail & Related papers (2025-10-06T23:20:49Z) - Align Your Query: Representation Alignment for Multimodality Medical Object Detection [55.86070915426998]
We propose a detector-agnostic framework to align representations with modality context.<n>We integrate modality tokens into the detection process via Multimodality Context Attention.<n>The proposed approach consistently improves AP with minimal overhead and no architectural modifications.
arXiv Detail & Related papers (2025-10-03T07:49:21Z) - MMESGBench: Pioneering Multimodal Understanding and Complex Reasoning Benchmark for ESG Tasks [56.350173737493215]
Environmental, Social, and Governance (ESG) reports are essential for evaluating sustainability practices, ensuring regulatory compliance, and promoting financial transparency.<n>MMESGBench is a first-of-its-kind benchmark dataset to evaluate multimodal understanding and complex reasoning across structurally diverse and multi-source ESG documents.<n>MMESGBench comprises 933 validated QA pairs derived from 45 ESG documents, spanning across seven distinct document types and three major ESG source categories.
arXiv Detail & Related papers (2025-07-25T03:58:07Z) - AgentMaster: A Multi-Agent Conversational Framework Using A2A and MCP Protocols for Multimodal Information Retrieval and Analysis [0.0]
We present a pilot study of AgentMaster, a novel modular multi-protocol MAS framework with self-implemented A2A and MCP.<n>The system supports natural language interaction without prior technical expertise and responds to multimodal queries for tasks including information retrieval, question answering, and image analysis.<n>Overall, our proposed framework contributes to the potential capabilities of domain-specific, cooperative, and scalable conversational AI powered by MAS.
arXiv Detail & Related papers (2025-07-08T03:34:26Z) - AnyMAC: Cascading Flexible Multi-Agent Collaboration via Next-Agent Prediction [77.62279834617475]
We propose a new framework that rethinks multi-agent coordination through a sequential structure rather than a graph structure.<n>Our method focuses on two key directions: (1) Next-Agent Prediction, which selects the most suitable agent role at each step, and (2) Next-Context Selection, which enables each agent to selectively access relevant information from any previous step.
arXiv Detail & Related papers (2025-06-21T18:34:43Z) - Rethinking Information Synthesis in Multimodal Question Answering A Multi-Agent Perspective [42.832839189236694]
We propose MAMMQA, a multi-agent QA framework for multimodal inputs spanning text, tables, and images.<n>Our system includes two Visual Language Model (VLM) agents and one text-based Large Language Model (LLM) agent.<n> Experiments on diverse multimodal QA benchmarks demonstrate that our cooperative, multi-agent framework consistently outperforms existing baselines in both accuracy and robustness.
arXiv Detail & Related papers (2025-05-27T07:23:38Z) - HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation [11.53083922927901]
HM-RAG is a novel Hierarchical Multi-agent Multimodal RAG framework.<n>It pioneers collaborative intelligence for dynamic knowledge synthesis across structured, unstructured, and graph-based data.
arXiv Detail & Related papers (2025-04-13T06:55:33Z) - VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation [100.06122876025063]
This paper introduces VisDoMBench, the first comprehensive benchmark designed to evaluate QA systems in multi-document settings.<n>We propose VisDoMRAG, a novel multimodal Retrieval Augmented Generation (RAG) approach that simultaneously utilizes visual and textual RAG.
arXiv Detail & Related papers (2024-12-14T06:24:55Z) - Relation-Aware Language-Graph Transformer for Question Answering [21.244992938222246]
We propose Question Answering Transformer (QAT), which is designed to jointly reason over language and graphs with respect to entity relations.
Specifically, QAT constructs Meta-Path tokens, which learn relation-centric embeddings based on diverse structural and semantic relations.
We validate the effectiveness of QAT on commonsense question answering datasets like CommonsenseQA and OpenBookQA, and on a medical question answering dataset, MedQA-USMLE.
arXiv Detail & Related papers (2022-12-02T05:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.