Related papers: DeALOG: Decentralized Multi-Agents Log-Mediated Reasoning Framework

URL: http://arxiv.org/abs/2602.00996v1
Date: Sun, 01 Feb 2026 03:26:52 GMT
Title: DeALOG: Decentralized Multi-Agents Log-Mediated Reasoning Framework
Authors: Abhijit Chakraborty, Ashish Raj Shekhar, Shiven Agarwal, Vivek Gupta,
Abstract summary: DeALOG is a decentralized multi-agent framework for multimodal question answering. It uses specialized agents: Table, Context, Visual, Summarizing and Verification.
Score: 7.772295511115406
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Complex question answering across text, tables and images requires integrating diverse information sources. A framework supporting specialized processing with coordination and interpretability is needed. We introduce DeALOG, a decentralized multi-agent framework for multimodal question answering. It uses specialized agents (Table, Context, Visual, Summarizing and Verification) that communicate through a shared natural-language log as persistent memory. This log-based approach enables collaborative error detection and verification without central control, improving robustness. Evaluations on FinQA, TAT-QA, CRT-QA, WikiTableQuestions, FeTaQA, and MultiModalQA show competitive performance. Analysis confirms the importance of the shared log, agent specialization, and verification for accuracy. DeALOG provides a scalable approach through modular components using natural-language communication.
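
Illustration: the abstract describes agents that coordinate only through a shared natural-language log. The sketch below is one minimal way such a pattern could look, not the authors' implementation. The names `SharedLog`, `Agent`, and `run`, the three toy agents, and the hard-coded revenue figures are all hypothetical, and the real system presumably calls language models where these functions return fixed strings.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SharedLog:
    """Append-only natural-language log that every agent can read (hypothetical)."""
    entries: List[str] = field(default_factory=list)

    def append(self, agent: str, message: str) -> None:
        self.entries.append(f"[{agent}] {message}")

    def text(self) -> str:
        return "\n".join(self.entries)


@dataclass
class Agent:
    """A name plus a step function that reads the log and returns a message or None."""
    name: str
    step: Callable[[SharedLog], Optional[str]]


def table_agent(log: SharedLog) -> Optional[str]:
    # Toy stand-in for a table-reading model: contributes once, then stays silent.
    if "[Table]" in log.text():
        return None
    return "revenue 2020 = 120, revenue 2019 = 100"


def summarizing_agent(log: SharedLog) -> Optional[str]:
    # Acts only after table evidence appears in the log.
    text = log.text()
    if "[Table]" not in text or "[Summarizing]" in text:
        return None
    return "proposed answer: growth = 20%"


def verification_agent(log: SharedLog) -> Optional[str]:
    # Acts only after an answer has been proposed; rechecks the toy arithmetic.
    text = log.text()
    if "[Summarizing]" not in text or "[Verification]" in text:
        return None
    ok = abs((120 - 100) / 100 - 0.20) < 1e-9
    return "verified" if ok else "rejected"


def run(agents: List[Agent], question: str, max_rounds: int = 5) -> SharedLog:
    """Poll agents until none has anything to add; no agent is told what to do."""
    log = SharedLog()
    log.append("Question", question)
    for _ in range(max_rounds):
        progressed = False
        for agent in agents:
            message = agent.step(log)
            if message is not None:
                log.append(agent.name, message)
                progressed = True
        if not progressed:
            break
    return log


# Agents are listed in "wrong" order on purpose: each one waits for the log
# to contain what it needs, so the final log is still in logical order.
agents = [
    Agent("Verification", verification_agent),
    Agent("Summarizing", summarizing_agent),
    Agent("Table", table_agent),
]
log = run(agents, "What was revenue growth?")
print(log.text())
```

In this sketch the polling loop is only a scheduling convenience. Each agent decides for itself, from the log contents alone, whether it has something to contribute, which is the property the abstract attributes to log-mediated coordination.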

Related papers

DyTopo: Dynamic Topology Routing for Multi-Agent Reasoning via Semantic Matching [15.07152520738373]
We introduce DyTopo, a manager-guided multi-agent framework that reconstructs a sparse directed communication graph at each round. Conditioned on the manager's round goal, each agent outputs lightweight natural-language query (need) and key (offer) descriptors. DyTopo embeds these descriptors and performs semantic matching, routing private messages only along the induced edges.
arXiv Detail & Related papers (2026-02-05T18:59:51Z)
Agent-Omni: Test-Time Multimodal Reasoning via Model Coordination for Understanding Anything [12.274140974616747]
Multimodal large language models (MLLMs) have shown strong capabilities but remain limited to fixed modality pairs. We propose an Agent-Omni framework that coordinates existing foundation models through a master-agent system.
arXiv Detail & Related papers (2025-11-04T18:59:09Z)
Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding [61.36285696607487]
Document understanding is critical for applications from financial analysis to scientific discovery. Current approaches, whether OCR-based pipelines feeding Large Language Models (LLMs) or native Multimodal LLMs (MLLMs), face key limitations. Retrieval-Augmented Generation (RAG) helps ground models in external data, but documents' multimodal nature, combining text, tables, charts, and layout, demands a more advanced paradigm: Multimodal RAG.
arXiv Detail & Related papers (2025-10-17T02:33:16Z)
AgentRouter: A Knowledge-Graph-Guided LLM Router for Collaborative Multi-Agent Question Answering [51.07491603393163]
AgentRouter is a framework that formulates multi-agent QA as a knowledge-graph-guided routing problem supervised by empirical performance signals. By leveraging soft supervision and weighted aggregation of agent outputs, AgentRouter learns principled collaboration schemes that capture the complementary strengths of diverse agents.
arXiv Detail & Related papers (2025-10-06T23:20:49Z)
Align Your Query: Representation Alignment for Multimodality Medical Object Detection [55.86070915426998]
We propose a detector-agnostic framework to align representations with modality context. We integrate modality tokens into the detection process via Multimodality Context Attention. The proposed approach consistently improves AP with minimal overhead and no architectural modifications.
arXiv Detail & Related papers (2025-10-03T07:49:21Z)
MMESGBench: Pioneering Multimodal Understanding and Complex Reasoning Benchmark for ESG Tasks [56.350173737493215]
Environmental, Social, and Governance (ESG) reports are essential for evaluating sustainability practices, ensuring regulatory compliance, and promoting financial transparency. MMESGBench is a first-of-its-kind benchmark dataset to evaluate multimodal understanding and complex reasoning across structurally diverse and multi-source ESG documents. MMESGBench comprises 933 validated QA pairs derived from 45 ESG documents, spanning seven distinct document types and three major ESG source categories.
arXiv Detail & Related papers (2025-07-25T03:58:07Z)
AgentMaster: A Multi-Agent Conversational Framework Using A2A and MCP Protocols for Multimodal Information Retrieval and Analysis [0.0]
We present a pilot study of AgentMaster, a novel modular multi-protocol MAS framework with self-implemented A2A and MCP. The system supports natural language interaction without prior technical expertise and responds to multimodal queries for tasks including information retrieval, question answering, and image analysis. Overall, our proposed framework contributes to the potential capabilities of domain-specific, cooperative, and scalable conversational AI powered by MAS.
arXiv Detail & Related papers (2025-07-08T03:34:26Z)
AnyMAC: Cascading Flexible Multi-Agent Collaboration via Next-Agent Prediction [77.62279834617475]
We propose a new framework that rethinks multi-agent coordination through a sequential structure rather than a graph structure. Our method focuses on two key directions: (1) Next-Agent Prediction, which selects the most suitable agent role at each step, and (2) Next-Context Selection, which enables each agent to selectively access relevant information from any previous step.
arXiv Detail & Related papers (2025-06-21T18:34:43Z)
Rethinking Information Synthesis in Multimodal Question Answering A Multi-Agent Perspective [42.832839189236694]
We propose MAMMQA, a multi-agent QA framework for multimodal inputs spanning text, tables, and images. Our system includes two Visual Language Model (VLM) agents and one text-based Large Language Model (LLM) agent. Experiments on diverse multimodal QA benchmarks demonstrate that our cooperative, multi-agent framework consistently outperforms existing baselines in both accuracy and robustness.
arXiv Detail & Related papers (2025-05-27T07:23:38Z)
HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation [11.53083922927901]
HM-RAG is a novel Hierarchical Multi-agent Multimodal RAG framework. It pioneers collaborative intelligence for dynamic knowledge synthesis across structured, unstructured, and graph-based data.
arXiv Detail & Related papers (2025-04-13T06:55:33Z)
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation [100.06122876025063]
This paper introduces VisDoMBench, the first comprehensive benchmark designed to evaluate QA systems in multi-document settings. We propose VisDoMRAG, a novel multimodal Retrieval Augmented Generation (RAG) approach that simultaneously utilizes visual and textual RAG.
arXiv Detail & Related papers (2024-12-14T06:24:55Z)
Relation-Aware Language-Graph Transformer for Question Answering [21.244992938222246]
We propose Question Answering Transformer (QAT), which is designed to jointly reason over language and graphs with respect to entity relations. Specifically, QAT constructs Meta-Path tokens, which learn relation-centric embeddings based on diverse structural and semantic relations. We validate the effectiveness of QAT on commonsense question answering datasets like CommonsenseQA and OpenBookQA, and on a medical question answering dataset, MedQA-USMLE.
arXiv Detail & Related papers (2022-12-02T05:10:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.