Related papers: ORCA: Orchestrated Reasoning with Collaborative Agents for Document Visual Question Answering

ORCA: Orchestrated Reasoning with Collaborative Agents for Document Visual Question Answering

URL: http://arxiv.org/abs/2603.02438v1
Date: Mon, 02 Mar 2026 22:21:25 GMT
Title: ORCA: Orchestrated Reasoning with Collaborative Agents for Document Visual Question Answering
Authors: Aymen Lassoued, Mohamed Ali Souibgui, Yousri Kessentini,
Abstract summary: We present ORCA: Orchestrated Reasoning with Collaborative Agents for Document Visual Question Answering.<n>Our framework leverages a set of specialized AI agents, each dedicated to a distinct modality, enabling fine-language understanding and collaborative reasoning across diverse document components.<n>Our approach achieves significant improvements over state-of-the-art methods, establishing a new paradigm for collaborative agent systems in vision-grained reasoning.
Score: 8.852709980681626
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Document Visual Question Answering (DocVQA) remains challenging for existing Vision-Language Models (VLMs), especially under complex reasoning and multi-step workflows. Current approaches struggle to decompose intricate questions into manageable sub-tasks and often fail to leverage specialized processing paths for different document elements. We present ORCA: Orchestrated Reasoning with Collaborative Agents for Document Visual Question Answering, a novel multi-agent framework that addresses these limitations through strategic agent coordination and iterative refinement. ORCA begins with a reasoning agent that decomposes queries into logical steps, followed by a routing mechanism that activates task-specific agents from a specialized agent dock. Our framework leverages a set of specialized AI agents, each dedicated to a distinct modality, enabling fine-grained understanding and collaborative reasoning across diverse document components. To ensure answer reliability, ORCA employs a debate mechanism with stress-testing, and when necessary, a thesis-antithesis adjudication process. This is followed by a sanity checker to ensure format consistency. Extensive experiments on three benchmarks demonstrate that our approach achieves significant improvements over state-of-the-art methods, establishing a new paradigm for collaborative agent systems in vision-language reasoning.

Related papers

AgenticSimLaw: A Juvenile Courtroom Multi-Agent Debate Simulation for Explainable High-Stakes Tabular Decision Making [0.6218206949753592]
We introduce AgenticSimLaw, a role-structured, multi-agent debate framework that provides transparent and controllable testtime reasoning.<n>Unlike black-box approaches, our courtroom-style orchestration explicitly defines agent roles.<n>We benchmark this framework on young adult recidivism prediction using the NLSY97 dataset.
arXiv Detail & Related papers (2026-01-29T16:26:10Z)
Beyond Monolithic Architectures: A Multi-Agent Search and Knowledge Optimization Framework for Agentic Search [56.78490647843876]
Agentic search has emerged as a promising paradigm for complex information seeking by enabling Large Language Models (LLMs) to interleave reasoning with tool use.<n>We propose bfM-ASK, a framework that explicitly decouples agentic search into two complementary roles: Search Behavior Agents, which plan and execute search actions, and Knowledge Management Agents, which aggregate, filter, and maintain a compact internal context.
arXiv Detail & Related papers (2026-01-08T08:13:27Z)
COMPASS: Enhancing Agent Long-Horizon Reasoning with Evolving Context [17.575806280348797]
Small errors compound across steps, and even state-of-the-art models often hallucinate or lose coherence.<n>We propose a lightweight hierarchical framework that separates tactical execution, strategic oversight, and context organization into three specialized components.
arXiv Detail & Related papers (2025-10-09T20:14:26Z)
AgentRouter: A Knowledge-Graph-Guided LLM Router for Collaborative Multi-Agent Question Answering [51.07491603393163]
tAgent is a framework that formulates multi-agent QA as a knowledge-graph-guided routing problem supervised by empirical performance signals.<n>By leveraging soft supervision and weighted aggregation of agent outputs, Agent learns principled collaboration schemes that capture the complementary strengths of diverse agents.
arXiv Detail & Related papers (2025-10-06T23:20:49Z)
Learning to Summarize by Learning to Quiz: Adversarial Agentic Collaboration for Long Document Summarization [86.98098988779809]
We propose SummQ, a novel adversarial multi-agent framework for long document summarization.<n>Our approach employs summary generators and reviewers that work collaboratively to create and evaluate comprehensive summaries.<n>We evaluate SummQ on three widely used long document summarization benchmarks.
arXiv Detail & Related papers (2025-09-25T08:36:19Z)
Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling [83.78874399606379]
We propose MACT, a Multi-Agent Collaboration framework with Test-Time scaling.<n>It comprises four distinct small-scale agents, with clearly defined roles and effective collaboration.<n>It shows superior performance with a smaller parameter scale without sacrificing the ability of general and mathematical tasks.
arXiv Detail & Related papers (2025-08-05T12:52:09Z)
Deep Research Agents: A Systematic Examination And Roadmap [109.53237992384872]
Deep Research (DR) agents are designed to tackle complex, multi-turn informational research tasks.<n>In this paper, we conduct a detailed analysis of the foundational technologies and architectural components that constitute DR agents.
arXiv Detail & Related papers (2025-06-22T16:52:48Z)
MA-RAG: Multi-Agent Retrieval-Augmented Generation via Collaborative Chain-of-Thought Reasoning [36.3918410061572]
MA-RAG addresses the inherent ambiguities and reasoning challenges in complex information-seeking tasks.<n>Unlike conventional RAG methods that rely on end-to-end fine-tuning or isolated component enhancements, MA-RAG orchestrates a collaborative set of specialized AI agents.<n>Our results highlight the effectiveness of collaborative, modular reasoning in retrieval-augmented systems.
arXiv Detail & Related papers (2025-05-26T15:05:18Z)
A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions [51.96890647837277]
Large Language Models (LLMs) have propelled conversational AI from traditional dialogue systems into sophisticated agents capable of autonomous actions, contextual awareness, and multi-turn interactions with users.<n>This survey paper presents a desideratum for next-generation Conversational Agents - what has been achieved, what challenges persist, and what must be done for more scalable systems that approach human-level intelligence.
arXiv Detail & Related papers (2025-04-07T21:01:25Z)
Agent-Oriented Planning in Multi-Agent Systems [54.429028104022066]
We propose AOP, a novel framework for agent-oriented planning in multi-agent systems.<n>In this study, we identify three critical design principles of agent-oriented planning, including solvability, completeness, and non-redundancy.<n> Extensive experiments demonstrate the advancement of AOP in solving real-world problems compared to both single-agent systems and existing planning strategies for multi-agent systems.
arXiv Detail & Related papers (2024-10-03T04:07:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.