DocDancer: Towards Agentic Document-Grounded Information Seeking
- URL: http://arxiv.org/abs/2601.05163v1
- Date: Thu, 08 Jan 2026 17:54:32 GMT
- Title: DocDancer: Towards Agentic Document-Grounded Information Seeking
- Authors: Qintong Zhang, Xinjie Lv, Jialong Wu, Baixuan Li, Zhengwei Tao, Guochen Yan, Huanyao Zhang, Bin Wang, Jiahao Xu, Haitao Mi, Wentao Zhang
- Abstract summary: Document Question Answering (DocQA) focuses on answering questions grounded in given documents. Existing DocQA agents lack effective tool utilization and largely rely on closed-source models. We introduce DocDancer, an end-to-end trained open-source Doc agent.
- Score: 27.08333983540891
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Document Question Answering (DocQA) focuses on answering questions grounded in given documents, yet existing DocQA agents lack effective tool utilization and largely rely on closed-source models. In this work, we introduce DocDancer, an end-to-end trained open-source Doc agent. We formulate DocQA as an information-seeking problem and propose a tool-driven agent framework that explicitly models document exploration and comprehension. To enable end-to-end training of such agents, we introduce an Exploration-then-Synthesis data synthesis pipeline that addresses the scarcity of high-quality training data for DocQA. Models trained on the synthesized data prove effective on two long-context document understanding benchmarks, MMLongBench-Doc and DocBench. Further analysis provides valuable insights into agentic tool design and synthetic data.
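The abstract describes the framework only at a high level. As a rough illustration, a tool-driven DocQA loop of this kind might look like the sketch below, where `search_pages`, `read_page`, and the `llm.decide`/`llm.answer` calls are invented stand-ins rather than DocDancer's actual tools or API; only the explore-then-comprehend shape of the loop is taken from the abstract.

```python
# Minimal sketch of a tool-driven DocQA loop; all tool and method names
# here are assumptions, not DocDancer's actual interface.

def search_pages(doc, query):
    """Exploration tool: indices of pages whose text mentions the query."""
    return [i for i, page in enumerate(doc) if query.lower() in page.lower()]

def read_page(doc, index):
    """Comprehension tool: the full text of one page."""
    return doc[index]

def docqa_agent(doc, question, llm, max_steps=8):
    """Alternate between exploring and reading until the model answers."""
    context = []
    for _ in range(max_steps):
        action = llm.decide(question, context)      # hypothetical policy call
        if action.name == "search_pages":
            context.append(("search", search_pages(doc, action.arg)))
        elif action.name == "read_page":
            context.append(("page", read_page(doc, action.arg)))
        else:                                       # model chose to answer
            return action.arg
    return llm.answer(question, context)            # forced answer at budget
```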
Related papers
- Demonstrating ViviDoc: Generating Interactive Documents through Human-Agent Collaboration [4.751545995185441]
We present ViviDoc, a human-agent collaborative system that generates interactive educational documents from a single topic input. ViviDoc introduces a multi-agent pipeline (Planner, Executor, Evaluator) and the Document Specification (DocSpec), a human-readable intermediate representation. Expert evaluation and a user study show that ViviDoc substantially outperforms naive agentic generation and provides an intuitive editing experience.
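A hedged sketch of how such a Planner, Executor, and Evaluator might be wired around a DocSpec-like intermediate representation follows; the `DocSpec` fields and the revision rule are assumptions, not ViviDoc's published design.

```python
# Illustrative Planner -> Executor -> Evaluator loop over a DocSpec-like
# intermediate representation; fields and control flow are invented.
from dataclasses import dataclass, field

@dataclass
class DocSpec:
    topic: str
    sections: list = field(default_factory=list)  # planned section outlines

def generate_document(topic, planner, executor, evaluator, max_rounds=3):
    spec = planner.plan(topic)                 # Planner drafts the DocSpec
    doc = executor.render(spec)                # Executor realizes it
    for _ in range(max_rounds):
        feedback = evaluator.review(doc)       # Evaluator critiques the draft
        if feedback.ok:
            break
        spec = planner.revise(spec, feedback)  # fold the critique back in
        doc = executor.render(spec)
    return doc
```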
arXiv Detail & Related papers (2026-03-02T14:27:49Z) - LongDA: Benchmarking LLM Agents for Long-Document Data Analysis [55.32211515932351]
LongDA targets real-world settings in which navigating long documentation and complex data is the primary bottleneck. LongTA is a tool-augmented agent framework that enables document access, retrieval, and code execution. Our experiments reveal substantial performance gaps even among state-of-the-art models.
arXiv Detail & Related papers (2026-01-05T23:23:16Z) - OIDA-QA: A Multimodal Benchmark for Analyzing the Opioid Industry Documents Archive [50.468138755368805]
The opioid crisis represents a significant moment in public health, reflected in the data and documents disclosed in the UCSF-JHU Opioid Industry Documents Archive (OIDA). In this paper, we tackle this challenge by organizing the original dataset according to document attributes.
arXiv Detail & Related papers (2025-11-13T03:27:32Z) - ABCD-LINK: Annotation Bootstrapping for Cross-Document Fine-Grained Links [57.514511353084565]
We introduce a new domain-agnostic framework for selecting a best-performing approach and annotating cross-document links. We apply our framework in two distinct domains: peer review and news. The resulting novel datasets lay the foundation for numerous cross-document tasks like media framing and peer review.
arXiv Detail & Related papers (2025-09-01T11:32:24Z) - DocAgent: A Multi-Agent System for Automated Code Documentation Generation [7.653779364214401]
We introduce DocAgent, a novel multi-agent collaborative system using topological code processing for incremental context building. Specialized agents (Reader, Searcher, Writer, Verifier, Orchestrator) then collaboratively generate documentation. We also propose a multi-faceted evaluation framework assessing Completeness, Helpfulness, and Truthfulness.
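The "topological code processing" idea can be illustrated with a small sketch: document each component only after its dependencies, so earlier documentation feeds the context for later components. The toy dependency graph below is invented, and the standard-library `graphlib` module stands in for whatever traversal DocAgent actually uses.

```python
# Topological ordering for incremental context building; the graph and the
# one-line "documentation" step are placeholders for DocAgent's agents.
from graphlib import TopologicalSorter

# maps each component to the components it depends on
deps = {"utils": set(), "parser": {"utils"}, "cli": {"parser", "utils"}}

docs = {}
for name in TopologicalSorter(deps).static_order():
    context = {d: docs[d] for d in deps[name]}   # docs of dependencies
    docs[name] = f"Docstring for {name}, written using {sorted(context)}"
    # in DocAgent, Reader/Searcher/Writer/Verifier would collaborate here;
    # this sketch only shows the ordering that makes the process incremental
print(docs["cli"])
```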
arXiv Detail & Related papers (2025-04-11T17:50:08Z) - BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations [2.9798896492745537]
We present a unified dataset for document Question-Answering (QA). We reformulate existing Document AI tasks, such as Information Extraction (IE), into a Question-Answering task. We also release the OCR of all the documents and include the exact position of the answer in the document image as a bounding box.
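As a rough illustration of recasting an IE field into a QA pair with a spatial answer, one record might look like the sketch below; the field names are guesses, not the dataset's published schema.

```python
# One possible record shape for QA with a bounding-box answer; all field
# names and values here are illustrative, not BoundingDocs' actual schema.
record = {
    "image": "invoice_0001.png",               # page image the question targets
    "question": "What is the invoice total?",  # IE field rewritten as a question
    "answer": "$1,250.00",                     # textual answer from the OCR
    "answer_bbox": [412, 830, 520, 858],       # x0, y0, x1, y1 in image pixels
}

x0, y0, x1, y1 = record["answer_bbox"]
assert x0 < x1 and y0 < y1  # a valid box: top-left corner before bottom-right
```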
arXiv Detail & Related papers (2025-01-06T21:46:22Z) - BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks [57.589795399265945]
We introduce BigDocs-7.5M, a high-quality, open-access dataset comprising 7.5 million multimodal documents across 30 tasks. We also introduce BigDocs-Bench, a benchmark suite with 10 novel tasks. Our experiments show that training with BigDocs-Bench improves average performance by up to 25.8% over closed-source GPT-4o.
arXiv Detail & Related papers (2024-12-05T21:41:20Z) - DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models [66.91204604417912]
This study aims to enhance the generalizability of small VDU models by distilling knowledge from LLMs.
We present a new framework (called DocKD) that enriches the data generation process by integrating external document knowledge.
Experiments show that DocKD produces high-quality document annotations and surpasses the direct knowledge distillation approach.
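A minimal sketch of what knowledge-enriched distillation data generation could look like follows; the `teacher_llm.generate` call, the prompt format, and the tiny knowledge store are all invented, since the summary does not specify DocKD's prompting or knowledge sources.

```python
# Knowledge-enriched distillation data generation, in the spirit of DocKD;
# the teacher interface and knowledge store below are assumptions.
external_knowledge = {
    "invoice": "Invoices list line items, totals, taxes, and due dates.",
}

def make_training_pair(teacher_llm, doc_text, doc_type):
    hint = external_knowledge.get(doc_type, "")  # external document knowledge
    prompt = f"{hint}\nDocument:\n{doc_text}\nWrite one QA pair about it."
    qa = teacher_llm.generate(prompt)            # teacher produces the label
    return {"input": doc_text, "target": qa}     # supervision for a small model
```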
arXiv Detail & Related papers (2024-10-04T00:53:32Z) - SynthDoc: Bilingual Documents Synthesis for Visual Document Understanding [23.910783272007407]
This paper introduces SynthDoc, a novel synthetic document generation pipeline designed to enhance Visual Document Understanding (VDU).
Addressing the challenges of data acquisition and the limitations of existing datasets, SynthDoc leverages publicly available corpora and advanced rendering tools to create a comprehensive and versatile dataset.
Our experiments, conducted using the Donut model, demonstrate that models trained with SynthDoc's data achieve superior performance in pre-training read tasks and maintain robustness in downstream tasks, despite language inconsistencies.
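As an illustration of rendering public corpora into page images with aligned ground truth, the sketch below uses Pillow as a stand-in for SynthDoc's actual rendering tools; the one-column layout and default font are deliberately simplistic.

```python
# Rendering corpus text into a (page image, ground truth) pair; this is a
# toy stand-in for SynthDoc's rendering pipeline, not its implementation.
from PIL import Image, ImageDraw

def render_page(text, size=(612, 792)):
    """Return (page image, ground-truth text) for a simple one-column layout."""
    img = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(img)
    y = 20
    for line in text.splitlines():
        draw.text((20, y), line, fill="black")  # default bitmap font
        y += 16
    return img, text  # the source text doubles as the reading ground truth

img, gt = render_page("Synthetic line one.\nSynthetic line two.")
img.save("synth_page.png")
```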
arXiv Detail & Related papers (2024-08-27T03:31:24Z) - Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
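The summary names three self-supervised losses without detailing them; a generic sketch of how such terms are typically combined into one pretraining objective follows, with placeholder term names and equal weights that are not taken from the paper.

```python
# Generic combination of three self-supervised terms into one pretraining
# loss; the term names and weights are placeholders, since the summary does
# not say which three losses UDoc uses.
import torch

def pretraining_loss(masked_lm, vision_align, contrastive,
                     weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three self-supervised loss terms."""
    w1, w2, w3 = weights
    return w1 * masked_lm + w2 * vision_align + w3 * contrastive

loss = pretraining_loss(torch.tensor(2.3), torch.tensor(0.7), torch.tensor(1.1))
```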
arXiv Detail & Related papers (2022-04-22T21:47:04Z)