Demonstrating ViviDoc: Generating Interactive Documents through Human-Agent Collaboration
- URL: http://arxiv.org/abs/2603.01912v1
- Date: Mon, 02 Mar 2026 14:27:49 GMT
- Title: Demonstrating ViviDoc: Generating Interactive Documents through Human-Agent Collaboration
- Authors: Yinghao Tang, Yupeng Xie, Yingchaojie Feng, Tingfeng Lan, Wei Chen
- Abstract summary: We present ViviDoc, a human-agent collaborative system that generates interactive educational documents from a single topic input. ViviDoc introduces a multi-agent pipeline (Planner, Executor, Evaluator) and the Document Specification (DocSpec), a human-readable intermediate representation. Expert evaluation and a user study show that ViviDoc substantially outperforms naive agentic generation and provides an intuitive editing experience.
- Score: 4.751545995185441
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interactive articles help readers engage with complex ideas through exploration, yet creating them remains costly, requiring both domain expertise and web development skills. Recent LLM-based agents can automate content creation, but naively applying them yields uncontrollable and unverifiable outputs. We present ViviDoc, a human-agent collaborative system that generates interactive educational documents from a single topic input. ViviDoc introduces a multi-agent pipeline (Planner, Executor, Evaluator) and the Document Specification (DocSpec), a human-readable intermediate representation that decomposes each interactive visualization into State, Render, Transition, and Constraint components. The DocSpec enables educators to review and refine generation plans before code is produced, bridging the gap between pedagogical intent and executable output. Expert evaluation and a user study show that ViviDoc substantially outperforms naive agentic generation and provides an intuitive editing experience. Our project homepage is available at https://vividoc-homepage.vercel.app/.
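The abstract does not reproduce the DocSpec schema itself. As a rough sketch of the stated decomposition, one interactive visualization could be captured in a structure like the following, where all field names and example content are illustrative assumptions rather than the paper's actual format:

```python
# Hypothetical sketch of a DocSpec entry for one interactive visualization.
# The four components mirror the paper's State/Render/Transition/Constraint
# decomposition; every field name and value here is an assumption.
from dataclasses import dataclass

@dataclass
class VisualizationSpec:
    state: dict         # variables the reader can manipulate, with initial values
    render: str         # human-readable description of what is drawn from the state
    transitions: list   # (event, effect) pairs: how interaction updates the state
    constraints: list   # invariants for the Evaluator to check before code is emitted

spec = VisualizationSpec(
    state={"sample_size": 30, "alpha": 0.05},
    render="Histogram of simulated sample means, shaded where p < alpha.",
    transitions=[("slider:sample_size", "resample and redraw the histogram")],
    constraints=["sample_size stays between 5 and 500"],
)

# Because the spec is plain data, an educator can review and edit it
# before the Executor turns it into code.
print(spec)
```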
Related papers
- DocDancer: Towards Agentic Document-Grounded Information Seeking [27.08333983540891]
Document Question Answering (DocQA) focuses on answering questions grounded in given documents.
Existing DocQA agents lack effective tool utilization and largely rely on closed-source models.
We introduce DocDancer, an end-to-end trained open-source DocQA agent.
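The summary mentions tool utilization without detailing it. Purely as a generic illustration of a tool-augmented DocQA loop, the sketch below wires a trivial keyword-search tool into a fixed policy; nothing in it reflects DocDancer's actual design or training:

```python
# Generic tool-augmented DocQA loop with a toy search tool and a fixed
# policy; a sketch only, not DocDancer's published architecture.

def search_tool(document: str, query: str) -> str:
    """Return the lines of the document that mention the query."""
    hits = [ln for ln in document.splitlines() if query.lower() in ln.lower()]
    return "\n".join(hits) or "(no matches)"

def answer(question: str, document: str, max_steps: int = 3) -> str:
    evidence = "(no matches)"
    for _ in range(max_steps):
        # A real agent would let the LLM choose the tool and its arguments;
        # here we just search on the question's last word as a placeholder.
        query = question.rstrip("?").split()[-1]
        evidence = search_tool(document, query)
        if evidence != "(no matches)":
            break
    # A real agent would synthesize an answer from the gathered evidence.
    return evidence

doc = "ViviDoc has three agents.\nThe Evaluator checks constraints."
print(answer("Which agent checks constraints?", doc))
```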
arXiv Detail & Related papers (2026-01-08T17:54:32Z)
- DocLens: A Tool-Augmented Multi-Agent Framework for Long Visual Document Understanding [59.4112754806335]
We propose DocLens, a tool-augmented multi-agent framework that effectively "zooms in" on evidence like a lens.
It first navigates from the full document to specific visual elements on relevant pages, then employs a sampling-adjudication mechanism to generate a single, reliable answer.
It achieves state-of-the-art performance on MMLongBench-Doc and FinRAG-V, surpassing even human experts.
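The abstract does not spell out the sampling-adjudication mechanism. One common way to realize such a step is to sample several candidate answers over the retrieved evidence and keep the one they agree on; the majority-vote rule below is our assumption, not necessarily DocLens's adjudicator:

```python
# Sketch of a sampling-adjudication step: draw several candidate answers
# and adjudicate by majority vote. The voting rule is an assumption made
# for illustration; DocLens's actual adjudicator may differ.
from collections import Counter
import random

def sample_answer(evidence: str) -> str:
    # Stand-in for one stochastic model call over the zoomed-in evidence.
    return random.choice(["$42M", "$42M", "$41M"])

def adjudicate(evidence: str, n_samples: int = 5) -> str:
    votes = Counter(sample_answer(evidence) for _ in range(n_samples))
    best, _count = votes.most_common(1)[0]
    return best

print(adjudicate("revenue table on page 7"))
```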
arXiv Detail & Related papers (2025-11-14T18:42:18Z)
- Paper2Web: Let's Make Your Paper Alive! [51.75896846964824]
We introduce Paper2Web, a benchmark dataset and framework for assessing academic webpage generation.
We present PWAgent, an autonomous pipeline that converts scientific papers into interactive and multimedia-rich academic homepages.
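As a deliberately minimal illustration of the paper-to-homepage idea, the sketch below renders extracted sections as a static HTML page; PWAgent's actual pipeline (layout, multimedia, interactivity) is far richer, and every detail here is assumed:

```python
# Toy paper-to-homepage conversion: turn section texts into static HTML.
# An illustrative floor for the task, not PWAgent's pipeline.
import html

def paper_to_homepage(title: str, sections: dict) -> str:
    body = "".join(
        f"<section><h2>{html.escape(head)}</h2><p>{html.escape(text)}</p></section>"
        for head, text in sections.items()
    )
    return (f"<html><head><title>{html.escape(title)}</title></head>"
            f"<body><h1>{html.escape(title)}</h1>{body}</body></html>")

page = paper_to_homepage("ViviDoc", {
    "Abstract": "Interactive articles help readers engage with complex ideas.",
    "Method": "A Planner, Executor, Evaluator pipeline over a DocSpec.",
})
print(page[:72])
```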
arXiv Detail & Related papers (2025-10-17T17:35:58Z)
- ExpVid: A Benchmark for Experiment Video Understanding & Reasoning [65.17173232816818]
We introduce ExpVid, the first benchmark designed to systematically evaluate MLLMs on scientific experiment videos.
We evaluate 19 leading MLLMs on ExpVid and find that while they excel at coarse-grained recognition, they struggle with disambiguating fine details, tracking state changes over time, and linking experimental procedures to scientific outcomes.
Our results reveal a notable performance gap between proprietary and open-source models, particularly in high-order reasoning.
arXiv Detail & Related papers (2025-10-13T16:45:28Z)
- DocSpiral: A Platform for Integrated Assistive Document Annotation through Human-in-the-Spiral [11.336757553731639]
Acquiring structured data from domain-specific, image-based documents is crucial for many downstream tasks.
Many documents exist as images rather than as machine-readable text, which requires human annotation to train automated extraction systems.
We present DocSpiral, the first Human-in-the-Spiral assistive document annotation platform.
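The "spiral" suggests an iterative loop: the model pre-annotates a batch, a human corrects the drafts, and the corrections retrain the model for the next round. A schematic sketch with stub components; none of these names are DocSpiral's actual API:

```python
# Schematic human-in-the-spiral annotation loop with stub components.
# All class and function names are hypothetical, not DocSpiral's API.

class StubExtractor:
    """Trivial stand-in for a trainable extraction model."""
    def __init__(self):
        self.examples = []
    def predict(self, doc: str) -> str:
        return doc.upper()             # placeholder "extraction"
    def fit(self, labeled):
        self.examples = list(labeled)  # a real model would retrain here

def human_correct(doc: str, draft: str) -> str:
    return draft                       # a real loop would collect human edits

def spiral(documents, model, rounds: int = 2):
    labeled = []
    for r in range(rounds):
        batch = documents[r::rounds]                 # next slice of documents
        drafts = [model.predict(d) for d in batch]   # machine pre-annotation
        labeled += [human_correct(d, a) for d, a in zip(batch, drafts)]
        model.fit(labeled)                           # model improves each turn
    return labeled

print(spiral(["invoice a", "invoice b"], StubExtractor()))
```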
arXiv Detail & Related papers (2025-05-06T06:02:42Z)
- DocAgent: A Multi-Agent System for Automated Code Documentation Generation [7.653779364214401]
We introduce DocAgent, a novel multi-agent collaborative system using topological code processing for incremental context building.
Specialized agents (Reader, Searcher, Writer, Verifier, Orchestrator) then collaboratively generate documentation.
We also propose a multi-faceted evaluation framework assessing Completeness, Helpfulness, and Truthfulness.
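Topological code processing plausibly means documenting dependencies before their dependents, so each component's documentation can draw on already-written context. A minimal sketch of that ordering using Python's graphlib; the toy call graph and the stand-in Writer step are our assumptions:

```python
# Document components in dependency order so each one can cite the docs
# of what it calls. The toy graph and the doc-writing stand-in are
# illustrative; DocAgent's Reader/Searcher/Writer/Verifier agents do more.
from graphlib import TopologicalSorter

# component -> the components it depends on
call_graph = {
    "parse": set(),
    "validate": {"parse"},
    "main": {"parse", "validate"},
}

docs = {}
for component in TopologicalSorter(call_graph).static_order():
    context = {dep: docs[dep] for dep in call_graph[component]}    # already written
    docs[component] = f"{component}: builds on {sorted(context)}"  # Writer stand-in

print(docs["main"])  # parse and validate were documented first
```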
arXiv Detail & Related papers (2025-04-11T17:50:08Z)
- BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks [57.589795399265945]
We introduce BigDocs-7.5M, a high-quality, open-access dataset comprising 7.5 million multimodal documents across 30 tasks.
We also introduce BigDocs-Bench, a benchmark suite with 10 novel tasks.
Our experiments show that training with BigDocs-Bench improves average performance up to 25.8% over closed-source GPT-4o.
arXiv Detail & Related papers (2024-12-05T21:41:20Z)
- DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models [66.91204604417912]
This study aims to enhance the generalizability of small visual document understanding (VDU) models by distilling knowledge from LLMs.
We present a new framework (called DocKD) that enriches the data generation process by integrating external document knowledge.
Experiments show that DocKD produces high-quality document annotations and surpasses the direct knowledge distillation approach.
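A minimal sketch of what knowledge-enriched annotation generation can look like: prompt a teacher LLM with the document plus retrieved background knowledge and keep the output as a training pair for the small model. All prompts and interfaces below are placeholders, not DocKD's actual design:

```python
# Sketch of knowledge-enriched distillation data generation. The prompt
# format and the teacher interface are placeholders, not DocKD's design.

def make_training_pair(document_text: str, external_knowledge: str, teacher) -> dict:
    prompt = (
        "Document:\n" + document_text +
        "\n\nBackground knowledge:\n" + external_knowledge +
        "\n\nWrite a question and answer grounded in the document."
    )
    annotation = teacher(prompt)   # the teacher LLM produces the annotation
    return {"input": document_text, "target": annotation}

# Stub teacher so the sketch runs end to end.
pair = make_training_pair(
    "Invoice total: $120",
    "Invoices state a single payable total.",
    teacher=lambda p: "Q: What is the total? A: $120",
)
print(pair)
```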
arXiv Detail & Related papers (2024-10-04T00:53:32Z)
- Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
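The summary does not name the three losses. As a generic sketch of multi-objective pretraining, the overall training loss is typically a weighted sum of the individual objectives; the loss names and weights below are placeholders, not UDoc's:

```python
# Generic weighted-sum combination of self-supervised objectives.
# Loss names and weights are placeholders, not UDoc's actual objectives.

def total_loss(losses: dict, weights: dict) -> float:
    return sum(weights[name] * value for name, value in losses.items())

step_losses = {"masked_lm": 2.10, "image_text_match": 0.70, "layout_align": 0.40}
weights     = {"masked_lm": 1.00, "image_text_match": 0.50, "layout_align": 0.50}
print(total_loss(step_losses, weights))  # 2.10 + 0.35 + 0.20 = 2.65
```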
arXiv Detail & Related papers (2022-04-22T21:47:04Z)