Related papers: DocSpiral: A Platform for Integrated Assistive Document Annotation through Human-in-the-Spiral

DocSpiral: A Platform for Integrated Assistive Document Annotation through Human-in-the-Spiral

URL: http://arxiv.org/abs/2505.03214v1
Date: Tue, 06 May 2025 06:02:42 GMT
Title: DocSpiral: A Platform for Integrated Assistive Document Annotation through Human-in-the-Spiral
Authors: Qiang Sun, Sirui Li, Tingting Bi, Du Huynh, Mark Reynolds, Yuanyi Luo, Wei Liu,
Abstract summary: Acquiring structured data from domain-specific, image-based documents is crucial for many downstream tasks.<n>Many documents exist as images rather than as machine-readable text, which requires human annotation to train automated extraction systems.<n>We present DocSpiral, the first Human-in-the-Spiral assistive document annotation platform.
Score: 11.336757553731639
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Acquiring structured data from domain-specific, image-based documents such as scanned reports is crucial for many downstream tasks but remains challenging due to document variability. Many of these documents exist as images rather than as machine-readable text, which requires human annotation to train automated extraction systems. We present DocSpiral, the first Human-in-the-Spiral assistive document annotation platform, designed to address the challenge of extracting structured information from domain-specific, image-based document collections. Our spiral design establishes an iterative cycle in which human annotations train models that progressively require less manual intervention. DocSpiral integrates document format normalization, comprehensive annotation interfaces, evaluation metrics dashboard, and API endpoints for the development of AI / ML models into a unified workflow. Experiments demonstrate that our framework reduces annotation time by at least 41\% while showing consistent performance gains across three iterations during model training. By making this annotation platform freely accessible, we aim to lower barriers to AI/ML models development in document processing, facilitating the adoption of large language models in image-based, document-intensive fields such as geoscience and healthcare. The system is freely available at: https://app.ai4wa.com. The demonstration video is available: https://app.ai4wa.com/docs/docspiral/demo.

Related papers

DocMamba: Efficient Document Pre-training with State Space Model [56.84200017560988]
We present DocMamba, a novel framework based on the state space model.<n>It is designed to reduce computational complexity to linear while preserving global modeling capabilities.<n>Experiments on the HRDoc confirm DocMamba's potential for length extrapolation.
arXiv Detail & Related papers (2024-09-18T11:34:28Z)
Visually Guided Generative Text-Layout Pre-training for Document Intelligence [51.09853181377696]
We propose visually guided generative text-pre-training, named ViTLP. Given a document image, the model optimize hierarchical language and layout modeling objectives to generate the interleaved text and layout sequence. ViTLP can function as a native OCR model to localize and recognize texts of document images.
arXiv Detail & Related papers (2024-03-25T08:00:43Z)
SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation [15.953725529361874]
Document layout analysis is a known problem to the documents research community. With growing internet connectivity to personal life, an enormous amount of documents had been available in the public domain. We address this challenge using self-supervision and unlike, the few existing self-supervised document segmentation approaches.
arXiv Detail & Related papers (2023-05-01T12:47:55Z)
DoSA : A System to Accelerate Annotations on Business Documents with Human-in-the-Loop [0.0]
DoSA (Document Specific Automated s) helps annotators in generating initial annotations automatically using our novel bootstrap approach. An open-source ready-to-use implementation is made available on GitHub.
arXiv Detail & Related papers (2022-11-09T15:04:07Z)
XDoc: Unified Pre-training for Cross-Format Document Understanding [84.63416346227176]
XDoc is a unified pre-trained model which deals with different document formats in a single model. XDoc achieves comparable or even better performance on a variety of downstream tasks compared with the individual pre-trained models.
arXiv Detail & Related papers (2022-10-06T12:07:18Z)
Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding. UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input. An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z)
Synthetic Document Generator for Annotation-free Layout Recognition [15.657295650492948]
We describe a synthetic document generator that automatically produces realistic documents with labels for spatial positions, extents and categories of layout elements. We empirically illustrate that a deep layout detection model trained purely on the synthetic documents can match the performance of a model that uses real documents.
arXiv Detail & Related papers (2021-11-11T01:58:44Z)
One-shot Key Information Extraction from Document with Deep Partial Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios. Existing supervised learning methods for the KIE task need to feed a large number of labeled samples and learn separate models for different types of documents. We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
arXiv Detail & Related papers (2021-09-26T07:45:53Z)
SelfDoc: Self-Supervised Document Representation Learning [46.22910270334824]
SelfDoc is a task-agnostic pre-training framework for document image understanding. Our framework exploits the positional, textual, and visual information of every semantically meaningful component in a document. It achieves superior performance on multiple downstream tasks with significantly fewer document images used in the pre-training stage compared to previous works.
arXiv Detail & Related papers (2021-06-07T04:19:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.