Docling Technical Report
- URL: http://arxiv.org/abs/2408.09869v5
- Date: Mon, 09 Dec 2024 09:20:54 GMT
- Title: Docling Technical Report
- Authors: Christoph Auer, Maksym Lysak, Ahmed Nassar, Michele Dolfi, Nikolaos Livathinos, Panos Vagenas, Cesar Berrospi Ramis, Matteo Omenetti, Fabian Lindlbauer, Kasper Dinkla, Lokesh Mishra, Yusik Kim, Shubham Gupta, Rafael Teixeira de Lima, Valery Weber, Lucas Morin, Ingmar Meijer, Viktor Kuropiatnyk, Peter W. J. Staar
- Abstract summary: Docling is an easy-to-use, self-contained, MIT-licensed open-source package for PDF document conversion.
It is powered by state-of-the-art specialized AI models for layout analysis (DocLayNet) and table structure recognition (TableFormer).
- Score: 19.80268711310715
- License:
- Abstract: This technical report introduces Docling, an easy-to-use, self-contained, MIT-licensed open-source package for PDF document conversion. It is powered by state-of-the-art specialized AI models for layout analysis (DocLayNet) and table structure recognition (TableFormer), and runs efficiently on commodity hardware within a small resource budget. The code interface allows for easy extensibility and the addition of new features and models.
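A minimal usage sketch (not taken from the report itself): converting a PDF through Docling's Python interface, assuming the DocumentConverter entry point shown in the project's public examples; the input path is a placeholder.

```python
# Minimal sketch, based on Docling's public usage examples rather than the
# report itself; "paper.pdf" is a placeholder path.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("paper.pdf")  # layout analysis and table structure recognition run here

# Export the unified document representation to Markdown.
print(result.document.export_to_markdown())
```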
Related papers
- Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion [20.44433450426808]
Docling is an easy-to-use, self-contained, MIT-licensed, open-source toolkit for document conversion.
It can parse several types of popular document formats into a unified, richly structured representation.
Docling is released as a Python package and can be used through a Python API or as a CLI tool; a short sketch of traversing its parsed output appears after this list.
arXiv Detail & Related papers (2025-01-27T19:40:00Z)
- UniTabNet: Bridging Vision and Language Models for Enhanced Table Structure Recognition [55.153629718464565]
We introduce UniTabNet, a novel framework for table structure parsing based on the image-to-text model.
UniTabNet employs a "divide-and-conquer" strategy, utilizing an image-to-text model to decouple table cells and integrating both physical and logical decoders to reconstruct the complete table structure.
arXiv Detail & Related papers (2024-09-20T01:26:32Z)
- HDT: Hierarchical Document Transformer [70.2271469410557]
HDT exploits document structure by introducing auxiliary anchor tokens and redesigning the attention mechanism into a sparse multi-level hierarchy.
We develop a novel sparse attention kernel that considers the hierarchical structure of documents.
arXiv Detail & Related papers (2024-07-11T09:28:04Z)
- DocSynthv2: A Practical Autoregressive Modeling for Document Generation [43.84027661517748]
This paper proposes a novel approach called DocSynthv2 through the development of a simple yet effective autoregressive structured model.
Our model, distinct in its integration of both layout and textual cues, marks a step beyond existing layout-generation approaches.
arXiv Detail & Related papers (2024-06-12T16:00:16Z)
- A Standardized Machine-readable Dataset Documentation Format for Responsible AI [8.59437843168878]
Croissant-RAI is a machine-readable metadata format designed to enhance the discoverability, interoperability, and trustworthiness of AI datasets.
It is integrated into major data search engines, repositories, and machine learning frameworks.
arXiv Detail & Related papers (2024-06-04T16:40:14Z)
- KnowledgeHub: An end-to-end Tool for Assisted Scientific Discovery [1.6080795642111267]
This paper describes the KnowledgeHub tool, a scientific literature Information Extraction (IE) and Question Answering (QA) pipeline.
The tool supports the ingestion of PDF documents, which are converted to text and structured representations.
A browser-based annotation tool enables annotating the contents of the PDF documents according to the ontology.
A knowledge graph is constructed from these entity and relation triples which can be queried to obtain insights from the data.
arXiv Detail & Related papers (2024-05-16T13:17:14Z)
- DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond [17.853066545805554]
DocXChain is a powerful open-source toolchain for document parsing.
It automatically converts the rich information embodied in unstructured documents, such as text, tables and charts, into structured representations.
arXiv Detail & Related papers (2023-10-19T02:49:09Z)
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding [55.4806974284156]
Document understanding refers to automatically extracting, analyzing, and comprehending information from digital documents, such as web pages.
Existing Multimodal Large Language Models (MLLMs) have demonstrated promising zero-shot capabilities in shallow OCR-free text recognition.
arXiv Detail & Related papers (2023-07-04T11:28:07Z)
- CodeTF: One-stop Transformer Library for State-of-the-art Code LLM [72.1638273937025]
We present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence.
Our library supports a collection of pretrained Code LLMs and popular code benchmarks.
We hope CodeTF is able to bridge the gap between machine learning/generative AI and software engineering.
arXiv Detail & Related papers (2023-05-31T05:24:48Z)
- DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents [76.19748112897177]
We present a novel task and approach for document-to-slide generation.
We propose a hierarchical sequence-to-sequence approach to tackle our task in an end-to-end manner.
Our approach exploits the inherent structures within documents and slides and incorporates paraphrasing and layout prediction modules to generate slides.
arXiv Detail & Related papers (2021-01-28T03:21:17Z)
- GFTE: Graph-based Financial Table Extraction [66.26206038522339]
In the financial industry and many other fields, tables are often disclosed in unstructured digital files, e.g., Portable Document Format (PDF) and images.
We publish a standard Chinese dataset named FinTab, which contains more than 1,600 financial tables of diverse kinds.
We propose a novel graph-based convolutional network model named GFTE as a baseline for future comparison.
arXiv Detail & Related papers (2020-03-17T07:10:05Z)
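As noted in the Docling toolkit entry above, converted documents share a unified, richly structured representation. The sketch below is an assumption-based illustration (not code from either paper) of traversing that representation to export recognized tables; the attribute names follow Docling's public examples and may differ between versions.

```python
# Assumption-based sketch: convert a PDF and export each recognized table.
# Attribute names (document.tables, export_to_dataframe) follow Docling's
# public examples and are not prescribed by the papers summarized above.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("report.pdf")  # placeholder input path

for i, table in enumerate(result.document.tables):
    df = table.export_to_dataframe()  # one pandas DataFrame per recognized table
    df.to_csv(f"table_{i}.csv", index=False)
```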
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.