Docling Technical Report
- URL: http://arxiv.org/abs/2408.09869v5
- Date: Mon, 09 Dec 2024 09:20:54 GMT
- Title: Docling Technical Report
- Authors: Christoph Auer, Maksym Lysak, Ahmed Nassar, Michele Dolfi, Nikolaos Livathinos, Panos Vagenas, Cesar Berrospi Ramis, Matteo Omenetti, Fabian Lindlbauer, Kasper Dinkla, Lokesh Mishra, Yusik Kim, Shubham Gupta, Rafael Teixeira de Lima, Valery Weber, Lucas Morin, Ingmar Meijer, Viktor Kuropiatnyk, Peter W. J. Staar
- Abstract summary: Docling is an easy-to-use, self-contained, MIT-licensed open-source package for PDF document conversion.
It is powered by state-of-the-art specialized AI models for layout analysis (DocLayNet) and table structure recognition (TableFormer).
- Score: 19.80268711310715
- License:
- Abstract: This technical report introduces Docling, an easy-to-use, self-contained, MIT-licensed open-source package for PDF document conversion. It is powered by state-of-the-art specialized AI models for layout analysis (DocLayNet) and table structure recognition (TableFormer), and runs efficiently on commodity hardware within a small resource budget. The code interface allows for easy extensibility and the addition of new features and models.
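A minimal usage sketch (not taken from the report itself): converting a PDF through Docling's Python interface, assuming the DocumentConverter entry point shown in the project's public examples; the input path is a placeholder.

```python
# Minimal sketch, based on Docling's public usage examples rather than the
# report itself; "paper.pdf" is a placeholder path.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("paper.pdf")  # layout analysis and table structure recognition run here

# Export the unified document representation to Markdown.
print(result.document.export_to_markdown())
```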
Related papers
- Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion [20.44433450426808]
Docling is an easy-to-use, self-contained, MIT-licensed, open-source toolkit for document conversion.
It can parse several types of popular document formats into a unified, richly structured representation.
Docling is released as a Python package and can be used through a Python API or as a CLI tool; a short sketch of traversing its parsed output appears after this list.
arXiv Detail & Related papers (2025-01-27T19:40:00Z)
- UniTabNet: Bridging Vision and Language Models for Enhanced Table Structure Recognition [55.153629718464565]
We introduce UniTabNet, a novel framework for table structure parsing based on the image-to-text model.
UniTabNet employs a "divide-and-conquer" strategy, utilizing an image-to-text model to decouple table cells and integrating both physical and logical decoders to reconstruct the complete table structure.
arXiv Detail & Related papers (2024-09-20T01:26:32Z)
- HDT: Hierarchical Document Transformer [70.2271469410557]
HDT exploits document structure by introducing auxiliary anchor tokens and redesigning the attention mechanism into a sparse multi-level hierarchy.
We develop a novel sparse attention kernel that considers the hierarchical structure of documents.
arXiv Detail & Related papers (2024-07-11T09:28:04Z)
- DocSynthv2: A Practical Autoregressive Modeling for Document Generation [43.84027661517748]
This paper proposes a novel approach called DocSynthv2 through the development of a simple yet effective autoregressive structured model.
Our model, distinct in its integration of both layout and textual cues, marks a step beyond existing layout-generation approaches.
arXiv Detail & Related papers (2024-06-12T16:00:16Z)
- A Standardized Machine-readable Dataset Documentation Format for Responsible AI [8.59437843168878]
Croissant-RAI is a machine-readable metadata format designed to enhance the discoverability, interoperability, and trustworthiness of AI datasets.
It is integrated into major data search engines, repositories, and machine learning frameworks.
arXiv Detail & Related papers (2024-06-04T16:40:14Z)
- KnowledgeHub: An end-to-end Tool for Assisted Scientific Discovery [1.6080795642111267]
This paper describes the KnowledgeHub tool, a scientific literature Information Extraction (IE) and Question Answering (QA) pipeline.
The tool supports the ingestion of PDF documents, which are converted to text and structured representations.
A browser-based annotation tool enables annotating the contents of the PDF documents according to the ontology.
A knowledge graph is constructed from these entity and relation triples which can be queried to obtain insights from the data.
arXiv Detail & Related papers (2024-05-16T13:17:14Z)
- DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond [17.853066545805554]
DocXChain is a powerful open-source toolchain for document parsing.
It automatically converts the rich information embodied in unstructured documents, such as text, tables and charts, into structured representations.
arXiv Detail & Related papers (2023-10-19T02:49:09Z)
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding [55.4806974284156]
Document understanding refers to automatically extracting, analyzing, and comprehending information from digital documents, such as web pages.
Existing Multimodal Large Language Models (MLLMs) have demonstrated promising zero-shot capabilities in shallow OCR-free text recognition.
arXiv Detail & Related papers (2023-07-04T11:28:07Z)
- CodeTF: One-stop Transformer Library for State-of-the-art Code LLM [72.1638273937025]
We present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence.
Our library supports a collection of pretrained Code LLMs and popular code benchmarks.
We hope CodeTF is able to bridge the gap between machine learning/generative AI and software engineering.
arXiv Detail & Related papers (2023-05-31T05:24:48Z)
- DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents [76.19748112897177]
We present a novel task and approach for document-to-slide generation.
We propose a hierarchical sequence-to-sequence approach to tackle our task in an end-to-end manner.
Our approach exploits the inherent structures within documents and slides and incorporates paraphrasing and layout prediction modules to generate slides.
arXiv Detail & Related papers (2021-01-28T03:21:17Z)
- GFTE: Graph-based Financial Table Extraction [66.26206038522339]
In the financial industry and many other fields, tables are often disclosed in unstructured digital files, e.g., Portable Document Format (PDF) and images.
We publish a standard Chinese dataset named FinTab, which contains more than 1,600 financial tables of diverse kinds.
We propose a novel graph-based convolutional network model named GFTE as a baseline for future comparison.
arXiv Detail & Related papers (2020-03-17T07:10:05Z)
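As noted in the Docling toolkit entry above, converted documents share a unified, richly structured representation. The sketch below is an assumption-based illustration (not code from either paper) of traversing that representation to export recognized tables; the attribute names follow Docling's public examples and may differ between versions.

```python
# Assumption-based sketch: convert a PDF and export each recognized table.
# Attribute names (document.tables, export_to_dataframe) follow Docling's
# public examples and are not prescribed by the papers summarized above.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("report.pdf")  # placeholder input path

for i, table in enumerate(result.document.tables):
    df = table.export_to_dataframe()  # one pandas DataFrame per recognized table
    df.to_csv(f"table_{i}.csv", index=False)
```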
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.