DSG: An End-to-End Document Structure Generator
- URL: http://arxiv.org/abs/2310.09118v1
- Date: Fri, 13 Oct 2023 14:03:01 GMT
- Title: DSG: An End-to-End Document Structure Generator
- Authors: Johannes Rausch, Gentiana Rashiti, Maxim Gusev, Ce Zhang, and Stefan Feuerriegel
- Abstract summary: Document Structure Generator (DSG) is a novel system for document parsing that is fully end-to-end trainable.
Our results demonstrate that our DSG outperforms commercial OCR tools and, on top of that, achieves state-of-the-art performance.
- Score: 32.040520771901996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Information in industry, research, and the public sector is widely stored as
rendered documents (e.g., PDF files, scans). Hence, to enable downstream tasks,
systems are needed that map rendered documents onto a structured hierarchical
format. However, existing systems for this task are limited by heuristics and
are not end-to-end trainable. In this work, we introduce the Document Structure
Generator (DSG), a novel system for document parsing that is fully end-to-end
trainable. DSG combines a deep neural network for parsing (i) entities in
documents (e.g., figures, text blocks, headers, etc.) and (ii) relations that
capture the sequence and nested structure between entities. Unlike existing
systems that rely on heuristics, our DSG is trained end-to-end, making it
effective and flexible for real-world applications. We further contribute a
new, large-scale dataset called E-Periodica comprising real-world magazines
with complex document structures for evaluation. Our results demonstrate that
our DSG outperforms commercial OCR tools and, on top of that, achieves
state-of-the-art performance. To the best of our knowledge, our DSG system is
the first end-to-end trainable system for hierarchical document parsing.
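To make the two outputs described in the abstract concrete, here is a minimal sketch of how predicted entities and relations could be assembled into a hierarchical structure. It is not the DSG implementation: the `Entity` schema, the relation names `parent_of` and `followed_by`, and the helpers `build_tree` and `outline` are hypothetical names chosen for illustration, and the sketch assumes the relations have already been predicted by some model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Entity:
    """One detected document element (hypothetical schema, not from the paper)."""
    id: int
    category: str                             # e.g. "header", "text_block", "figure"
    bbox: Tuple[float, float, float, float]   # (x0, y0, x1, y1) on the page
    children: List["Entity"] = field(default_factory=list)


def build_tree(
    entities: List[Entity],
    parent_of: List[Tuple[int, int]],     # assumed (parent_id, child_id) pairs
    followed_by: List[Tuple[int, int]],   # assumed (earlier_id, later_id) pairs
) -> Entity:
    """Assemble a hierarchy from entities plus nesting and sequence relations."""
    by_id: Dict[int, Entity] = {e.id: e for e in entities}
    root = Entity(id=-1, category="document", bbox=(0.0, 0.0, 0.0, 0.0))

    # Nesting: attach each entity to its parent, or to the root if it has none.
    parent: Dict[int, int] = {child: par for par, child in parent_of}
    for e in entities:
        owner = by_id[parent[e.id]] if e.id in parent else root
        owner.children.append(e)

    # Sequence: walk each followed_by chain to give every entity a reading rank.
    successor: Dict[int, int] = dict(followed_by)
    has_predecessor = set(successor.values())
    rank: Dict[int, int] = {}
    for start in (e.id for e in entities if e.id not in has_predecessor):
        node: Optional[int] = start
        while node is not None and node not in rank:
            rank[node] = len(rank)
            node = successor.get(node)

    # Order siblings by reading rank; unranked entities keep their place at the end.
    for node_entity in [root, *entities]:
        node_entity.children.sort(key=lambda c: rank.get(c.id, len(rank)))
    return root


def outline(node: Entity, depth: int = 0) -> List[str]:
    """Render the tree as indented category names, one line per entity."""
    lines = ["  " * depth + node.category]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Toy page: a header that contains a text block and a figure with its own text.
# The entities are deliberately listed out of reading order.
entities = [
    Entity(2, "figure", (50, 300, 550, 600)),
    Entity(3, "text_block", (50, 610, 550, 640)),
    Entity(0, "header", (50, 40, 550, 80)),
    Entity(1, "text_block", (50, 100, 550, 280)),
]
root = build_tree(
    entities,
    parent_of=[(0, 1), (0, 2), (2, 3)],
    followed_by=[(0, 1), (1, 2), (2, 3)],
)
print("\n".join(outline(root)))
```

In this toy example the nesting relation decides which entity owns which, while the sequence relation only orders siblings, so the two kinds of relation named in the abstract stay independent of each other.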
Related papers
- HDT: Hierarchical Document Transformer [70.2271469410557]
HDT exploits document structure by introducing auxiliary anchor tokens and redesigning the attention mechanism into a sparse multi-level hierarchy.
We develop a novel sparse attention kernel that considers the hierarchical structure of documents.
arXiv Detail & Related papers (2024-07-11T09:28:04Z)
- Document Structure in Long Document Transformers [64.76981299465885]
Long documents often exhibit structure with hierarchically organized elements of different functions, such as section headers and paragraphs.
Despite the omnipresence of document structure, its role in natural language processing (NLP) remains opaque.
Do long-document Transformer models acquire an internal representation of document structure during pre-training?
How can structural information be communicated to a model after pre-training, and how does it influence downstream performance?
arXiv Detail & Related papers (2024-01-31T08:28:06Z)
- Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis [9.340346869932434]
We propose a tree construction based approach that addresses multiple subtasks concurrently.
We present an effective end-to-end solution based on this framework to demonstrate its performance.
Our end-to-end system achieves state-of-the-art performance on two large-scale document layout analysis datasets.
arXiv Detail & Related papers (2024-01-22T12:00:37Z)
- PDFTriage: Question Answering over Long, Structured Documents [60.96667912964659]
Representing structured documents as plain text is incongruous with the user's mental model of these documents with rich structure.
We propose PDFTriage that enables models to retrieve the context based on either structure or content.
Our benchmark dataset consists of 900+ human-generated questions over 80 structured documents.
arXiv Detail & Related papers (2023-09-16T04:29:05Z)
- Enhancing Performance on Seen and Unseen Dialogue Scenarios using Retrieval-Augmented End-to-End Task-Oriented System [89.40590076430297]
This work enables the TOD systems with more flexibility through a simple cache.
We train end-to-end TOD models that can refer to and ground on both dialogue history and retrieved information during TOD generation.
Experiments demonstrate the superior performance of our framework, with a notable improvement in non-empty joint goal accuracy by 6.7% compared to strong baselines.
arXiv Detail & Related papers (2023-08-16T06:52:10Z)
- HRDoc: Dataset and Baseline Method Toward Hierarchical Reconstruction of Document Structures [31.868926876151342]
This paper introduces hierarchical reconstruction of document structures as a novel task suitable for NLP and CV fields.
We built a large-scale dataset named HRDoc, which consists of 2,500 multi-page documents with nearly 2 million semantic units.
We propose an encoder-decoder-based hierarchical document structure parsing system (DSPS) to tackle this problem.
arXiv Detail & Related papers (2023-03-24T07:23:56Z)
- Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis [4.920817773181236]
Our Doc-GCN presents an effective way to harmonize and integrate heterogeneous aspects for Document Layout Analysis.
We first construct graphs to explicitly describe four main aspects, including syntactic, semantic, density, and appearance/visual information.
We apply graph convolutional networks for representing each aspect of information and use pooling to integrate them.
arXiv Detail & Related papers (2022-08-22T07:22:05Z)
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
- Capturing Logical Structure of Visually Structured Documents with Multimodal Transition Parser [39.75232199445175]
We propose to formulate the task as prediction of transition labels between text fragments that maps the fragments to a tree.
We developed a feature-based machine learning system that fuses visual, textual and semantic cues.
Our system obtained a paragraph boundary detection F1 score of 0.951, which is significantly better than a popular PDF-to-text tool with an F1 score of 0.739.
arXiv Detail & Related papers (2021-05-01T02:33:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.