Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis
- URL: http://arxiv.org/abs/2401.11874v2
- Date: Thu, 28 Mar 2024 08:40:08 GMT
- Title: Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis
- Authors: Jiawei Wang, Kai Hu, Zhuoyao Zhong, Lei Sun, Qiang Huo,
- Abstract summary: We propose a tree construction based approach that addresses multiple subtasks concurrently.
We present an effective end-to-end solution based on this framework to demonstrate its performance.
Our end-to-end system achieves state-of-the-art performance on two large-scale document layout analysis datasets.
- Score: 9.340346869932434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Document structure analysis (aka document layout analysis) is crucial for understanding the physical layout and logical structure of documents, with applications in information retrieval, document summarization, knowledge extraction, etc. In this paper, we concentrate on Hierarchical Document Structure Analysis (HDSA) to explore hierarchical relationships within structured documents created using authoring software employing hierarchical schemas, such as LaTeX, Microsoft Word, and HTML. To comprehensively analyze hierarchical document structures, we propose a tree construction based approach that addresses multiple subtasks concurrently, including page object detection (Detect), reading order prediction of identified objects (Order), and the construction of intended hierarchical structure (Construct). We present an effective end-to-end solution based on this framework to demonstrate its performance. To assess our approach, we develop a comprehensive benchmark called Comp-HRDoc, which evaluates the above subtasks simultaneously. Our end-to-end system achieves state-of-the-art performance on two large-scale document layout analysis datasets (PubLayNet and DocLayNet), a high-quality hierarchical document structure reconstruction dataset (HRDoc), and our Comp-HRDoc benchmark. The Comp-HRDoc benchmark will be released to facilitate further research in this field.
Related papers
- Seg2Act: Global Context-aware Action Generation for Document Logical Structuring [45.55145491566147]
We introduce Seg2Act, an end-to-end, generation-based method for document logical structuring.
Seg2Act iteratively generates the action sequence via a global context-aware generative model, and simultaneously updates its global context and current logical structure.
Experiments on ChCatExt and HierDoc datasets demonstrate the superior performance of Seg2Act in both supervised and transfer learning settings.
arXiv Detail & Related papers (2024-10-09T11:58:40Z) - HDT: Hierarchical Document Transformer [70.2271469410557]
HDT exploits document structure by introducing auxiliary anchor tokens and redesigning the attention mechanism into a sparse multi-level hierarchy.
We develop a novel sparse attention kernel that considers the hierarchical structure of documents.
arXiv Detail & Related papers (2024-07-11T09:28:04Z) - SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding [55.48936731641802]
We present the SRFUND, a hierarchically structured multi-task form understanding benchmark.
SRFUND provides refined annotations on top of the original FUNSD and XFUND datasets.
The dataset includes eight languages including English, Chinese, Japanese, German, French, Spanish, Italian, and Portuguese.
arXiv Detail & Related papers (2024-06-13T02:35:55Z) - DLAFormer: An End-to-End Transformer For Document Layout Analysis [7.057192434574117]
We propose an end-to-end transformer-based approach for document layout analysis, called DLAFormer.
We treat various DLA sub-tasks as relation prediction problems and consolidate these relation prediction labels into a unified label space.
We introduce a novel set of type-wise queries to enhance the physical meaning of content queries in DETR.
arXiv Detail & Related papers (2024-05-20T03:34:24Z) - Leveraging Collection-Wide Similarities for Unsupervised Document Structure Extraction [61.998789448260005]
We propose to identify the typical structure of document within a collection.
We abstract over arbitrary header paraphrases, and ground each topic to respective document locations.
We develop an unsupervised graph-based method which leverages both inter- and intra-document similarities.
arXiv Detail & Related papers (2024-02-21T16:22:21Z) - PDFTriage: Question Answering over Long, Structured Documents [60.96667912964659]
Representing structured documents as plain text is incongruous with the user's mental model of these documents with rich structure.
We propose PDFTriage that enables models to retrieve the context based on either structure or content.
Our benchmark dataset consists of 900+ human-generated questions over 80 structured documents.
arXiv Detail & Related papers (2023-09-16T04:29:05Z) - HRDoc: Dataset and Baseline Method Toward Hierarchical Reconstruction of
Document Structures [31.868926876151342]
This paper introduces hierarchical reconstruction of document structures as a novel task suitable for NLP and CV fields.
We built a large-scale dataset named HRDoc, which consists of 2,500 multi-page documents with nearly 2 million semantic units.
We propose an encoder-decoder-based hierarchical document structure parsing system (DSPS) to tackle this problem.
arXiv Detail & Related papers (2023-03-24T07:23:56Z) - Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout
Analysis [4.920817773181236]
Our Doc-GCN presents an effective way to harmonize and integrate heterogeneous aspects for Document Layout Analysis.
We first construct graphs to explicitly describe four main aspects, including syntactic, semantic, density, and appearance/visual information.
We apply graph convolutional networks for representing each aspect of information and use pooling to integrate them.
arXiv Detail & Related papers (2022-08-22T07:22:05Z) - Cross-Domain Document Layout Analysis Using Document Style Guide [15.799572801059716]
Document layout analysis (DLA) aims to decompose document images into high-level semantic areas.
Many researchers devoted this challenge by synthesizing data to build large training sets.
In this paper, we propose an unsupervised cross-domain DLA framework based on document style guidance.
arXiv Detail & Related papers (2022-01-24T00:49:19Z) - Multilevel Text Alignment with Cross-Document Attention [59.76351805607481]
Existing alignment methods operate at a single, predefined level.
We propose a new learning approach that equips previously established hierarchical attention encoders for representing documents with a cross-document attention component.
arXiv Detail & Related papers (2020-10-03T02:52:28Z) - DocBank: A Benchmark Dataset for Document Layout Analysis [114.81155155508083]
We present textbfDocBank, a benchmark dataset that contains 500K document pages with fine-grained token-level annotations for document layout analysis.
Experiment results show that models trained on DocBank accurately recognize the layout information for a variety of documents.
arXiv Detail & Related papers (2020-06-01T16:04:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.