Related papers: Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis

Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis

URL: http://arxiv.org/abs/2401.11874v2
Date: Thu, 28 Mar 2024 08:40:08 GMT
Title: Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis
Authors: Jiawei Wang, Kai Hu, Zhuoyao Zhong, Lei Sun, Qiang Huo,
Abstract summary: We propose a tree construction based approach that addresses multiple subtasks concurrently. We present an effective end-to-end solution based on this framework to demonstrate its performance. Our end-to-end system achieves state-of-the-art performance on two large-scale document layout analysis datasets.
Score: 9.340346869932434
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Document structure analysis (aka document layout analysis) is crucial for understanding the physical layout and logical structure of documents, with applications in information retrieval, document summarization, knowledge extraction, etc. In this paper, we concentrate on Hierarchical Document Structure Analysis (HDSA) to explore hierarchical relationships within structured documents created using authoring software employing hierarchical schemas, such as LaTeX, Microsoft Word, and HTML. To comprehensively analyze hierarchical document structures, we propose a tree construction based approach that addresses multiple subtasks concurrently, including page object detection (Detect), reading order prediction of identified objects (Order), and the construction of intended hierarchical structure (Construct). We present an effective end-to-end solution based on this framework to demonstrate its performance. To assess our approach, we develop a comprehensive benchmark called Comp-HRDoc, which evaluates the above subtasks simultaneously. Our end-to-end system achieves state-of-the-art performance on two large-scale document layout analysis datasets (PubLayNet and DocLayNet), a high-quality hierarchical document structure reconstruction dataset (HRDoc), and our Comp-HRDoc benchmark. The Comp-HRDoc benchmark will be released to facilitate further research in this field.

Related papers

MoDora: Tree-Based Semi-Structured Document Analysis System [62.01015188258797]
Semi-structured documents integrate diverse interleaved data elements arranged in various and often irregular layouts.<n>MoDora is an LLM-powered system for semi-structured document analysis.<n> Experiments show MoDora outperforms baselines by 5.97%-61.07% in accuracy.
arXiv Detail & Related papers (2026-02-26T14:48:49Z)
DMAP: Human-Aligned Structural Document Map for Multimodal Document Understanding [30.54420648726099]
Document-level structural Document MAP encodes both hierarchical organization and inter-element relationships within multimodal documents.<n>Building upon this representation, a Reflective Reasoning Agent performs structure-aware and evidence-driven reasoning.<n>Experiments on MMDocQA benchmarks demonstrate that DMAP yields document-specific structural representations aligned with human interpretive patterns.
arXiv Detail & Related papers (2026-01-26T06:38:25Z)
BookRAG: A Hierarchical Structure-aware Index-based Approach for Retrieval-Augmented Generation on Complex Documents [11.158307125677375]
Retrieval-Augmented Generation (RAG) queries highly relevant information from external complex documents.<n>We introduce BookRAG, a novel RAG approach targeted for documents with a hierarchical structure.<n>BookRAG achieves state-of-the-art performance, significantly outperforming baselines in both retrieval recall and QA accuracy.
arXiv Detail & Related papers (2025-12-03T03:40:49Z)
DREAM: Document Reconstruction via End-to-end Autoregressive Model [53.51754520966657]
We present an innovative autoregressive model specifically designed for document reconstruction, referred to as Document Reconstruction via End-to-end Autoregressive Model (DREAM)<n>We establish a standardized definition of the document reconstruction task, and introduce a novel Document Similarity Metric (DSM) and DocRec1K dataset for assessing the performance of the task.
arXiv Detail & Related papers (2025-07-08T09:24:07Z)
DISRetrieval: Harnessing Discourse Structure for Long Document Retrieval [51.89673002051528]
DISRetrieval is a novel hierarchical retrieval framework that leverages linguistic discourse structure to enhance long document understanding.<n>Our studies confirm that discourse structure significantly enhances retrieval effectiveness across different document lengths and query types.
arXiv Detail & Related papers (2025-05-26T14:45:12Z)
UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis [7.057192434574117]
We propose a unified relation prediction approach for HDSA, called UniHDSA. UniHDSA treats various HDSA sub-tasks as relation prediction problems and consolidates relation prediction labels into a unified label space. To validate the effectiveness of UniHDSA, we develop a multimodal end-to-end system based on Transformer architectures.
arXiv Detail & Related papers (2025-03-20T06:44:47Z)
ReTreever: Tree-based Coarse-to-Fine Representations for Retrieval [64.44265315244579]
We propose a tree-based method for organizing and representing reference documents at various granular levels. Our method, called ReTreever, jointly learns a routing function per internal node of a binary tree such that query and reference documents are assigned to similar tree branches. Our evaluations show that ReTreever generally preserves full representation accuracy.
arXiv Detail & Related papers (2025-02-11T21:35:13Z)
Graph-based Document Structure Analysis [26.79096546002763]
We propose a novel graph-based Document Structure Analysis (gDSA) task. This task requires that model not only detects document elements but also generates spatial and logical relations in form of a graph structure. We construct a relation graph-based document structure analysis dataset (GraphDoc) with 80K document images and 4.13M relation annotations.
arXiv Detail & Related papers (2025-02-04T17:16:14Z)
Seg2Act: Global Context-aware Action Generation for Document Logical Structuring [45.55145491566147]
We introduce Seg2Act, an end-to-end, generation-based method for document logical structuring. Seg2Act iteratively generates the action sequence via a global context-aware generative model, and simultaneously updates its global context and current logical structure. Experiments on ChCatExt and HierDoc datasets demonstrate the superior performance of Seg2Act in both supervised and transfer learning settings.
arXiv Detail & Related papers (2024-10-09T11:58:40Z)
HDT: Hierarchical Document Transformer [70.2271469410557]
HDT exploits document structure by introducing auxiliary anchor tokens and redesigning the attention mechanism into a sparse multi-level hierarchy. We develop a novel sparse attention kernel that considers the hierarchical structure of documents.
arXiv Detail & Related papers (2024-07-11T09:28:04Z)
SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding [55.48936731641802]
We present the SRFUND, a hierarchically structured multi-task form understanding benchmark. SRFUND provides refined annotations on top of the original FUNSD and XFUND datasets. The dataset includes eight languages including English, Chinese, Japanese, German, French, Spanish, Italian, and Portuguese.
arXiv Detail & Related papers (2024-06-13T02:35:55Z)
DLAFormer: An End-to-End Transformer For Document Layout Analysis [7.057192434574117]
We propose an end-to-end transformer-based approach for document layout analysis, called DLAFormer. We treat various DLA sub-tasks as relation prediction problems and consolidate these relation prediction labels into a unified label space. We introduce a novel set of type-wise queries to enhance the physical meaning of content queries in DETR.
arXiv Detail & Related papers (2024-05-20T03:34:24Z)
Leveraging Collection-Wide Similarities for Unsupervised Document Structure Extraction [61.998789448260005]
We propose to identify the typical structure of document within a collection. We abstract over arbitrary header paraphrases, and ground each topic to respective document locations. We develop an unsupervised graph-based method which leverages both inter- and intra-document similarities.
arXiv Detail & Related papers (2024-02-21T16:22:21Z)
PDFTriage: Question Answering over Long, Structured Documents [60.96667912964659]
Representing structured documents as plain text is incongruous with the user's mental model of these documents with rich structure. We propose PDFTriage that enables models to retrieve the context based on either structure or content. Our benchmark dataset consists of 900+ human-generated questions over 80 structured documents.
arXiv Detail & Related papers (2023-09-16T04:29:05Z)
HRDoc: Dataset and Baseline Method Toward Hierarchical Reconstruction of Document Structures [31.868926876151342]
This paper introduces hierarchical reconstruction of document structures as a novel task suitable for NLP and CV fields. We built a large-scale dataset named HRDoc, which consists of 2,500 multi-page documents with nearly 2 million semantic units. We propose an encoder-decoder-based hierarchical document structure parsing system (DSPS) to tackle this problem.
arXiv Detail & Related papers (2023-03-24T07:23:56Z)
Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis [4.920817773181236]
Our Doc-GCN presents an effective way to harmonize and integrate heterogeneous aspects for Document Layout Analysis. We first construct graphs to explicitly describe four main aspects, including syntactic, semantic, density, and appearance/visual information. We apply graph convolutional networks for representing each aspect of information and use pooling to integrate them.
arXiv Detail & Related papers (2022-08-22T07:22:05Z)
Cross-Domain Document Layout Analysis Using Document Style Guide [15.799572801059716]
Document layout analysis (DLA) aims to decompose document images into high-level semantic areas. Many researchers devoted this challenge by synthesizing data to build large training sets. In this paper, we propose an unsupervised cross-domain DLA framework based on document style guidance.
arXiv Detail & Related papers (2022-01-24T00:49:19Z)
Multilevel Text Alignment with Cross-Document Attention [59.76351805607481]
Existing alignment methods operate at a single, predefined level. We propose a new learning approach that equips previously established hierarchical attention encoders for representing documents with a cross-document attention component.
arXiv Detail & Related papers (2020-10-03T02:52:28Z)
DocBank: A Benchmark Dataset for Document Layout Analysis [114.81155155508083]
We present textbfDocBank, a benchmark dataset that contains 500K document pages with fine-grained token-level annotations for document layout analysis. Experiment results show that models trained on DocBank accurately recognize the layout information for a variety of documents.
arXiv Detail & Related papers (2020-06-01T16:04:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.