Related papers: Spatial Dependency Parsing for Semi-Structured Document Information Extraction

Spatial Dependency Parsing for Semi-Structured Document Information Extraction

URL: http://arxiv.org/abs/2005.00642v3
Date: Thu, 1 Jul 2021 08:32:15 GMT
Title: Spatial Dependency Parsing for Semi-Structured Document Information Extraction
Authors: Wonseok Hwang, Jinyeong Yim, Seunghyun Park, Sohee Yang, Minjoon Seo
Abstract summary: We propose SPADE (SPA DEpendency) that models highly complex relationships and an arbitrary number of information layers in the documents in an end-to-end manner. We evaluate it on various kinds of documents such as receipts, name cards, forms, and invoices.
Score: 29.231908055394808
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Information Extraction (IE) for semi-structured document images is often approached as a sequence tagging problem by classifying each recognized input token into one of the IOB (Inside, Outside, and Beginning) categories. However, such problem setup has two inherent limitations that (1) it cannot easily handle complex spatial relationships and (2) it is not suitable for highly structured information, which are nevertheless frequently observed in real-world document images. To tackle these issues, we first formulate the IE task as spatial dependency parsing problem that focuses on the relationship among text tokens in the documents. Under this setup, we then propose SPADE (SPAtial DEpendency parser) that models highly complex spatial relationships and an arbitrary number of information layers in the documents in an end-to-end manner. We evaluate it on various kinds of documents such as receipts, name cards, forms, and invoices, and show that it achieves a similar or better performance compared to strong baselines including BERT-based IOB taggger.

Related papers

MoDora: Tree-Based Semi-Structured Document Analysis System [62.01015188258797]
Semi-structured documents integrate diverse interleaved data elements arranged in various and often irregular layouts.<n>MoDora is an LLM-powered system for semi-structured document analysis.<n> Experiments show MoDora outperforms baselines by 5.97%-61.07% in accuracy.
arXiv Detail & Related papers (2026-02-26T14:48:49Z)
DocSplit: A Comprehensive Benchmark Dataset and Evaluation Approach for Document Packet Recognition and Splitting [3.657237256134889]
Document understanding in real-world applications often requires processing heterogeneous, multi-page document packets containing multiple documents stitched together.<n>We present the first comprehensive benchmark dataset, DocSplit, along with novel evaluation metrics for assessing the document packet splitting capabilities of large language models.<n>The benchmark addresses real-world challenges, including out-of-order pages, interleaved documents, and documents lacking clear demarcations.
arXiv Detail & Related papers (2026-02-17T19:17:55Z)
MonkeyOCR v1.5 Technical Report: Unlocking Robust Document Parsing for Complex Patterns [80.05126590825121]
MonkeyOCR v1.5 is a unified vision-language framework that enhances both layout understanding and content recognition.<n>To address complex table structures, we propose a visual consistency-based reinforcement learning scheme.<n>Two specialized modules, Image-Decoupled Table Parsing and Type-Guided Table Merging, are introduced to enable reliable parsing of tables.
arXiv Detail & Related papers (2025-11-13T15:12:17Z)
Benchmarking Multimodal Understanding and Complex Reasoning for ESG Tasks [56.350173737493215]
Environmental, Social, and Governance (ESG) reports are essential for evaluating sustainability practices, ensuring regulatory compliance, and promoting financial transparency.<n>MMESGBench is a first-of-its-kind benchmark dataset to evaluate multimodal understanding and complex reasoning across structurally diverse and multi-source ESG documents.<n>MMESGBench comprises 933 validated QA pairs derived from 45 ESG documents, spanning across seven distinct document types and three major ESG source categories.
arXiv Detail & Related papers (2025-07-25T03:58:07Z)
Relation-Rich Visual Document Generator for Visual Information Extraction [12.4941229258054]
We propose a Relation-rIch visual Document GEnerator (RIDGE) that addresses these limitations through a two-stage approach. Our method significantly enhances the performance of document understanding models on various VIE benchmarks.
arXiv Detail & Related papers (2025-04-14T19:19:26Z)
S2 Chunking: A Hybrid Framework for Document Segmentation Through Integrated Spatial and Semantic Analysis [0.0]
Document chunking is a critical task in natural language processing (NLP) This paper introduces a novel hybrid approach that combines layout structure, semantic analysis, and spatial relationships. Experimental results demonstrate that this approach outperforms traditional methods.
arXiv Detail & Related papers (2025-01-08T09:06:29Z)
BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations [2.9798896492745537]
We present a unified dataset for document Question-Answering (QA) We reformulate existing Document AI tasks, such as Information Extraction (IE), into a Question-Answering task. On the other hand, we release the OCR of all the documents and include the exact position of the answer to be found in the document image as a bounding box.
arXiv Detail & Related papers (2025-01-06T21:46:22Z)
Multi-Field Adaptive Retrieval [39.38972160512916]
We introduce Multi-Field Adaptive Retrieval (MFAR), a flexible framework that accommodates any number of document indices on structured data. Our framework consists of two main steps: (1) the decomposition of an existing document into fields, each indexed independently through dense and lexical methods, and (2) learning a model which adaptively predicts the importance of a field by conditioning on the document query. We find that our approach allows for the optimized use of dense versus lexical representations across field types, significantly improves in document ranking over a number of existing retrievers, and achieves state-of-the-art performance for multi-field structured
arXiv Detail & Related papers (2024-10-26T03:07:22Z)
Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities. Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
arXiv Detail & Related papers (2024-10-03T17:49:09Z)
Generative Retrieval Meets Multi-Graded Relevance [104.75244721442756]
We introduce a framework called GRaded Generative Retrieval (GR$2$) GR$2$ focuses on two key components: ensuring relevant and distinct identifiers, and implementing multi-graded constrained contrastive training. Experiments on datasets with both multi-graded and binary relevance demonstrate the effectiveness of GR$2$.
arXiv Detail & Related papers (2024-09-27T02:55:53Z)
Leveraging Collection-Wide Similarities for Unsupervised Document Structure Extraction [61.998789448260005]
We propose to identify the typical structure of document within a collection. We abstract over arbitrary header paraphrases, and ground each topic to respective document locations. We develop an unsupervised graph-based method which leverages both inter- and intra-document similarities.
arXiv Detail & Related papers (2024-02-21T16:22:21Z)
On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval [59.25292920967197]
Few-shot document entity retrieval (VDER) is an important topic in industrial NLP applications. FewVEX is a new dataset to boost future research in the field of entity-level few-shot VDER. We present a task-aware meta-learning based framework, with a central focus on achieving effective task personalization.
arXiv Detail & Related papers (2023-11-01T17:51:43Z)
PDFTriage: Question Answering over Long, Structured Documents [60.96667912964659]
Representing structured documents as plain text is incongruous with the user's mental model of these documents with rich structure. We propose PDFTriage that enables models to retrieve the context based on either structure or content. Our benchmark dataset consists of 900+ human-generated questions over 80 structured documents.
arXiv Detail & Related papers (2023-09-16T04:29:05Z)
SPM: Structured Pretraining and Matching Architectures for Relevance Modeling in Meituan Search [12.244685291395093]
In e-commerce search, relevance between query and documents is an essential requirement for satisfying user experience. We propose a novel two-stage pretraining and matching architecture for relevance matching with rich structured documents. The model has already been deployed online, serving the search traffic of Meituan for over a year.
arXiv Detail & Related papers (2023-08-15T11:45:34Z)
Cross-Modal Entity Matching for Visually Rich Documents [4.8119678510491815]
Visually rich documents utilize visual cues to augment their semantics. Existing works that enable structured querying on these documents do not take this into account. We propose Juno -- a cross-modal entity matching framework to address this limitation.
arXiv Detail & Related papers (2023-03-01T18:26:14Z)
SciREX: A Challenge Dataset for Document-Level Information Extraction [56.83748634747753]
It is challenging to create a large-scale information extraction dataset at the document level. We introduce SciREX, a document level IE dataset that encompasses multiple IE tasks. We develop a neural model as a strong baseline that extends previous state-of-the-art IE models to document-level IE.
arXiv Detail & Related papers (2020-05-01T17:30:10Z)
Kleister: A novel task for Information Extraction involving Long Documents with Complex Layout [5.8530995077744645]
We introduce a new task (named Kleister) with two new datasets. An NLP system must find the most important information, about various types of entities, in long formal documents. We propose Pipeline method as a text-only baseline with different Named Entity Recognition architectures.
arXiv Detail & Related papers (2020-03-04T22:45:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.