DocTr: Document Transformer for Structured Information Extraction in Documents
- URL: http://arxiv.org/abs/2307.07929v1
- Date: Sun, 16 Jul 2023 02:59:30 GMT
- Title: DocTr: Document Transformer for Structured Information Extraction in Documents
- Authors: Haofu Liao, Aruni RoyChowdhury, Weijian Li, Ankan Bansal, Yuting Zhang, Zhuowen Tu, Ravi Kumar Satzoda, R. Manmatha, Vijay Mahadevan
- Abstract summary: We present a new formulation for structured information extraction from visually rich documents.
It aims to address the limitations of existing IOB tagging or graph-based formulations.
We represent an entity as an anchor word and a bounding box, and represent entity linking as the association between anchor words.
- Score: 36.1145541816468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a new formulation for structured information extraction (SIE) from
visually rich documents. It aims to address the limitations of existing IOB
tagging or graph-based formulations, which are either overly reliant on the
correct ordering of input text or struggle with decoding a complex graph.
Instead, motivated by anchor-based object detectors in vision, we represent an
entity as an anchor word and a bounding box, and represent entity linking as
the association between anchor words. This is more robust to text ordering, and
maintains a compact graph for entity linking. The formulation motivates us to
introduce 1) a DOCument TRansformer (DocTr) that aims at detecting and
associating entity bounding boxes in visually rich documents, and 2) a simple
pre-training strategy that helps learn entity detection in the context of
language. Evaluations on three SIE benchmarks show the effectiveness of the
proposed formulation, and the overall approach outperforms existing solutions.
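To make the formulation concrete, here is a minimal sketch of the data structures it implies: an entity is an anchor word plus a bounding box, and entity linking is a set of anchor-word pairs. The names and the box-containment decoding rule below are illustrative assumptions, not DocTr's actual interfaces.

```python
# A sketch of the anchor-word formulation: an entity is an anchor word plus a
# bounding box; linking is an association between anchor words. Illustrative
# only; these are not DocTr's actual data structures.
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

@dataclass
class Entity:
    anchor_index: int  # index of the anchor word in the OCR word list
    bbox: Box          # bounding box covering the whole entity
    label: str         # entity type, e.g. "question" or "answer"

@dataclass
class Document:
    words: List[str]               # OCR words, in whatever order OCR emits them
    word_boxes: List[Box]          # one box per word
    entities: List[Entity]         # detected entities
    links: List[Tuple[int, int]]   # entity linking as (head, tail) entity indices

def entity_words(doc: Document, entity: Entity) -> List[str]:
    """One plausible decoding rule: an entity's words are those whose
    boxes fall inside the entity's bounding box."""
    x0, y0, x1, y1 = entity.bbox
    return [w for w, (a, b, c, d) in zip(doc.words, doc.word_boxes)
            if a >= x0 and b >= y0 and c <= x1 and d <= y1]
```

Because entities are keyed by anchor words and boxes rather than by positions in a serialized token sequence, the representation does not depend on a correct reading order, and linking stays a compact set of word-to-word associations.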
Related papers
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
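For intuition on the first method, the sketch below shows a standard in-batch contrastive (InfoNCE-style) loss; if each batch is sampled from one neighborhood of similar documents, the in-batch negatives are the document's neighbors by construction. This is a generic stand-in for the idea, not the paper's exact objective.

```python
# Generic in-batch contrastive loss (InfoNCE-style). Assumes the batch was
# sampled from a neighborhood of similar documents, so in-batch negatives
# are hard negatives. A stand-in for the idea, not the paper's actual loss.
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb, doc_emb, temperature=0.05):
    """query_emb, doc_emb: (B, d) paired embeddings; row i of each is a
    positive pair, every other row in the batch is a negative."""
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    logits = (q @ d.T) / temperature                         # (B, B) similarities
    targets = torch.arange(q.size(0), device=logits.device)  # diagonal positives
    return F.cross_entropy(logits, targets)
```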
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
- Hypergraph based Understanding for Document Semantic Entity Recognition [65.84258776834524]
We build HGA, a novel hypergraph attention framework for document semantic entity recognition, which attends to entity boundaries and entity categories simultaneously.
Our results on FUNSD, CORD, and XFUNDIE show that our method effectively improves performance on semantic entity recognition tasks.
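For intuition, here is a compact hypergraph-attention sketch in the spirit of the summary: tokens are nodes, candidate entities are hyperedges, and attention aggregates node-to-hyperedge and back, so boundary (membership) and category (hyperedge feature) information interact. Shapes and names are assumptions, not HGA's actual code.

```python
# Two-phase hypergraph attention: node -> hyperedge -> node. Assumes every
# hyperedge has at least one member token and every token belongs to at
# least one hyperedge (otherwise the softmax produces NaNs).
import torch
import torch.nn.functional as F

def hypergraph_attention(x, incidence, w_node, w_edge):
    """x: (n, d) token features; incidence: (n, m) 0/1 membership matrix;
    w_node, w_edge: (d, 1) scoring vectors."""
    n, m = incidence.shape
    mask = incidence.bool()
    # Node -> hyperedge: attend over each hyperedge's member tokens.
    node_scores = (x @ w_node).view(n, 1).expand(n, m)
    alpha = F.softmax(node_scores.masked_fill(~mask, float("-inf")), dim=0)
    edge_feats = alpha.T @ x                                   # (m, d)
    # Hyperedge -> node: attend over each token's incident hyperedges.
    edge_scores = (edge_feats @ w_edge).view(1, m).expand(n, m)
    beta = F.softmax(edge_scores.masked_fill(~mask, float("-inf")), dim=1)
    return beta @ edge_feats                                   # (n, d) updated tokens
```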
arXiv Detail & Related papers (2024-07-09T14:35:49Z)
- SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding [55.48936731641802]
We present SRFUND, a hierarchically structured, multi-task form understanding benchmark.
SRFUND provides refined annotations on top of the original FUNSD and XFUND datasets.
The dataset covers eight languages: English, Chinese, Japanese, German, French, Spanish, Italian, and Portuguese.
arXiv Detail & Related papers (2024-06-13T02:35:55Z)
- A Semantic Mention Graph Augmented Model for Document-Level Event Argument Extraction [12.286432133599355]
Document-level Event Argument Extraction (DEAE) aims to identify arguments and their specific roles from an unstructured document.
Advanced approaches to DEAE utilize prompt-based methods to guide pre-trained language models (PLMs) in extracting arguments from input documents.
In this paper, we propose a semantic mention Graph Augmented Model (GAM) to address the limitations of these approaches.
arXiv Detail & Related papers (2024-03-12T08:58:07Z)
- Document-level Relation Extraction with Cross-sentence Reasoning Graph [14.106582119686635]
Relation extraction (RE) has recently moved from the sentence level to the document level.
We propose a novel document-level RE model with a GRaph information Aggregation and Cross-sentence Reasoning network (GRACR).
Experimental results show GRACR achieves excellent performance on two public datasets of document-level RE.
arXiv Detail & Related papers (2023-03-07T14:14:12Z)
- Not Just Plain Text! Fuel Document-Level Relation Extraction with Explicit Syntax Refinement and Subsentence Modeling [3.9436257406798925]
We propose an expLicit syntAx Refinement and Subsentence mOdeliNg based framework (LARSON).
By introducing extra syntactic information, LARSON can model subsentences of arbitrary granularity and efficiently screen instructive ones.
Experimental results on three benchmark datasets (DocRED, CDR, and GDA) demonstrate that LARSON significantly outperforms existing methods.
arXiv Detail & Related papers (2022-11-10T05:06:37Z)
- Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
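A minimal sketch of what "multimodal embeddings as input" can look like: each word's input vector sums a text embedding, a quantized 2-D layout embedding, and a projected visual feature for its image region. Dimensions and module names are illustrative assumptions, not UDoc's actual architecture.

```python
# Sum text, layout, and visual embeddings into one Transformer input per word.
# Hidden size, coordinate binning, and the 2048-d region features are assumed.
import torch.nn as nn

class MultimodalEmbedding(nn.Module):
    def __init__(self, vocab_size, d_model=768, coord_bins=1000, vis_dim=2048):
        super().__init__()
        self.text = nn.Embedding(vocab_size, d_model)
        self.x_pos = nn.Embedding(coord_bins, d_model)  # shared for x0 and x1
        self.y_pos = nn.Embedding(coord_bins, d_model)  # shared for y0 and y1
        self.visual = nn.Linear(vis_dim, d_model)       # project region features

    def forward(self, token_ids, boxes, region_feats):
        """token_ids: (B, T) long; boxes: (B, T, 4) long in [0, coord_bins);
        region_feats: (B, T, vis_dim) visual features per word region."""
        layout = (self.x_pos(boxes[..., 0]) + self.y_pos(boxes[..., 1]) +
                  self.x_pos(boxes[..., 2]) + self.y_pos(boxes[..., 3]))
        return self.text(token_ids) + layout + self.visual(region_feats)
```

The summed embeddings can then be fed to an ordinary Transformer encoder, which is the sense in which the framework "extends the Transformer to take multimodal embeddings as input".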
arXiv Detail & Related papers (2022-04-22T21:47:04Z)
- StrucTexT: Structured Text Understanding with Multi-Modal Transformers [29.540122964399046]
Structured text understanding on Visually Rich Documents (VRDs) is a crucial part of Document Intelligence.
This paper proposes a unified framework named StrucTexT, which is flexible and effective for handling both sub-tasks of structured text understanding.
We evaluate our method at the segment level and the token level and show that it outperforms state-of-the-art counterparts.
arXiv Detail & Related papers (2021-08-06T02:57:07Z)
- Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning [73.0598186896953]
We present two self-supervised tasks learning over raw text with the guidance from knowledge graphs.
Our first contribution, building upon entity-level masked language models, is an entity masking scheme.
In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training.
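As a sketch of what an entity masking scheme can look like: instead of masking random subword tokens, spans that the knowledge graph links to entities are masked as whole units, so the model must recover entire entities from context. The span source, mask id, and masking rate below are illustrative assumptions.

```python
# Entity-level masking: mask whole entity spans rather than random tokens.
# entity_spans would come from linking the text against a knowledge graph.
import random

def entity_mask(token_ids, entity_spans, mask_id, mask_prob=0.15):
    """token_ids: list of ids; entity_spans: (start, end) pairs, end
    exclusive, marking tokens linked to knowledge-graph entities."""
    masked = list(token_ids)
    labels = [-100] * len(token_ids)      # -100 = ignored by the LM loss
    for start, end in entity_spans:
        if random.random() < mask_prob:   # mask the whole span at once
            for i in range(start, end):
                labels[i] = masked[i]     # predict the original token
                masked[i] = mask_id
    return masked, labels
```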
arXiv Detail & Related papers (2020-04-29T14:22:42Z)