End-to-End Hierarchical Relation Extraction for Generic Form
Understanding
- URL: http://arxiv.org/abs/2106.00980v1
- Date: Wed, 2 Jun 2021 06:51:35 GMT
- Title: End-to-End Hierarchical Relation Extraction for Generic Form
Understanding
- Authors: Tuan-Anh Nguyen Dang, Duc-Thanh Hoang, Quang-Bach Tran, Chih-Wei Pan,
Thanh-Dat Nguyen
- Abstract summary: We present a novel deep neural network to jointly perform both entity detection and link prediction.
Our model extends the Multi-stage Attentional U-Net architecture with the Part-Intensity Fields and Part-Association Fields for link prediction.
We demonstrate the effectiveness of the model on the Form Understanding in Noisy Scanned Documents dataset.
- Score: 0.6299766708197884
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Form understanding is a challenging problem which aims to recognize semantic
entities from the input document and their hierarchical relations. Previous
approaches face significant difficulty dealing with the complexity of the task
and thus treat these objectives separately. To this end, we present a novel deep
neural network to jointly perform both entity detection and link prediction in
an end-to-end fashion. Our model extends the Multi-stage Attentional U-Net
architecture with the Part-Intensity Fields and Part-Association Fields for
link prediction, enriching the spatial information flow with the additional
supervision from entity linking. We demonstrate the effectiveness of the model
on the Form Understanding in Noisy Scanned Documents (FUNSD) dataset, where our
method substantially outperforms the original model and state-of-the-art
baselines in both the Entity Labeling and Entity Linking tasks.
Related papers
- A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap [50.079224604394]
We present a novel model-agnostic framework called Context-Enhanced Feature Alignment (CEFA).
CEFA consists of a feature alignment module and a context enhancement module.
Our method can serve as a plug-and-play module to improve the detection performance of HOI models on rare categories.
arXiv Detail & Related papers (2024-07-31T08:42:48Z)
- EnriCo: Enriched Representation and Globally Constrained Inference for Entity and Relation Extraction [3.579132482505273]
Joint entity and relation extraction plays a pivotal role in various applications, notably in the construction of knowledge graphs.
Existing approaches often fall short of two key aspects: richness of representation and coherence in output structure.
In our work, we introduce EnriCo, which mitigates these shortcomings.
arXiv Detail & Related papers (2024-04-18T20:15:48Z)
- Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection [14.22646492640906]
We propose a simple and highly efficient decoder-free architecture for open-vocabulary visual relationship detection.
Our model consists of a Transformer-based image encoder that represents objects as tokens and models their relationships implicitly.
Our approach achieves state-of-the-art relationship detection performance on Visual Genome and on the large-vocabulary GQA benchmark at real-time inference speeds.
arXiv Detail & Related papers (2024-03-21T10:15:57Z)
- CARE: Co-Attention Network for Joint Entity and Relation Extraction [0.0]
We propose a Co-Attention network for joint entity and relation extraction.
Our approach includes adopting a parallel encoding strategy to learn separate representations for each subtask.
At the core of our approach is the co-attention module that captures two-way interaction between the two subtasks.
arXiv Detail & Related papers (2023-08-24T03:40:54Z)
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
- Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement [75.9289887536165]
We present a hierarchical abstraction approach to uncover underlying entities.
We show how to learn a correspondence between intervening on states of entities in the agent's model and acting on objects in the environment.
We use this correspondence to develop a method for control that generalizes to different numbers and configurations of objects.
arXiv Detail & Related papers (2023-03-20T18:19:36Z)
- An End-to-end Model for Entity-level Relation Extraction using Multi-instance Learning [2.111790330664657]
We present a joint model for entity-level relation extraction from documents.
We achieve state-of-the-art relation extraction results on the DocRED dataset.
Our experimental results suggest that a joint approach is on par with task-specific learning, though more efficient due to shared parameters and training steps.
arXiv Detail & Related papers (2021-02-11T12:49:39Z)
- CoADNet: Collaborative Aggregation-and-Distribution Networks for Co-Salient Object Detection [91.91911418421086]
Co-Salient Object Detection (CoSOD) aims at discovering salient objects that repeatedly appear in a given query group containing two or more relevant images.
One challenging issue is how to effectively capture co-saliency cues by modeling and exploiting inter-image relationships.
We present an end-to-end collaborative aggregation-and-distribution network (CoADNet) to capture both salient and repetitive visual patterns from multiple images.
arXiv Detail & Related papers (2020-11-10T04:28:11Z)
- Cross-Supervised Joint-Event-Extraction with Heterogeneous Information Networks [61.950353376870154]
Joint-event-extraction is a sequence-to-sequence labeling task with a tag set composed of tags of triggers and entities.
We propose a Cross-Supervised Mechanism (CSM) to alternately supervise the extraction of triggers or entities.
Our approach outperforms the state-of-the-art methods in both entity and trigger extraction.
arXiv Detail & Related papers (2020-10-13T11:51:17Z)
- Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)