Related papers: End-to-End Hierarchical Relation Extraction for Generic Form Understanding

End-to-End Hierarchical Relation Extraction for Generic Form Understanding

URL: http://arxiv.org/abs/2106.00980v1
Date: Wed, 2 Jun 2021 06:51:35 GMT
Title: End-to-End Hierarchical Relation Extraction for Generic Form Understanding
Authors: Tuan-Anh Nguyen Dang, Duc-Thanh Hoang, Quang-Bach Tran, Chih-Wei Pan, Thanh-Dat Nguyen
Abstract summary: We present a novel deep neural network to jointly perform both entity detection and link prediction. Our model extends the Multi-stage Attentional U-Net architecture with the Part-Intensity Fields and Part-Association Fields for link prediction. We demonstrate the effectiveness of the model on the Form Understanding in Noisy Scanned Documents dataset.
Score: 0.6299766708197884
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Form understanding is a challenging problem which aims to recognize semantic entities from the input document and their hierarchical relations. Previous approaches face significant difficulty dealing with the complexity of the task, thus treat these objectives separately. To this end, we present a novel deep neural network to jointly perform both entity detection and link prediction in an end-to-end fashion. Our model extends the Multi-stage Attentional U-Net architecture with the Part-Intensity Fields and Part-Association Fields for link prediction, enriching the spatial information flow with the additional supervision from entity linking. We demonstrate the effectiveness of the model on the Form Understanding in Noisy Scanned Documents (FUNSD) dataset, where our method substantially outperforms the original model and state-of-the-art baselines in both Entity Labeling and Entity Linking task.

Related papers

Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction [55.914891182214475]
We introduce neural network reprogrammability as a unifying framework for model adaptation.<n>We present a taxonomy that categorizes such information manipulation approaches across four key dimensions.<n>We also analyze remaining technical challenges and ethical considerations.
arXiv Detail & Related papers (2025-06-05T05:42:27Z)
Interpretable deformable image registration: A geometric deep learning perspective [9.13809412085203]
We present a theoretical foundation for designing an interpretable registration framework. We formulate an end-to-end process that refines transformations in a coarse-to-fine fashion. We conclude by showing significant improvement in performance metrics over state-of-the-art approaches.
arXiv Detail & Related papers (2024-12-17T19:47:10Z)
A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap [50.079224604394]
We present a novel model-agnostic framework called textbfContext-textbfEnhanced textbfFeature textbfAment (CEFA) CEFA consists of a feature alignment module and a context enhancement module. Our method can serve as a plug-and-play module to improve the detection performance of HOI models on rare categories.
arXiv Detail & Related papers (2024-07-31T08:42:48Z)
EnriCo: Enriched Representation and Globally Constrained Inference for Entity and Relation Extraction [3.579132482505273]
Joint entity and relation extraction plays a pivotal role in various applications, notably in the construction of knowledge graphs. Existing approaches often fall short of two key aspects: richness of representation and coherence in output structure. In our work, we introduce EnriCo, which mitigates these shortcomings.
arXiv Detail & Related papers (2024-04-18T20:15:48Z)
Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection [14.22646492640906]
We propose a simple and highly efficient decoder-free architecture for open-vocabulary visual relationship detection. Our model consists of a Transformer-based image encoder that represents objects as tokens and models their relationships implicitly. Our approach achieves state-of-the-art relationship detection performance on Visual Genome and on the large-vocabulary GQA benchmark at real-time inference speeds.
arXiv Detail & Related papers (2024-03-21T10:15:57Z)
CARE: Co-Attention Network for Joint Entity and Relation Extraction [0.0]
We propose a Co-Attention network for joint entity and relation extraction. Our approach includes adopting a parallel encoding strategy to learn separate representations for each subtask. At the core of our approach is the co-attention module that captures two-way interaction between the two subtasks.
arXiv Detail & Related papers (2023-08-24T03:40:54Z)
Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images. We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities. The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement [75.9289887536165]
We present a hierarchical abstraction approach to uncover underlying entities. We show how to learn a correspondence between intervening on states of entities in the agent's model and acting on objects in the environment. We use this correspondence to develop a method for control that generalizes to different numbers and configurations of objects.
arXiv Detail & Related papers (2023-03-20T18:19:36Z)
An End-to-end Model for Entity-level Relation Extraction using Multi-instance Learning [2.111790330664657]
We present a joint model for entity-level relation extraction from documents. We achieve state-of-the-art relation extraction results on the DocRED dataset. Our experimental results suggest that a joint approach is on par with task-specific learning, though more efficient due to shared parameters and training steps.
arXiv Detail & Related papers (2021-02-11T12:49:39Z)
CoADNet: Collaborative Aggregation-and-Distribution Networks for Co-Salient Object Detection [91.91911418421086]
Co-Salient Object Detection (CoSOD) aims at discovering salient objects that repeatedly appear in a given query group containing two or more relevant images. One challenging issue is how to effectively capture co-saliency cues by modeling and exploiting inter-image relationships. We present an end-to-end collaborative aggregation-and-distribution network (CoADNet) to capture both salient and repetitive visual patterns from multiple images.
arXiv Detail & Related papers (2020-11-10T04:28:11Z)
Cross-Supervised Joint-Event-Extraction with Heterogeneous Information Networks [61.950353376870154]
Joint-event-extraction is a sequence-to-sequence labeling task with a tag set composed of tags of triggers and entities. We propose a Cross-Supervised Mechanism (CSM) to alternately supervise the extraction of triggers or entities. Our approach outperforms the state-of-the-art methods in both entity and trigger extraction.
arXiv Detail & Related papers (2020-10-13T11:51:17Z)
Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding. At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network. With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.