RE$^2$: Region-Aware Relation Extraction from Visually Rich Documents
- URL: http://arxiv.org/abs/2305.14590v2
- Date: Tue, 4 Jun 2024 01:32:18 GMT
- Title: RE$^2$: Region-Aware Relation Extraction from Visually Rich Documents
- Authors: Pritika Ramu, Sijia Wang, Lalla Mouatadid, Joy Rimchala, Lifu Huang
- Abstract summary: We propose REgion-Aware Relation Extraction (RE$^2$) that leverages region-level spatial structure among the entity blocks to improve their relation prediction.
We also introduce a constraint objective to regularize the model towards consistency with the inherent constraints of the relation extraction task.
- Score: 18.369611871952667
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current research in form understanding predominantly relies on large pre-trained language models, necessitating extensive data for pre-training. However, the importance of layout structure (i.e., the spatial relationship between the entity blocks in the visually rich document) to relation extraction has been overlooked. In this paper, we propose REgion-Aware Relation Extraction (RE$^2$) that leverages region-level spatial structure among the entity blocks to improve their relation prediction. We design an edge-aware graph attention network to learn the interaction between entities while considering their spatial relationship defined by their region-level representations. We also introduce a constraint objective to regularize the model towards consistency with the inherent constraints of the relation extraction task. Extensive experiments across various datasets, languages and domains demonstrate the superiority of our proposed approach.
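The abstract describes an edge-aware graph attention network in which attention between entity blocks is biased by their region-level spatial relationship. The following is a minimal illustrative sketch of that idea, not the authors' implementation: node features stand in for entity-block embeddings, integer edge types stand in for hypothetical region-level spatial relations (e.g. above/below/left/right), and all dimensions and parameter names are assumptions for the example.

```python
# Illustrative edge-aware graph attention layer (NumPy only).
# GAT-style scoring, extended so that a learned embedding of the
# spatial-relation type between two entity blocks enters the logits.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def edge_aware_attention(h, edge_type, W, a, edge_emb):
    """h: (N, d) entity-block features; edge_type: (N, N) relation ids;
    W: (d, dp) projection; a: (2*dp + de,) attention vector;
    edge_emb: (num_types, de) spatial-relation embeddings."""
    z = h @ W                                   # (N, dp) projected nodes
    N = z.shape[0]
    e = edge_emb[edge_type]                     # (N, N, de) edge features
    # Concatenate source, target, and edge features for each node pair,
    # then score the pair with a shared attention vector.
    src = np.repeat(z[:, None, :], N, axis=1)   # (N, N, dp)
    tgt = np.repeat(z[None, :, :], N, axis=0)   # (N, N, dp)
    logits = np.tanh(np.concatenate([src, tgt, e], axis=-1) @ a)  # (N, N)
    alpha = softmax(logits, axis=1)             # attend over neighbors
    return alpha @ z                            # (N, dp) updated nodes

rng = np.random.default_rng(0)
N, d, dp, de, T = 4, 8, 6, 3, 5                 # toy sizes (assumed)
h = rng.normal(size=(N, d))
edge_type = rng.integers(0, T, size=(N, N))
out = edge_aware_attention(h, edge_type,
                           rng.normal(size=(d, dp)),
                           rng.normal(size=(2 * dp + de,)),
                           rng.normal(size=(T, de)))
print(out.shape)  # (4, 6)
```

The key departure from plain graph attention is the `edge_emb[edge_type]` term: two entity blocks with the same features but a different spatial relationship receive different attention weights, which is the property the paper's region-level representations are meant to provide.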
Related papers
- ReMeREC: Relation-aware and Multi-entity Referring Expression Comprehension [29.50623143244436]
ReMeREC aims to localize specified entities or regions in an image based on natural language descriptions.
We first construct a relation-aware, multi-entity REC dataset called ReMeX.
We then propose ReMeREC, a novel framework that jointly leverages visual and textual cues to localize multiple entities.
arXiv Detail & Related papers (2025-07-22T11:23:48Z) - HierRelTriple: Guiding Indoor Layout Generation with Hierarchical Relationship Triplet Losses [52.70183252341687]
We present HierRelTriple, a hierarchical triplet-based indoor relationship learning method with a focus on spatial relationship learning.
HierRelTriple first partitions functional regions and then automatically extracts three levels of spatial relationships.
Experiments on unconditional layout synthesis, floorplan-conditioned layout generation, and scene rearrangement demonstrate that HierRelTriple improves spatial-relation metrics by over 15%.
arXiv Detail & Related papers (2025-03-26T07:31:52Z) - Non-parametric Contextual Relationship Learning for Semantic Video Object Segmentation [1.4042211166197214]
We introduce an exemplar-based non-parametric view of contextual cues, where the inherent relationships implied by object hypotheses are encoded on a similarity graph of regions.
Our algorithm integrates the learned contexts into a Conditional Random Field (CRF) in the form of pairwise potentials and infers the per-region semantic labels.
We evaluate our approach on the challenging YouTube-Objects dataset which shows that the proposed contextual relationship model outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2024-07-08T13:22:13Z) - EnriCo: Enriched Representation and Globally Constrained Inference for Entity and Relation Extraction [3.579132482505273]
Joint entity and relation extraction plays a pivotal role in various applications, notably in the construction of knowledge graphs.
Existing approaches often fall short of two key aspects: richness of representation and coherence in output structure.
In our work, we introduce EnriCo, which mitigates these shortcomings.
arXiv Detail & Related papers (2024-04-18T20:15:48Z) - Improving Vision-and-Language Reasoning via Spatial Relations Modeling [30.477235227733928]
Visual commonsense reasoning (VCR) is a challenging multi-modal task.
The proposed method can guide the representations to maintain more spatial context.
We achieve the state-of-the-art results on VCR and two other vision-and-language reasoning tasks VQA, and NLVR.
arXiv Detail & Related papers (2023-11-09T11:54:55Z) - Learning Complete Topology-Aware Correlations Between Relations for Inductive Link Prediction [121.65152276851619]
We show that semantic correlations between relations are inherently edge-level and entity-independent.
We propose a novel subgraph-based method, namely TACO, to model Topology-Aware COrrelations between relations.
To further exploit the potential of RCN, we propose Complete Common Neighbor induced subgraph.
arXiv Detail & Related papers (2023-09-20T08:11:58Z) - Leveraging Knowledge Graph Embeddings to Enhance Contextual
Representations for Relation Extraction [0.0]
We propose a relation extraction approach based on the incorporation of pretrained knowledge graph embeddings at the corpus scale into the sentence-level contextual representation.
Our experiments yield promising results for the proposed approach.
arXiv Detail & Related papers (2023-06-07T07:15:20Z) - Message Intercommunication for Inductive Relation Reasoning [49.731293143079455]
We develop a novel inductive relation reasoning model called MINES.
We introduce a Message Intercommunication mechanism on the Neighbor-Enhanced Subgraph.
Our experiments show that MINES outperforms existing state-of-the-art models.
arXiv Detail & Related papers (2023-05-23T13:51:46Z) - SAIS: Supervising and Augmenting Intermediate Steps for Document-Level
Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z) - End-to-End Hierarchical Relation Extraction for Generic Form
Understanding [0.6299766708197884]
We present a novel deep neural network to jointly perform both entity detection and link prediction.
Our model extends the Multi-stage Attentional U-Net architecture with the Part-Intensity Fields and Part-Association Fields for link prediction.
We demonstrate the effectiveness of the model on the Form Understanding in Noisy Scanned Documents dataset.
arXiv Detail & Related papers (2021-06-02T06:51:35Z) - Learning Relation Prototype from Unlabeled Texts for Long-tail Relation
Extraction [84.64435075778988]
We propose a general approach to learn relation prototypes from unlabeled texts.
We learn relation prototypes as an implicit factor between entities.
We conduct experiments on two publicly available datasets: New York Times and Google Distant Supervision.
arXiv Detail & Related papers (2020-11-27T06:21:12Z) - Understanding Spatial Relations through Multiple Modalities [78.07328342973611]
Spatial relations between objects can be either explicit, expressed as spatial prepositions, or implicit, expressed by spatial verbs such as moving, walking, or shifting.
We introduce the task of inferring implicit and explicit spatial relations between two entities in an image.
We design a model that uses both textual and visual information to predict the spatial relations, making use of both positional and size information of objects and image embeddings.
arXiv Detail & Related papers (2020-07-19T01:35:08Z) - Bidirectional Graph Reasoning Network for Panoptic Segmentation [126.06251745669107]
We introduce a Bidirectional Graph Reasoning Network (BGRNet) to mine the intra-modular and inter-modular relations within and between foreground things and background stuff classes.
BGRNet first constructs image-specific graphs in both instance and semantic segmentation branches that enable flexible reasoning at the proposal level and class level.
arXiv Detail & Related papers (2020-04-14T02:32:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.