Towards Human-Like Machine Comprehension: Few-Shot Relational Learning in Visually-Rich Documents
- URL: http://arxiv.org/abs/2403.15765v1
- Date: Sat, 23 Mar 2024 08:40:35 GMT
- Title: Towards Human-Like Machine Comprehension: Few-Shot Relational Learning in Visually-Rich Documents
- Authors: Hao Wang, Tang Li, Chenhui Chu, Nengjun Zhu, Rui Wang, Pinpin Zhu
- Abstract summary: Key-value relations are prevalent in Visually-Rich Documents (VRDs), often depicted in distinct spatial regions accompanied by specific color and font styles.
These non-textual cues serve as important indicators that greatly enhance human comprehension and acquisition of such relation triplets.
Our research focuses on few-shot relational learning, specifically targeting the extraction of key-value relation triplets in VRDs.
- Score: 16.78371134590167
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Key-value relations are prevalent in Visually-Rich Documents (VRDs), often depicted in distinct spatial regions accompanied by specific color and font styles. These non-textual cues serve as important indicators that greatly enhance human comprehension and acquisition of such relation triplets. However, current document AI approaches often fail to consider this valuable prior information related to visual and spatial features, resulting in suboptimal performance, particularly when dealing with limited examples. To address this limitation, our research focuses on few-shot relational learning, specifically targeting the extraction of key-value relation triplets in VRDs. Given the absence of a suitable dataset for this task, we introduce two new few-shot benchmarks built upon existing supervised benchmark datasets. Furthermore, we propose a variational approach that incorporates relational 2D-spatial priors and prototypical rectification techniques. This approach aims to generate relation representations that are more aware of the spatial context and unseen relations, in a manner similar to human perception. Experimental results demonstrate the effectiveness of our proposed method by showcasing its ability to outperform existing methods. This study also opens up new possibilities for practical applications.
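The abstract does not spell out the model, so the following is only a minimal sketch of one way a relational 2D-spatial prior could enter a key-value relation representation: encode the relative layout of the candidate key/value boxes and fuse it with their textual embeddings. All function names, the box encoding, and the embedding sizes are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (assumptions only, not the paper's method): fuse textual
# embeddings of a candidate key/value pair with a relational 2D-spatial prior
# derived from their bounding boxes.
import numpy as np

def layout_prior(key_box, value_box):
    """Relative 2D geometry of a key/value pair: center offsets plus the two
    box sizes. Boxes are (x, y, w, h) in page-normalized coordinates."""
    kx, ky, kw, kh = key_box
    vx, vy, vw, vh = value_box
    return np.array([(vx + vw / 2) - (kx + kw / 2),   # horizontal offset
                     (vy + vh / 2) - (ky + kh / 2),   # vertical offset
                     kw, kh, vw, vh], dtype=np.float32)

def relation_representation(key_text_emb, value_text_emb, key_box, value_box):
    """Spatially aware representation of a candidate key-value relation."""
    return np.concatenate([key_text_emb, value_text_emb,
                           layout_prior(key_box, value_box)])

# Toy usage: a key box with its value box immediately to the right.
key_emb, val_emb = np.zeros(8, np.float32), np.ones(8, np.float32)
rep = relation_representation(key_emb, val_emb,
                              key_box=(0.10, 0.20, 0.15, 0.03),
                              value_box=(0.30, 0.20, 0.25, 0.03))
print(rep.shape)  # (8 + 8 + 6,) = (22,)
```

In the paper itself the spatial prior is incorporated through a variational formulation and the resulting prototypes are further rectified for unseen relations; the sketch only shows how layout information can be attached to a relation representation in the first place.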
Related papers
- RelVAE: Generative Pretraining for few-shot Visual Relationship Detection [2.2230760534775915]
We present the first pretraining method for few-shot predicate classification that does not require any annotated relations.
We construct few-shot training splits and show quantitative experiments on VG200 and VRD datasets.
arXiv Detail & Related papers (2023-11-27T19:08:08Z)
- Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual Recognition [57.08108545219043]
Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision.
Existing literature addresses this challenge by employing local-based representation approaches.
This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
arXiv Detail & Related papers (2023-05-12T00:13:17Z)
- Knowledge-augmented Few-shot Visual Relation Detection [25.457693302327637]
Visual Relation Detection (VRD) aims to detect relationships between objects for image understanding.
Most existing VRD methods rely on thousands of training samples of each relationship to achieve satisfactory performance.
We devise a knowledge-augmented, few-shot VRD framework leveraging both textual knowledge and visual relation knowledge.
arXiv Detail & Related papers (2023-03-09T15:38:40Z)
- Cross-Domain Few-Shot Relation Extraction via Representation Learning and Domain Adaptation [1.1602089225841632]
Few-shot relation extraction aims to recognize novel relations with few labeled sentences in each relation.
Previous metric-based few-shot relation extraction algorithms identify relationships by comparing prototypes built from the embeddings of the few labeled sentences with the embeddings of the query sentences, using a trained metric function (a rough sketch of this prototype-and-metric setup follows this entry).
We suggest learning more interpretable and efficient prototypes from prior knowledge and the intrinsic semantics of relations to extract new relations in various domains more effectively.
arXiv Detail & Related papers (2022-12-05T19:34:52Z)
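The metric-based procedure summarized in the entry above can be sketched, under assumptions, as a prototypical-network-style episode: prototypes are the mean embeddings of the few labeled support sentences per relation, and each query sentence is assigned to the nearest prototype. The sentence encoder is assumed to exist elsewhere, and a fixed Euclidean distance stands in for the trained metric function.

```python
# Rough sketch of metric-based few-shot relation extraction (prototype style);
# the encoder and the metric are placeholders, not any specific paper's design.
import numpy as np

def episode_predict(support_emb, query_emb):
    """support_emb: (N_way, K_shot, dim) embeddings of labeled support sentences.
       query_emb:   (Q, dim) embeddings of query sentences.
       Returns the predicted relation index for each query."""
    protos = support_emb.mean(axis=1)                          # (N_way, dim)
    # negative squared Euclidean distance plays the role of the metric
    scores = -((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return scores.argmax(axis=1)

# Toy usage: a 3-way 5-shot episode with 64-dimensional sentence embeddings.
rng = np.random.default_rng(0)
support = rng.normal(size=(3, 5, 64))
queries = rng.normal(size=(4, 64))
print(episode_predict(support, queries))
```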
- RLIP: Relational Language-Image Pre-training for Human-Object Interaction Detection [32.20132357830726]
Relational Language-Image Pre-training (RLIP) is a strategy for contrastive pre-training that leverages both entity and relation descriptions.
We show the benefits of these contributions, collectively termed RLIP-ParSe, for improved zero-shot, few-shot and fine-tuning HOI detection, as well as increased robustness when learning from noisy annotations.
arXiv Detail & Related papers (2022-09-05T07:50:54Z)
- SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
- PAIR: Leveraging Passage-Centric Similarity Relation for Improving Dense Passage Retrieval [87.68667887072324]
We propose a novel approach that leverages query-centric and PAssage-centric sImilarity Relations (called PAIR) for dense passage retrieval.
To implement our approach, we make three major technical contributions, starting with formal formulations of the two kinds of similarity relations (illustrated loosely in the sketch after this entry).
Our approach significantly outperforms previous state-of-the-art models on both MSMARCO and Natural Questions datasets.
arXiv Detail & Related papers (2021-08-13T02:07:43Z)
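The two similarity relations named in the PAIR entry above can be illustrated loosely: a query-centric constraint (the query should be more similar to its positive passage than to a negative one) and a passage-centric constraint (the positive passage should be more similar to the query than to that negative passage). The hinge form, margin, and weighting below are assumptions for illustration, not the paper's actual loss.

```python
# Loose illustration of query-centric vs. passage-centric similarity relations
# for dense retrieval; the real PAIR objective is formulated differently.
import numpy as np

def sim(a, b):
    """Dot-product similarity between L2-normalized embeddings."""
    return float((a / np.linalg.norm(a)) @ (b / np.linalg.norm(b)))

def hinge(pos, neg, margin=0.1):
    """Penalty unless the positive similarity beats the negative by a margin."""
    return max(0.0, margin - (pos - neg))

def dual_relation_loss(q, p_pos, p_neg, alpha=0.5):
    """Combine a query-centric relation, s(q, p+) > s(q, p-), with a
    passage-centric relation, s(p+, q) > s(p+, p-), weighted by alpha."""
    query_centric = hinge(sim(q, p_pos), sim(q, p_neg))
    passage_centric = hinge(sim(p_pos, q), sim(p_pos, p_neg))
    return (1 - alpha) * query_centric + alpha * passage_centric

# Toy usage with 2-dimensional stand-in embeddings.
q, p_pos, p_neg = np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([0.0, 1.0])
print(dual_relation_loss(q, p_pos, p_neg))
```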
- Learning Relation Prototype from Unlabeled Texts for Long-tail Relation Extraction [84.64435075778988]
We propose a general approach to learn relation prototypes from unlabeled texts.
We learn relation prototypes as an implicit factor between entities.
We conduct experiments on two publicly available datasets: New York Times and Google Distant Supervision.
arXiv Detail & Related papers (2020-11-27T06:21:12Z)
- One-shot Learning for Temporal Knowledge Graphs [49.41854171118697]
We propose a one-shot learning framework for link prediction in temporal knowledge graphs.
Our proposed method employs a self-attention mechanism to effectively encode temporal interactions between entities.
Our experiments show that the proposed algorithm outperforms the state-of-the-art baselines on two well-studied benchmarks.
arXiv Detail & Related papers (2020-10-23T03:24:44Z)
- Detecting Human-Object Interactions with Action Co-occurrence Priors [108.31956827512376]
A common problem in the human-object interaction (HOI) detection task is that numerous HOI classes have only a small number of labeled examples.
We observe that there exist natural correlations and anti-correlations among human-object interactions.
We present techniques to learn these priors and leverage them for more effective training, especially for rare classes (a toy sketch of a co-occurrence prior follows this entry).
arXiv Detail & Related papers (2020-07-17T02:47:45Z)
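A toy sketch of the co-occurrence-prior idea from the entry above: estimate how often action labels co-occur on the same human-object pair, then let rare actions borrow evidence from correlated frequent ones. Both the estimator and the score-smoothing step are simplified assumptions; the paper integrates its priors differently (e.g., through the architecture and training objectives).

```python
# Toy co-occurrence prior over action labels (assumptions only): P(j | i) is
# estimated from multi-label annotations and used to smooth per-action scores.
import numpy as np

def cooccurrence_prior(label_matrix, eps=1e-6):
    """label_matrix: (num_samples, num_actions) binary multi-label annotations.
       Returns a matrix whose (i, j) entry approximates P(action j | action i)."""
    counts = label_matrix.T @ label_matrix              # pairwise co-counts
    return counts / (counts.diagonal()[:, None] + eps)

def smooth_scores(scores, prior, lam=0.2):
    """Blend raw per-action scores with scores propagated through the prior."""
    return (1 - lam) * scores + lam * scores @ prior

# Toy usage: three samples annotated with three possible actions.
labels = np.array([[1, 1, 0], [1, 0, 0], [1, 1, 1]], dtype=np.float32)
prior = cooccurrence_prior(labels)
print(smooth_scores(np.array([0.9, 0.1, 0.0], dtype=np.float32), prior))
```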
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.