Unimodal and Multimodal Representation Training for Relation Extraction
- URL: http://arxiv.org/abs/2211.06168v1
- Date: Fri, 11 Nov 2022 12:39:35 GMT
- Title: Unimodal and Multimodal Representation Training for Relation Extraction
- Authors: Ciaran Cooney, Rachel Heyburn, Liam Maddigan, Mairead O'Cuinn, Chloe
Thompson and Joana Cavadas
- Abstract summary: Multimodal integration of text, layout and visual information has achieved SOTA results in visually rich document understanding (VrDU) tasks, including relation extraction (RE).
Here, we demonstrate the value of shared representations for RE tasks by conducting experiments in which each data type is iteratively excluded during training.
While a bimodal text and layout approach performs best, we show that text is the most important single predictor of entity relations.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal integration of text, layout and visual information has achieved
SOTA results in visually rich document understanding (VrDU) tasks, including
relation extraction (RE). However, despite its importance, evaluation of the
relative predictive capacity of these modalities is less prevalent. Here, we
demonstrate the value of shared representations for RE tasks by conducting
experiments in which each data type is iteratively excluded during training. In
addition, text and layout data are evaluated in isolation. While a bimodal text
and layout approach performs best (F1=0.684), we show that text is the most
important single predictor of entity relations. Additionally, layout geometry
is highly predictive and may even be a feasible unimodal approach. Despite
being less effective, we highlight circumstances where visual information can
bolster performance. In total, our results demonstrate the efficacy of training
joint representations for RE.
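The paper's central manipulation, retraining the same relation classifier while withholding one modality at a time, can be illustrated with a minimal PyTorch sketch. The module, feature dimensions, and entity-pair head below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ModalityAblationRE(nn.Module):
    """Relation classifier over a configurable subset of modality features."""
    def __init__(self, active=("text", "layout"), n_relations=2, dims=None):
        super().__init__()
        # Per-modality feature sizes are illustrative assumptions.
        dims = dims or {"text": 768, "layout": 128, "visual": 256}
        self.active = active
        in_dim = 2 * sum(dims[m] for m in active)   # head + tail entity
        self.classifier = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(), nn.Linear(512, n_relations))

    def forward(self, head_feats, tail_feats):
        # head_feats / tail_feats: dicts of modality name -> (batch, dim) tensors
        h = torch.cat([head_feats[m] for m in self.active], dim=-1)
        t = torch.cat([tail_feats[m] for m in self.active], dim=-1)
        return self.classifier(torch.cat([h, t], dim=-1))

# Iteratively exclude each modality, mirroring the ablation design:
modalities = ("text", "layout", "visual")
for excluded in modalities:
    active = tuple(m for m in modalities if m != excluded)
    model = ModalityAblationRE(active=active)
    # ... train and evaluate each configuration, then compare F1 scores
```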
Related papers
- A LayoutLMv3-Based Model for Enhanced Relation Extraction in Visually-Rich Documents [0.0]
We present a model that matches or outperforms current state-of-the-art results in Relation Extraction (RE) applied to Visually-Rich Documents (VRD).
We also report an extensive ablation study on FUNSD, highlighting the strong impact of certain features and modelling choices on performance.
arXiv Detail & Related papers (2024-04-16T18:50:57Z)
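As a hedged sketch of how an RE head can sit on top of LayoutLMv3 (via the Hugging Face `transformers` LayoutLMv3Model), the snippet below pools first-token entity representations and classifies entity pairs; this head design is a common baseline, not necessarily this paper's exact architecture.

```python
import torch
import torch.nn as nn
from transformers import LayoutLMv3Model  # requires the `transformers` package

class LayoutLMv3RE(nn.Module):
    def __init__(self, n_labels=2, hidden=768):
        super().__init__()
        self.encoder = LayoutLMv3Model.from_pretrained("microsoft/layoutlmv3-base")
        self.pair_head = nn.Sequential(
            nn.Linear(hidden * 2, hidden), nn.ReLU(), nn.Linear(hidden, n_labels))

    def forward(self, input_ids, bbox, attention_mask, head_idx, tail_idx):
        out = self.encoder(input_ids=input_ids, bbox=bbox,
                           attention_mask=attention_mask).last_hidden_state
        batch = torch.arange(out.size(0), device=out.device)
        head = out[batch, head_idx]   # first-token representation of head entity
        tail = out[batch, tail_idx]   # first-token representation of tail entity
        return self.pair_head(torch.cat([head, tail], dim=-1))
```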
- Less is More: High-value Data Selection for Visual Instruction Tuning [127.38740043393527]
We propose TIVE, a high-value data selection approach that eliminates redundancy within visual instruction data and reduces training cost.
Using only about 15% of the data, our approach achieves average performance comparable to that of the full-data fine-tuned model across eight benchmarks.
arXiv Detail & Related papers (2024-03-14T16:47:25Z)
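As a rough illustration of value-based data selection in this spirit, the sketch below scores samples by loss-gradient norm and keeps the top ~15%; the scoring heuristic is an assumption for illustration, not TIVE's actual task- and instance-level measure.

```python
import torch

def select_high_value(model, loss_fn, dataset, keep_ratio=0.15):
    """dataset: iterable of (x, y) pairs; returns indices of retained samples."""
    scores = []
    for x, y in dataset:
        model.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        # Gradient norm as a rough per-sample influence score (assumption).
        g = torch.cat([p.grad.flatten() for p in model.parameters()
                       if p.grad is not None])
        scores.append(g.norm().item())
    k = max(1, int(keep_ratio * len(scores)))
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return keep  # indices of the retained ~15% of samples
```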
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning by jointly optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines on several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
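A minimal sketch of the inverse dynamics prediction objective that ALP combines with a reinforcement learning policy loss; the encoder interface and action-space size are assumptions.

```python
import torch
import torch.nn as nn

class InverseDynamics(nn.Module):
    """Predict the action taken between two consecutive observations."""
    def __init__(self, encoder, feat_dim=512, n_actions=6):
        super().__init__()
        self.encoder = encoder                      # shared visual encoder
        self.head = nn.Linear(feat_dim * 2, n_actions)

    def loss(self, obs_t, obs_t1, action):
        z_t, z_t1 = self.encoder(obs_t), self.encoder(obs_t1)
        logits = self.head(torch.cat([z_t, z_t1], dim=-1))
        # Cross-entropy on the action forces the encoder to retain
        # action-relevant information; ALP pairs this with an RL policy loss.
        return nn.functional.cross_entropy(logits, action)
```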
- Leveraging Knowledge Graph Embeddings to Enhance Contextual Representations for Relation Extraction [0.0]
We propose a relation extraction approach that incorporates pretrained, corpus-scale knowledge graph embeddings into the sentence-level contextual representation.
A series of experiments revealed promising results for the proposed approach.
arXiv Detail & Related papers (2023-06-07T07:15:20Z)
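One simple way to realize this idea is to concatenate frozen knowledge-graph entity embeddings with contextual entity representations before classification, as in the hedged sketch below; fusion by concatenation is illustrative, not necessarily the paper's exact mechanism.

```python
import torch
import torch.nn as nn

class KGEnhancedRE(nn.Module):
    def __init__(self, text_encoder, kg_embeddings, ctx_dim=768, kg_dim=200,
                 n_relations=10):
        super().__init__()
        self.text_encoder = text_encoder   # e.g. a BERT-style encoder (assumed)
        self.kg = kg_embeddings            # frozen nn.Embedding over KG entities
        self.classifier = nn.Linear((ctx_dim + kg_dim) * 2, n_relations)

    def forward(self, tokens, head_pos, tail_pos, head_ent_id, tail_ent_id):
        ctx = self.text_encoder(tokens)    # (batch, seq, ctx_dim)
        b = torch.arange(ctx.size(0), device=ctx.device)
        # Enrich each entity's contextual vector with its KG embedding.
        head = torch.cat([ctx[b, head_pos], self.kg(head_ent_id)], dim=-1)
        tail = torch.cat([ctx[b, tail_pos], self.kg(tail_ent_id)], dim=-1)
        return self.classifier(torch.cat([head, tail], dim=-1))
```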
- Vision-Language Pre-Training with Triple Contrastive Learning [45.80365827890119]
We propose triple contrastive learning (TCL) for vision-language pre-training, leveraging both cross-modal and intra-modal self-supervision.
Ours is the first work to account for local structure information in multi-modality representation learning.
arXiv Detail & Related papers (2022-02-21T17:54:57Z)
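A hedged sketch of combining cross-modal and intra-modal contrastive terms in the style of TCL, using the standard InfoNCE form; the temperature and loss composition are common defaults, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def info_nce(a, b, tau=0.07):
    """Standard InfoNCE over matched rows of a and b, shape (batch, dim)."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / tau                 # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

def tcl_style_loss(img, txt, img_aug, txt_aug):
    cross = info_nce(img, txt)               # align images with their captions
    intra = info_nce(img, img_aug) + info_nce(txt, txt_aug)  # self-supervision
    return cross + intra
```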
- Efficient Multi-Modal Embeddings from Structured Data [0.0]
Multi-modal word semantics aims to enhance embeddings with perceptual input.
Visual grounding can contribute to linguistic applications as well.
The new embeddings convey information complementary to text-based embeddings.
arXiv Detail & Related papers (2021-10-06T08:42:09Z)
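As a rough sketch of perceptual grounding, the snippet below fuses text and perceptual vectors by weighted concatenation, backing off to a zero perceptual vector for ungrounded words; this fusion scheme is a generic baseline, not the paper's construction from structured data.

```python
import numpy as np

def fuse_embeddings(text_vecs, perceptual_vecs, alpha=0.5):
    """text_vecs / perceptual_vecs: dicts mapping word -> L2-normalised np.ndarray."""
    p_dim = len(next(iter(perceptual_vecs.values())))
    fused = {}
    for word, t in text_vecs.items():
        # Back off to a zero perceptual vector for ungrounded words (assumption).
        p = perceptual_vecs.get(word, np.zeros(p_dim))
        fused[word] = np.concatenate([alpha * t, (1 - alpha) * p])
    return fused
```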
- SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, SAIS not only extracts higher-quality relations thanks to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
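A hedged sketch of intermediate-step supervision: auxiliary losses for entity typing and evidence retrieval are added to the relation loss, in the spirit of SAIS; the loss weights and task heads are assumptions.

```python
import torch.nn.functional as F

def sais_style_loss(rel_logits, rel_labels, type_logits, type_labels,
                    evid_logits, evid_labels, w_type=0.1, w_evid=0.1):
    loss_rel = F.cross_entropy(rel_logits, rel_labels)      # relation extraction
    loss_type = F.cross_entropy(type_logits, type_labels)   # entity typing step
    loss_evid = F.binary_cross_entropy_with_logits(
        evid_logits, evid_labels)                           # evidence retrieval step
    return loss_rel + w_type * loss_type + w_evid * loss_evid
```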
- CDEvalSumm: An Empirical Study of Cross-Dataset Evaluation for Neural Summarization Systems [121.78477833009671]
We investigate the performance of different summarization models in a cross-dataset setting.
A comprehensive study of 11 representative summarization systems on 5 datasets from different domains reveals the effect of model architectures and generation styles.
arXiv Detail & Related papers (2020-10-11T02:19:15Z)
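The cross-dataset protocol itself is easy to sketch: train each system on one dataset and evaluate on all others, producing a transfer matrix; the function interfaces below are hypothetical.

```python
def cross_dataset_eval(systems, datasets, train, evaluate):
    """train(system, data) -> model; evaluate(model, data) -> score.
    systems and datasets are dicts keyed by name (interfaces assumed)."""
    results = {}
    for name, system in systems.items():
        for src, train_data in datasets.items():
            model = train(system, train_data)
            for tgt, test_data in datasets.items():
                results[(name, src, tgt)] = evaluate(model, test_data)
    # Off-diagonal entries (src != tgt) measure cross-dataset generalisation.
    return results
```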
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework preserves the relations between samples well.
By embedding samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
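A generic sketch of a relation-preserving objective, assuming a precomputed sample-affinity matrix; this Laplacian-style penalty is one common instantiation, not necessarily the paper's formulation.

```python
import torch

def relation_preserving_loss(z, affinity):
    """z: (n, d) embeddings; affinity: (n, n) non-negative sample-relation weights."""
    dists = torch.cdist(z, z) ** 2          # squared pairwise distances
    # Related samples (high affinity) are pulled together in the subspace.
    return (affinity * dists).sum() / affinity.sum()
```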
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples from a large amount of unlabeled data for network training.
We experimentally verify that the resulting dataset can significantly improve the performance of the learned FER model.
To keep training tractable on this enlarged dataset, we apply a dataset distillation strategy that compresses it into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
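A hedged sketch of the omni-supervised harvesting step: a pretrained model pseudo-labels unlabeled images and keeps only high-confidence samples; the confidence threshold is an assumption, and dataset distillation would then compress the selected set.

```python
import torch

@torch.no_grad()
def harvest_pseudo_labels(model, unlabeled_loader, threshold=0.95):
    """Keep only samples the pretrained model labels with high confidence."""
    selected = []
    for images in unlabeled_loader:
        probs = torch.softmax(model(images), dim=-1)
        conf, labels = probs.max(dim=-1)
        for img, c, y in zip(images, conf, labels):
            if c.item() >= threshold:       # reliability filter (assumed value)
                selected.append((img, y.item()))
    return selected
```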
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.