GDA: Generative Data Augmentation Techniques for Relation Extraction Tasks
- URL: http://arxiv.org/abs/2305.16663v2
- Date: Thu, 15 Jun 2023 02:43:12 GMT
- Title: GDA: Generative Data Augmentation Techniques for Relation Extraction Tasks
- Authors: Xuming Hu, Aiwei Liu, Zeqi Tan, Xin Zhang, Chenwei Zhang, Irwin King,
Philip S. Yu
- Abstract summary: We propose a dedicated augmentation technique for relational texts, named GDA, which uses two complementary modules to preserve both semantic consistency and syntax structures.
Experimental results on three datasets under a low-resource setting showed that GDA could bring 2.0% F1 improvements compared with no augmentation technique.
- Score: 81.51314139202152
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Relation extraction (RE) tasks show promising performance in extracting
relations between two entities mentioned in sentences, given sufficient
annotations available during training. Such annotations would be
labor-intensive to obtain in practice. Existing work adopts data augmentation
techniques to generate pseudo-annotated sentences beyond limited annotations.
These techniques neither preserve the semantic consistency of the original
sentences when rule-based augmentations are adopted, nor preserve the syntax
structure of sentences when expressing relations using seq2seq models,
resulting in less diverse augmentations. In this work, we propose a dedicated
augmentation technique for relational texts, named GDA, which uses two
complementary modules to preserve both semantic consistency and syntax
structures. We adopt a generative formulation and design a multi-tasking
solution to achieve synergies. Furthermore, GDA adopts entity hints as the
prior knowledge of the generative model to augment diverse sentences.
Experimental results on three datasets under a low-resource setting showed that
GDA could bring 2.0% F1 improvements compared with no augmentation
technique. Source code and data are available.
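As a concrete illustration of the entity-hint idea, here is a minimal sketch using an off-the-shelf seq2seq model. The model choice (t5-base), the hint format, and the entity-preservation filter are our own assumptions; the paper's generator is fine-tuned with a multi-task objective and is not reproduced here.

```python
# Entity-hinted generative augmentation sketch; `t5-base` and the prompt
# format are stand-ins, not the paper's fine-tuned generator.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

def augment(sentence: str, head: str, tail: str, n: int = 3) -> list[str]:
    """Sample n variants of `sentence`, conditioned on the two entities."""
    # Entity hints are prepended so generation is conditioned on them.
    prompt = f"entities: {head} ; {tail} | sentence: {sentence}"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,           # sampling yields more diverse augmentations
        top_p=0.95,
        num_return_sequences=n,
        max_new_tokens=64,
    )
    variants = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
    # Keep only variants that still mention both entities.
    return [v for v in variants if head in v and tail in v]

print(augment("Steve Jobs co-founded Apple in 1976.", "Steve Jobs", "Apple"))
```

Sampling rather than greedy decoding is one simple way to obtain the diverse augmentations the abstract describes; the final filter discards generations that lose either entity.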
Related papers
- GASE: Generatively Augmented Sentence Encoding [0.0]
We propose an approach to enhance sentence embeddings by applying generative text models for data augmentation at inference time.
Generatively Augmented Sentence Encoding (GASE) uses diverse synthetic variants of input texts generated by paraphrasing, summarising or extracting keywords.
We find that generative augmentation leads to larger performance improvements for embedding models with lower baseline performance.
arXiv Detail & Related papers (2024-11-07T17:53:47Z)
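A rough illustration of the GASE entry above: pool the embedding of an input with embeddings of its synthetic variants. The variants below are hard-coded stand-ins for generative-model output, and mean pooling is one simple combination choice, not necessarily the paper's.

```python
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

text = "The committee approved the budget after a long debate."
variants = [
    "After lengthy discussion, the committee passed the budget.",  # paraphrase
    "Committee approves budget.",                                  # summary
    "committee budget approval debate",                            # keywords
]

# Encode the original together with its synthetic variants and mean-pool.
embeddings = encoder.encode([text] + variants)  # shape: (4, 384)
augmented = embeddings.mean(axis=0)
print(augmented.shape)  # (384,)
```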
- Contextualization Distillation from Large Language Model for Knowledge Graph Completion [51.126166442122546]
We introduce the Contextualization Distillation strategy, a plug-in-and-play approach compatible with both discriminative and generative KGC frameworks.
Our method begins by instructing large language models to transform compact, structural triplets into context-rich segments.
Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach.
arXiv Detail & Related papers (2024-01-28T08:56:49Z)
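The contextualization step above essentially reduces to prompting. A sketch under our own prompt wording, with the LLM left as a pluggable callable:

```python
from typing import Callable

def triplet_to_context(head: str, relation: str, tail: str,
                       llm: Callable[[str], str]) -> str:
    """Expand a compact, structural triplet into a context-rich passage."""
    prompt = (
        "Write a short, factual paragraph describing the relationship in the "
        f"triplet ({head}, {relation}, {tail}); mention both entities explicitly."
    )
    return llm(prompt)

# Dummy "LLM" so the sketch runs end to end; swap in a real client.
echo = lambda p: f"[LLM output for: {p[:50]}...]"
print(triplet_to_context("Marie Curie", "award_received", "Nobel Prize", echo))
```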
- Semi-automatic Data Enhancement for Document-Level Relation Extraction with Distant Supervision from Large Language Models [26.523153535336725]
Document-level Relation Extraction (DocRE) aims to extract relations from a long context.
We propose a method integrating a large language model (LLM) and a natural language inference (NLI) module to generate relation triples.
We demonstrate the effectiveness of our approach by introducing an enhanced dataset known as DocGNRE.
arXiv Detail & Related papers (2023-11-13T13:10:44Z)
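A sketch of the NLI half of the pipeline above: verbalize each candidate triple and keep it only if the document entails it. The naive verbalization scheme and the 0.9 threshold are our assumptions, and the LLM proposal step is omitted.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("roberta-large-mnli")
nli = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

def entailed(premise: str, hypothesis: str, threshold: float = 0.9) -> bool:
    """True if the premise entails the hypothesis with high confidence."""
    inputs = tok(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = nli(**inputs).logits.softmax(dim=-1)[0]
    return probs[2].item() >= threshold  # index 2 = ENTAILMENT for this model

def verbalize(head: str, relation: str, tail: str) -> str:
    # Deliberately simple; a real system would use relation-specific templates.
    return f"{head} {relation.replace('_', ' ')} {tail}."

document = "Marie Curie studied physics at the University of Paris."
candidates = [("Marie Curie", "educated_at", "University of Paris")]
kept = [t for t in candidates if entailed(document, verbalize(*t))]
print(kept)
```

Raising the threshold trades recall for precision in the resulting pseudo-annotations.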
- Distributional Data Augmentation Methods for Low Resource Language [0.9208007322096533]
Easy data augmentation (EDA) augments the training data by injecting and replacing synonyms and randomly permuting sentences.
One major obstacle with EDA is the need for versatile and complete synonym dictionaries, which cannot be easily found in low-resource languages.
We propose two extensions, easy distributional data augmentation (EDDA) and type specific similar word replacement (TSSR), which use semantic word context information and part-of-speech tags for word replacement and augmentation.
arXiv Detail & Related papers (2023-09-09T19:01:59Z)
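A toy sketch of the TSSR idea: replacement candidates are restricted to distributional neighbors sharing the word's part-of-speech type. The neighbor table here is hand-made for illustration; in practice it would come from pretrained word embeddings and a POS tagger.

```python
import random

# Hand-made neighbor table keyed by (word, POS tag); in practice neighbors
# come from distributional embeddings and tags from a POS tagger.
NEIGHBORS = {
    ("quick", "ADJ"): ["fast", "swift"],
    ("dog", "NOUN"): ["hound", "puppy"],
}

def tssr(tagged: list[tuple[str, str]], p: float = 0.5) -> list[str]:
    """Type-specific similar word replacement: a word may only be swapped for
    a distributional neighbor that shares its part-of-speech type."""
    out = []
    for word, tag in tagged:
        cands = NEIGHBORS.get((word, tag), [])
        out.append(random.choice(cands) if cands and random.random() < p else word)
    return out

sentence = [("the", "DET"), ("quick", "ADJ"), ("dog", "NOUN"), ("barked", "VERB")]
print(" ".join(tssr(sentence)))
```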
- Conjunct Resolution in the Face of Verbal Omissions [51.220650412095665]
We propose a conjunct resolution task that operates directly on the text and makes use of a split-and-rephrase paradigm in order to recover the missing elements in the coordination structure.
We curate a large dataset, containing over 10K examples of naturally-occurring verbal omissions with crowd-sourced annotations.
We train various neural baselines for this task, and show that while our best method obtains decent performance, it leaves ample space for improvement.
arXiv Detail & Related papers (2023-05-26T08:44:02Z)
- Entity-to-Text based Data Augmentation for various Named Entity Recognition Tasks [96.52649319569535]
We propose a novel Entity-to-Text based data augmentation technique named EnTDA.
We introduce a diversity beam search to increase the diversity during the text generation process.
arXiv Detail & Related papers (2022-10-19T07:24:40Z)
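The diversity beam search mentioned above corresponds to diverse (group) beam search as exposed by Hugging Face `generate`. A sketch with a placeholder base model and input format; EnTDA first fine-tunes its generator on entity-list-to-text pairs, which is not shown here.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-base")   # placeholder base model
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

entities = ["Barack Obama", "Hawaii"]
inputs = tokenizer("entities: " + " ; ".join(entities), return_tensors="pt")

outputs = model.generate(
    **inputs,
    num_beams=6,
    num_beam_groups=3,       # split the beams into groups...
    diversity_penalty=1.0,   # ...and penalize groups for repeating each other
    num_return_sequences=3,
    max_new_tokens=48,
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```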
- Entity Aware Syntax Tree Based Data Augmentation for Natural Language Understanding [5.02493891738617]
We propose a novel NLP data augmentation technique, which applies a tree structure, Entity Aware Syntax Tree (EAST), to represent sentences, combined with attention on the entity.
Our EADA technique automatically constructs an EAST from a small amount of annotated data, and then generates a large number of training instances for intent detection and slot filling.
Experimental results on four datasets showed that the proposed technique significantly outperforms the existing data augmentation methods in terms of both accuracy and generalization ability.
arXiv Detail & Related papers (2022-09-06T07:34:10Z)
- SUBS: Subtree Substitution for Compositional Semantic Parsing [50.63574492655072]
We propose to use subtree substitution for compositional data augmentation, where we consider subtrees with similar semantic functions as exchangeable.
Experiments showed that such augmented data led to significantly better performance on SCAN and GeoQuery, and reached a new SOTA on the compositional split of GeoQuery.
arXiv Detail & Related papers (2022-05-03T14:47:35Z)
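A toy rendition of subtree substitution: subtrees carrying the same semantic label are treated as exchangeable across examples. The tree shape and labels are our own simplification of the idea.

```python
from dataclasses import dataclass, field

@dataclass
class Tree:
    label: str                        # semantic function, e.g. "QUERY", "STATE"
    text: str = ""                    # surface string at leaves
    children: list["Tree"] = field(default_factory=list)

def render(t: Tree) -> str:
    return t.text if not t.children else " ".join(render(c) for c in t.children)

def substitute(t: Tree, label: str, replacement: Tree) -> Tree:
    """Swap every subtree whose semantic label matches `label`."""
    if t.label == label:
        return replacement
    return Tree(t.label, t.text,
                [substitute(c, label, replacement) for c in t.children])

# "what rivers are in texas", with the STATE subtree treated as exchangeable:
q = Tree("QUERY", children=[Tree("REL", "what rivers are in"),
                            Tree("STATE", "texas")])
print(render(substitute(q, "STATE", Tree("STATE", "oregon"))))
```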
- HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization [57.798070356553936]
HETFORMER is a Transformer-based pre-trained model with multi-granularity sparse attentions for extractive summarization.
Experiments on both single- and multi-document summarization tasks show that HETFORMER achieves state-of-the-art performance in Rouge F1.
arXiv Detail & Related papers (2021-10-12T22:42:31Z)
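One common sparse-attention ingredient, sketched for intuition: each token attends within a local window while a few global tokens attend everywhere. The window size and global positions are illustrative only; HETFORMER's multi-granularity patterns are more elaborate than this.

```python
import numpy as np

def sparse_mask(seq_len: int, window: int, global_positions: list[int]) -> np.ndarray:
    """Boolean attention mask: local sliding windows plus global tokens."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):                  # local window around each token
        mask[i, max(0, i - window):i + window + 1] = True
    for g in global_positions:                # global tokens see and are seen by all
        mask[g, :] = True
        mask[:, g] = True
    return mask

m = sparse_mask(seq_len=16, window=2, global_positions=[0, 8])
print(int(m.sum()), "of", m.size, "attention pairs kept")
```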
- Entity and Evidence Guided Relation Extraction for DocRED [33.69481141963074]
We propose a joint training framework, E2GRE (Entity and Evidence Guided Relation Extraction), for this task.
We introduce entity-guided sequences as inputs to a pre-trained language model (e.g., BERT, RoBERTa).
These entity-guided sequences help the pre-trained language model (LM) focus on areas of the document related to the entity.
We evaluate our E2GRE approach on DocRED, a recently released large-scale dataset for relation extraction.
arXiv Detail & Related papers (2020-08-27T17:41:23Z)
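A sketch of the entity-guided input construction described above: encode the entity mention and the document as a standard sentence pair, so the LM can attend to entity-relevant spans. The rest of E2GRE (evidence guidance, joint training) is omitted, and the example strings are our own.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

document = "Alstom is headquartered in Saint-Ouen, near Paris."
entity = "Alstom"

# Standard sentence-pair encoding: [CLS] entity [SEP] document [SEP]
encoding = tokenizer(entity, document, return_tensors="pt", truncation=True)
print(tokenizer.decode(encoding["input_ids"][0]))
```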