Entity-to-Text based Data Augmentation for various Named Entity
Recognition Tasks
- URL: http://arxiv.org/abs/2210.10343v2
- Date: Fri, 26 May 2023 16:14:43 GMT
- Title: Entity-to-Text based Data Augmentation for various Named Entity
Recognition Tasks
- Authors: Xuming Hu, Yong Jiang, Aiwei Liu, Zhongqiang Huang, Pengjun Xie, Fei
Huang, Lijie Wen, Philip S. Yu
- Abstract summary: We propose a novel Entity-to-Text based data augmentation technique named EnTDA.
We introduce a diversity beam search to increase the diversity during the text generation process.
- Score: 96.52649319569535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation techniques have been used to alleviate the scarcity of
labeled data in various NER tasks (flat, nested, and discontinuous NER tasks).
Existing augmentation techniques either manipulate words in the original text,
which breaks its semantic coherence, or exploit generative models that do not
preserve the entities of the original text, which impedes the use of
augmentation techniques on nested and discontinuous NER tasks. In this work, we
propose a novel Entity-to-Text based data augmentation technique named EnTDA,
which adds, deletes, replaces, or swaps entities in the entity list of the
original texts and uses these augmented entity lists to generate semantically
coherent, entity-preserving texts for various NER tasks. Furthermore, we
introduce a diversity beam search to increase diversity during text generation.
Experiments on thirteen NER datasets across three tasks (flat, nested, and
discontinuous NER) and two settings (full-data and low-resource) show that
EnTDA brings larger performance improvements than baseline augmentation
techniques.
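The abstract fully specifies the augmentation loop, so a small sketch may help. The following is a minimal illustration, not the authors' released code: the entity-list operations follow the add/delete/replace/swap description, while the prompt format and the t5-base checkpoint are placeholder assumptions; diverse beam search (via `num_beam_groups`/`diversity_penalty` in Hugging Face transformers) stands in for the paper's diversity beam search.

```python
import random
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Entity-list operations named in the abstract: add, delete, replace, swap.
def augment_entity_list(entities, entity_pool):
    ents = list(entities)
    op = random.choice(["add", "delete", "replace", "swap"])
    if op == "add":
        ents.insert(random.randrange(len(ents) + 1), random.choice(entity_pool))
    elif op == "delete" and len(ents) > 1:
        ents.pop(random.randrange(len(ents)))
    elif op == "replace" and ents:
        ents[random.randrange(len(ents))] = random.choice(entity_pool)
    elif op == "swap" and len(ents) > 1:
        i, j = random.sample(range(len(ents)), 2)
        ents[i], ents[j] = ents[j], ents[i]
    return ents

# Placeholder Entity-to-Text generator: any seq2seq LM fine-tuned to map an
# entity list to a sentence would do; t5-base and the prompt are assumptions.
tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

def entities_to_text(entities):
    prompt = "generate a sentence containing: " + "; ".join(entities)
    inputs = tokenizer(prompt, return_tensors="pt")
    # Diverse beam search as a stand-in for the paper's diversity beam search:
    # beam groups are penalized for repeating each other's tokens.
    outputs = model.generate(
        **inputs,
        num_beams=6,
        num_beam_groups=3,
        diversity_penalty=1.0,
        num_return_sequences=3,
        max_new_tokens=64,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

augmented = augment_entity_list(["Paris", "WHO"], entity_pool=["Berlin", "EU"])
print(entities_to_text(augmented))
```

The key design point is that labels survive augmentation because generation is conditioned on the (augmented) entity list rather than on the original sentence.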
Related papers
- Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation [67.89838237013078]
Named entity recognition (NER) models often struggle with noisy inputs.
We propose a more realistic setting in which only noisy text and its NER labels are available.
We employ a multi-view training framework that improves NER robustness without retrieving text during inference.
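How a multi-view objective can drop retrieval at test time may be easier to see in code. A minimal sketch under stated assumptions: the `tagger` interface, the KL-based consistency term, and the alignment of logits across the two views are all illustrative, and the paper's actual losses may differ.

```python
import torch.nn.functional as F

# Hypothetical multi-view training step. `tagger` maps token ids to per-token
# label logits, and the retrieval-augmented view is assumed to return logits
# aligned to the same original tokens.
def multi_view_step(tagger, noisy_ids, augmented_ids, labels):
    logits_noisy = tagger(noisy_ids)      # view 1: noisy text alone
    logits_aug = tagger(augmented_ids)    # view 2: text plus retrieved context

    # Supervised tagging loss on both views.
    ce = (F.cross_entropy(logits_noisy.flatten(0, 1), labels.flatten())
          + F.cross_entropy(logits_aug.flatten(0, 1), labels.flatten()))

    # Consistency term: pull the retrieval-free view toward the augmented one,
    # so retrieval can be dropped entirely at inference time.
    kl = F.kl_div(
        F.log_softmax(logits_noisy, dim=-1),
        F.softmax(logits_aug, dim=-1).detach(),
        reduction="batchmean",
    )
    return ce + kl
```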
arXiv Detail & Related papers (2024-07-26T07:30:41Z)
- Hypertext Entity Extraction in Webpage [112.56734676713721]
We introduce a MoE-based Entity Extraction Framework (MoEEF), which integrates multiple features to enhance model performance.
We also analyze the effectiveness of hypertext features in HEED and several model components in MoEEF.
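As a rough illustration of MoE-style feature integration (the expert inputs, sizes, and gating here are assumptions, not MoEEF's actual architecture):

```python
import torch
import torch.nn as nn

# Several feature encodings (e.g. text, hypertext/DOM, layout) are combined by
# a learned gate over per-feature experts before classification.
class MoEFusion(nn.Module):
    def __init__(self, dim, num_experts=3, num_labels=9):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.gate = nn.Linear(dim * num_experts, num_experts)
        self.classifier = nn.Linear(dim, num_labels)

    def forward(self, features):  # features: list of (batch, dim) tensors
        stacked = torch.stack([e(f) for e, f in zip(self.experts, features)], dim=1)
        weights = torch.softmax(self.gate(torch.cat(features, dim=-1)), dim=-1)
        fused = (weights.unsqueeze(-1) * stacked).sum(dim=1)  # (batch, dim)
        return self.classifier(fused)
```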
arXiv Detail & Related papers (2024-03-04T03:21:40Z)
- LLM-DA: Data Augmentation via Large Language Models for Few-Shot Named Entity Recognition [67.96794382040547]
LLM-DA is a novel data augmentation technique based on large language models (LLMs) for the few-shot NER task.
Our approach involves employing 14 contextual rewriting strategies, designing entity replacements of the same type, and incorporating noise injection to enhance robustness.
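A hedged sketch of this pipeline: `call_llm` stands in for any text-completion API, and the two example strategies and the character-drop noise are illustrative stand-ins for the paper's 14 contextual rewriting strategies and its noise scheme.

```python
import random

REWRITE_STRATEGIES = [
    "Paraphrase the sentence, keeping every marked entity unchanged.",
    "Rewrite the sentence in a formal register, keeping every marked entity unchanged.",
]

def augment(sentence, entities, same_type_pool, call_llm):
    # 1) Contextual rewriting with a randomly chosen strategy.
    rewritten = call_llm(random.choice(REWRITE_STRATEGIES) + "\nSentence: " + sentence)

    # 2) Entity replacement of the same type (e.g. one LOC name for another).
    mention, etype = random.choice(entities)
    rewritten = rewritten.replace(mention, random.choice(same_type_pool[etype]))

    # 3) Noise injection for robustness: drop one random character.
    chars = list(rewritten)
    if len(chars) > 1:
        chars.pop(random.randrange(len(chars)))
    return "".join(chars)
```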
arXiv Detail & Related papers (2024-02-22T14:19:56Z)
- GDA: Generative Data Augmentation Techniques for Relation Extraction Tasks [81.51314139202152]
We propose a dedicated augmentation technique for relational texts, named GDA, which uses two complementary modules to preserve both semantic consistency and syntax structures.
Experimental results on three datasets under a low-resource setting show that GDA brings 2.0% F1 improvements compared with no augmentation technique.
arXiv Detail & Related papers (2023-05-26T06:21:01Z)
- Entity Aware Syntax Tree Based Data Augmentation for Natural Language Understanding [5.02493891738617]
We propose a novel NLP data augmentation technique that applies a tree structure, the Entity Aware Syntax Tree (EAST), to represent sentences combined with attention on entities.
Our EADA technique automatically constructs an EAST from a small amount of annotated data, and then generates a large number of training instances for intent detection and slot filling.
Experimental results on four datasets showed that the proposed technique significantly outperforms the existing data augmentation methods in terms of both accuracy and generalization ability.
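A loose illustration of the underlying idea, swapping entity-aware slot values between annotated utterances: the bracketed annotation format is an assumption, and the real EAST construction is considerably richer than this.

```python
import random
import re

# Slot values (the entity-aware leaves) are swapped between utterances of the
# same slot type to produce new training instances.
SLOT = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")

examples = [
    {"intent": "book_flight", "text": "fly to [Boston](city) on [Monday](day)"},
    {"intent": "book_flight", "text": "fly to [Denver](city) on [Friday](day)"},
]

def slot_value_pool(dataset):
    pool = {}
    for ex in dataset:
        for value, slot in SLOT.findall(ex["text"]):
            pool.setdefault(slot, set()).add(value)
    return {slot: sorted(vals) for slot, vals in pool.items()}

def augment(ex, pool):
    # Replace each annotated slot value with another value of the same type.
    swap = lambda m: "[{}]({})".format(random.choice(pool[m.group(2)]), m.group(2))
    return {"intent": ex["intent"], "text": SLOT.sub(swap, ex["text"])}

print(augment(examples[0], slot_value_pool(examples)))
```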
arXiv Detail & Related papers (2022-09-06T07:34:10Z)
- Composable Text Controls in Latent Space with ODEs [97.12426987887021]
This paper proposes a new efficient approach for composable text operations in the compact latent space of text.
By connecting pretrained LMs to the latent space through efficient adaptation, we decode the sampled vectors into the desired text sequences.
Experiments show that composing those operators within our approach manages to generate or edit high-quality text.
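The operator-composition idea fits in a few lines. This is a toy sketch only: `encode`/`decode` stand in for the adapted pretrained LM, and fixed offset vectors stand in for the paper's ODE-based latent sampling.

```python
# Each text operator is modeled as an offset in latent space; composing
# operators is vector addition before decoding back to text.
def compose_and_decode(encode, decode, text, operator_vectors):
    z = encode(text)                           # latent code of the input text
    for v in operator_vectors:                 # e.g. [to_positive, to_formal]
        z = [zi + vi for zi, vi in zip(z, v)]  # compose by vector addition
    return decode(z)

# Toy usage with identity stand-ins for the encoder and decoder.
dummy_encode = lambda t: [0.0, 0.0]
dummy_decode = lambda z: f"decoded({z})"
print(compose_and_decode(dummy_encode, dummy_decode, "hello", [[0.1, -0.2]]))
```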
arXiv Detail & Related papers (2022-08-01T06:51:45Z)
- Extending Multi-Text Sentence Fusion Resources via Pyramid Annotations [12.394777121890925]
This paper revisits and substantially extends previous dataset creation efforts.
We show that our extended version uses more representative texts for multi-document tasks and provides a larger and more diverse training set.
arXiv Detail & Related papers (2021-10-09T09:15:05Z)
- StrucTexT: Structured Text Understanding with Multi-Modal Transformers [29.540122964399046]
Structured text understanding on Visually Rich Documents (VRDs) is a crucial part of Document Intelligence.
This paper proposes a unified framework named StrucTexT, which is flexible and effective for handling both sub-tasks (entity labeling and entity linking).
We evaluate our method for structured text understanding at segment-level and token-level and show it outperforms the state-of-the-art counterparts.
arXiv Detail & Related papers (2021-08-06T02:57:07Z)
- Hybrid Attention-Based Transformer Block Model for Distant Supervision Relation Extraction [20.644215991166902]
We propose a new framework that uses a hybrid attention-based Transformer block with multi-instance learning to perform the DSRE task.
The proposed approach can outperform the state-of-the-art algorithms on the evaluation dataset.
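A minimal sketch of the multi-instance (bag-level) part, using selective attention in the style of Lin et al. (2016) as a stand-in; the hybrid attention Transformer encoder itself is abstracted into precomputed sentence representations.

```python
import torch
import torch.nn as nn

# A relation-specific query attends over the sentence encodings of one entity
# pair's bag; the weighted bag representation is then classified.
class BagAttention(nn.Module):
    def __init__(self, dim, num_relations):
        super().__init__()
        self.relation_queries = nn.Embedding(num_relations, dim)
        self.classifier = nn.Linear(dim, num_relations)

    def forward(self, sentence_reprs, relation_id):
        # sentence_reprs: (bag_size, dim) encodings of the bag's sentences.
        query = self.relation_queries(relation_id)           # (dim,)
        scores = torch.softmax(sentence_reprs @ query, 0)    # (bag_size,)
        bag = (scores.unsqueeze(-1) * sentence_reprs).sum(0) # (dim,)
        return self.classifier(bag)                          # relation logits

bag = torch.randn(4, 256)                       # a bag of 4 encoded sentences
model = BagAttention(dim=256, num_relations=53)
logits = model(bag, torch.tensor(7))            # logits over 53 relations
```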
arXiv Detail & Related papers (2020-03-10T13:05:52Z)