Adversarial Word Dilution as Text Data Augmentation in Low-Resource
Regime
- URL: http://arxiv.org/abs/2305.09287v2
- Date: Wed, 9 Aug 2023 10:45:52 GMT
- Title: Adversarial Word Dilution as Text Data Augmentation in Low-Resource
Regime
- Authors: Junfan Chen, Richong Zhang, Zheyan Luo, Chunming Hu, Yongyi Mao
- Abstract summary: This paper proposes an Adversarial Word Dilution (AWD) method that can generate hard positive examples as text data augmentations.
Our idea of augmenting the text data is to dilute the embedding of strong positive words by weighted mixing with unknown-word embedding.
Empirical studies on three benchmark datasets show that AWD can generate more effective data augmentations and outperform the state-of-the-art text data augmentation methods.
- Score: 35.95241861664597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation is widely used in text classification, especially in the
low-resource regime where a few examples for each class are available during
training. Despite this success, generating data augmentations as hard positive
examples, which may increase their effectiveness, remains under-explored. This paper
proposes an Adversarial Word Dilution (AWD) method that can generate hard
positive examples as text data augmentations to train the low-resource text
classification model efficiently. Our idea of augmenting the text data is to
dilute the embedding of strong positive words by weighted mixing with
unknown-word embedding, making the augmented inputs hard for the classification
model to recognize as positive. We adversarially learn the dilution
weights through a constrained min-max optimization process with the guidance of
the labels. Empirical studies on three benchmark datasets show that AWD can
generate more effective data augmentations and outperform the state-of-the-art
text data augmentation methods. The additional analysis demonstrates that the
data augmentations generated by AWD are interpretable and can flexibly extend
to new examples without further training.
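The dilution operation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `dilute` helper, the toy embeddings, and the fixed weights are all hypothetical, and in AWD the dilution weights are learned adversarially through a constrained min-max optimization guided by the labels rather than set by hand.

```python
import numpy as np

def dilute(embeddings, weights, unk_embedding):
    """Word dilution: mix each word embedding with the unknown-word
    embedding, diluted_i = (1 - w_i) * e_i + w_i * e_unk, with w_i in [0, 1].
    A larger w_i weakens a strong positive word more."""
    w = weights[:, None]                      # (seq_len, 1) for broadcasting
    return (1.0 - w) * embeddings + w * unk_embedding

# Toy example: 3 words with 4-dimensional embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 4))                 # word embeddings
unk = np.zeros(4)                             # unknown-word embedding
w = np.array([0.0, 0.5, 1.0])                 # illustrative dilution weights
diluted = dilute(emb, w, unk)
# w_i = 0 keeps the word intact; w_i = 1 replaces it with the unknown word.
```

Because the weights interpolate in embedding space, the augmented input stays close to the original sentence while its strongest class-indicative words are softened, which is what makes the example "hard positive".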
Related papers
- Evaluating the Effectiveness of Data Augmentation for Emotion Classification in Low-Resource Settings [1.387446067205368]
We evaluated the effectiveness of different data augmentation techniques for a multi-label emotion classification task using a low-resource dataset.
Back Translation outperformed autoencoder-based approaches, and generating multiple examples per training instance led to further performance improvement.
arXiv Detail & Related papers (2024-06-07T18:13:27Z)
- Distributional Data Augmentation Methods for Low Resource Language [0.9208007322096533]
Easy data augmentation (EDA) augments the training data by injecting and replacing synonyms and randomly permuting sentences.
One major obstacle with EDA is the need for versatile and complete synonym dictionaries, which cannot be easily found in low-resource languages.
We propose two extensions, easy distributional data augmentation (EDDA) and type specific similar word replacement (TSSR), which use semantic word context information and part-of-speech tags for word replacement and augmentation.
arXiv Detail & Related papers (2023-09-09T19:01:59Z)
- Boosting Event Extraction with Denoised Structure-to-Text Augmentation [52.21703002404442]
Event extraction aims to recognize pre-defined event triggers and arguments from texts.
Recent data augmentation methods often neglect the problem of grammatical incorrectness.
We propose DAEE, a denoised structure-to-text augmentation framework for event extraction.
arXiv Detail & Related papers (2023-05-16T16:52:07Z)
- AugGPT: Leveraging ChatGPT for Text Data Augmentation [59.76140039943385]
We propose a text data augmentation approach based on ChatGPT, named AugGPT.
AugGPT rephrases each sentence in the training samples into multiple conceptually similar but semantically different samples.
Experiment results on few-shot learning text classification tasks show the superior performance of the proposed AugGPT approach.
arXiv Detail & Related papers (2023-02-25T06:58:16Z)
- Syntax-driven Data Augmentation for Named Entity Recognition [3.0603554929274908]
In low resource settings, data augmentation strategies are commonly leveraged to improve performance.
We compare simple masked language model replacement and an augmentation method using constituency tree mutations to improve named entity recognition.
arXiv Detail & Related papers (2022-08-15T01:24:55Z)
- GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation [9.501648136713694]
Large-scale language models such as GPT-3 are excellent few-shot learners, allowing them to be controlled via natural text prompts.
This paper proposes a novel data augmentation technique that leverages large-scale language models to generate realistic text samples.
arXiv Detail & Related papers (2021-04-18T11:39:33Z)
- SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z)
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences.
Our method is applicable to both supervised and semi-supervised settings.
arXiv Detail & Related papers (2020-11-03T07:49:15Z)
- Generative Data Augmentation for Commonsense Reasoning [75.26876609249197]
G-DAUGC is a novel generative data augmentation method that aims to achieve more accurate and robust learning in the low-resource setting.
G-DAUGC consistently outperforms existing data augmentation methods based on back-translation.
Our analysis demonstrates that G-DAUGC produces a diverse set of fluent training examples, and that its selection and training approaches are important for performance.
arXiv Detail & Related papers (2020-04-24T06:12:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all generated summaries) and is not responsible for any consequences of its use.