Evaluating the Effectiveness of Data Augmentation for Emotion Classification in Low-Resource Settings
- URL: http://arxiv.org/abs/2406.05190v1
- Date: Fri, 7 Jun 2024 18:13:27 GMT
- Title: Evaluating the Effectiveness of Data Augmentation for Emotion Classification in Low-Resource Settings
- Authors: Aashish Arora, Elsbeth Turcan
- Abstract summary: We evaluated the effectiveness of different data augmentation techniques for a multi-label emotion classification task using a low-resource dataset.
We found that Back Translation outperformed autoencoder-based approaches and that generating multiple examples per training instance led to further performance improvements.
- Score: 1.387446067205368
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Data augmentation has the potential to improve the performance of machine learning models by increasing the amount of training data available. In this study, we evaluated the effectiveness of different data augmentation techniques for a multi-label emotion classification task using a low-resource dataset. Our results showed that Back Translation outperformed autoencoder-based approaches and that generating multiple examples per training instance led to further performance improvement. In addition, we found that Back Translation generated the most diverse set of unigrams and trigrams. These findings demonstrate the utility of Back Translation in enhancing the performance of emotion classification models in resource-limited situations.
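The Back Translation idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `translate` function below is a toy word-level dictionary standing in for a real machine translation model, and the pivot language, label set, and example text are hypothetical.

```python
# Back Translation augmentation sketch: round-trip each training text through
# a pivot language and keep the paraphrase as a new example that inherits the
# original multi-label emotion annotation.

def translate(text: str, direction: str) -> str:
    # Toy word-level dictionary standing in for a real MT system.
    en_to_de = {"i": "ich", "am": "bin", "happy": "gluecklich"}
    de_to_en = {"ich": "I", "bin": "am", "gluecklich": "glad"}  # lossy return trip
    table = en_to_de if direction == "en->de" else de_to_en
    return " ".join(table.get(w.lower(), w) for w in text.split())

def back_translate(text: str, num_variants: int = 1) -> list[str]:
    # Generating multiple augmented examples per training instance was
    # reported to further improve performance.
    variants = []
    for _ in range(num_variants):
        pivot = translate(text, "en->de")
        variants.append(translate(pivot, "de->en"))
    return variants

# Each augmented text keeps the original instance's emotion labels.
augmented = [(v, ["joy"]) for v in back_translate("I am happy", num_variants=2)]
```

With a real MT model the round trip introduces lexical and syntactic variation ("happy" → "glad" above), which is what yields the diverse unigrams and trigrams the study measures.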
Related papers
- GASE: Generatively Augmented Sentence Encoding [0.0]
We propose an approach to enhance sentence embeddings by applying generative text models for data augmentation at inference time.
Generatively Augmented Sentence Encoding (GASE) uses diverse synthetic variants of input texts generated by paraphrasing, summarising or extracting keywords.
We find that generative augmentation leads to larger performance improvements for embedding models with lower baseline performance.
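The aggregation step can be sketched as below. This is an assumed reading of the summary, not the paper's exact method: the `embed` function is a toy deterministic stand-in for a real sentence encoder, and mean pooling is one plausible way to combine the original text with its generated variants.

```python
import zlib
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    # Toy deterministic encoder standing in for a real embedding model;
    # seeds a generator from a CRC32 of the text.
    rng = np.random.default_rng(zlib.crc32(text.encode()))
    return rng.normal(size=(dim,))

def augmented_embedding(text: str, variants: list[str]) -> np.ndarray:
    # Pool embeddings of the input together with its synthetic variants
    # (paraphrases, summaries, keyword lists) produced at inference time.
    vecs = [embed(t) for t in [text] + variants]
    return np.mean(vecs, axis=0)

pooled = augmented_embedding("great movie", ["a fine film", "movie: great"])
```
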
arXiv Detail & Related papers (2024-11-07T17:53:47Z)
- A Simple Background Augmentation Method for Object Detection with Diffusion Model [53.32935683257045]
In computer vision, it is well-known that a lack of data diversity will impair model performance.
We propose a simple yet effective data augmentation approach by leveraging advancements in generative models.
Background augmentation, in particular, significantly improves the models' robustness and generalization capabilities.
arXiv Detail & Related papers (2024-08-01T07:40:00Z)
- Contrastive Learning for Regression on Hyperspectral Data [4.931067393619175]
We propose a contrastive learning framework for regression tasks on hyperspectral data.
Experiments on synthetic and real hyperspectral datasets show that the proposed framework and transformations significantly improve the performance of regression models.
arXiv Detail & Related papers (2024-02-12T21:33:46Z)
- Adversarial Word Dilution as Text Data Augmentation in Low-Resource Regime [35.95241861664597]
This paper proposes an Adversarial Word Dilution (AWD) method that can generate hard positive examples as text data augmentations.
Our idea of augmenting the text data is to dilute the embedding of strong positive words by weighted mixing with unknown-word embedding.
Empirical studies on three benchmark datasets show that AWD can generate more effective data augmentations and outperform the state-of-the-art text data augmentation methods.
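The dilution step described above can be sketched as a convex mix of each word's embedding with the unknown-word embedding. This is an assumed form based on the summary: in AWD the mixing weights are learned adversarially, whereas here they are fixed constants for illustration.

```python
import numpy as np

def dilute(word_embs: np.ndarray, unk_emb: np.ndarray,
           weights: np.ndarray) -> np.ndarray:
    # weights[i] in [0, 1]: how strongly word i is pulled toward <unk>.
    # A weight of 0 leaves the word untouched; larger weights dilute the
    # embedding of a "strong positive" word toward the unknown-word vector.
    w = weights[:, None]
    return (1.0 - w) * word_embs + w * unk_emb

rng = np.random.default_rng(0)
embs = rng.normal(size=(4, 8))            # embeddings of 4 words in a sentence
unk = rng.normal(size=(8,))               # shared unknown-word embedding
weights = np.array([0.0, 0.6, 0.0, 0.3])  # dilute only the strong positive words
hard_positive = dilute(embs, unk, weights)
```

The diluted sentence keeps its label but its strongest sentiment cues are weakened, which is what makes it a hard positive example.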
arXiv Detail & Related papers (2023-05-16T08:46:11Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- Emotions are Subtle: Learning Sentiment Based Text Representations Using Contrastive Learning [6.6389732792316005]
We extend the use of contrastive learning embeddings to sentiment analysis tasks.
We show that fine-tuning on these embeddings provides an improvement over fine-tuning on BERT-based embeddings.
arXiv Detail & Related papers (2021-12-02T08:29:26Z)
- Guiding Generative Language Models for Data Augmentation in Few-Shot Text Classification [59.698811329287174]
We leverage GPT-2 for generating artificial training instances in order to improve classification performance.
Our results show that fine-tuning GPT-2 on a handful of labeled instances leads to consistent classification improvements.
arXiv Detail & Related papers (2021-11-17T12:10:03Z)
- Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences.
Our method is applicable to both supervised and semi-supervised settings.
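The linearization idea behind DAGA can be sketched as below. The exact scheme is assumed from the summary: here, O tags are dropped and each non-O tag is emitted before its word, one common way to turn tagged sequences into plain text that a language model can be trained on and sample from.

```python
def linearize(tokens: list[str], tags: list[str]) -> str:
    # Interleave tags with words so a plain language model can learn the
    # joint distribution of labels and tokens; O tags are omitted.
    out = []
    for tok, tag in zip(tokens, tags):
        if tag != "O":
            out.append(tag)
        out.append(tok)
    return " ".join(out)

sentence = ["John", "lives", "in", "London"]
tags = ["B-PER", "O", "O", "B-LOC"]
linearized = linearize(sentence, tags)  # "B-PER John lives in B-LOC London"
```

Sampling from a language model trained on such strings and then de-linearizing yields new synthetic labeled sentences for the tagging task.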
arXiv Detail & Related papers (2020-11-03T07:49:15Z)
- Generative Data Augmentation for Commonsense Reasoning [75.26876609249197]
G-DAUGC is a novel generative data augmentation method that aims to achieve more accurate and robust learning in the low-resource setting.
G-DAUGC consistently outperforms existing data augmentation methods based on back-translation.
Our analysis demonstrates that G-DAUGC produces a diverse set of fluent training examples, and that its selection and training approaches are important for performance.
arXiv Detail & Related papers (2020-04-24T06:12:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.