Data Augmentation for Emotion Detection in Small Imbalanced Text Data
- URL: http://arxiv.org/abs/2310.17015v3
- Date: Mon, 30 Oct 2023 13:33:16 GMT
- Title: Data Augmentation for Emotion Detection in Small Imbalanced Text Data
- Authors: Anna Koufakou, Diego Grisales, Ragy Costa de jesus, Oscar Fox
- Abstract summary: One of the challenges is the shortage of available datasets that have been annotated with emotions.
We studied the impact of data augmentation techniques precisely when applied to small imbalanced datasets.
Our experimental results show that using the augmented data when training the classifier model leads to significant improvements.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Emotion recognition in text, the task of identifying emotions such as joy or
anger, is a challenging problem in NLP with many applications. One of the
challenges is the shortage of available datasets that have been annotated with
emotions. Certain existing datasets are small, follow different emotion
taxonomies and display imbalance in their emotion distribution. In this work,
we studied the impact of data augmentation techniques precisely when applied to
small imbalanced datasets, for which current state-of-the-art models (such as
RoBERTa) under-perform. Specifically, we utilized four data augmentation
methods (Easy Data Augmentation EDA, static and contextual Embedding-based, and
ProtAugment) on three datasets that come from different sources and vary in
size, emotion categories and distributions. Our experimental results show that
using the augmented data when training the classifier model leads to
significant improvements. Finally, we conducted two case studies: a) directly
using the popular ChatGPT API to paraphrase text with different prompts, and
b) using external data to augment the training set. Results show the promising
potential of these methods.
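As a rough illustration of the kind of augmentation the abstract describes, the sketch below implements three EDA-style operations (synonym replacement, random swap, random deletion) and uses them to top up minority emotion classes. It is a minimal sketch, not the authors' implementation: the `SYNONYMS` table, the function names, and the choice to grow every class to the majority-class size are all assumptions made for illustration. The original EDA method draws synonyms from WordNet and also includes a random insertion operation, which is omitted here.

```python
import random

# Hypothetical toy synonym table (assumption); EDA itself uses WordNet.
SYNONYMS = {
    "happy": ["glad", "joyful"],
    "angry": ["furious", "irate"],
    "sad": ["unhappy", "gloomy"],
}


def synonym_replace(tokens, rng, n=1):
    """Replace up to n tokens that have an entry in SYNONYMS."""
    out = list(tokens)
    candidates = [i for i, t in enumerate(out) if t.lower() in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i].lower()])
    return out


def random_swap(tokens, rng, n=1):
    """Swap two randomly chosen positions, n times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_delete(tokens, rng, p=0.1):
    """Drop each token with probability p, always keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]


def augment(text, rng):
    """Apply one randomly chosen operation to a whitespace-tokenized text."""
    tokens = text.split()
    op = rng.choice([synonym_replace, random_swap, random_delete])
    return " ".join(op(tokens, rng))


def balance_with_augmentation(samples, seed=0):
    """Return the original (text, label) pairs plus augmented copies,
    so that every label reaches the size of the largest class.
    Balancing to the majority size is an assumption of this sketch."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append(text)
    target = max(len(texts) for texts in by_label.values())
    balanced = list(samples)
    for label, texts in by_label.items():
        for _ in range(target - len(texts)):
            balanced.append((augment(rng.choice(texts), rng), label))
    return balanced
```

In a setup like this, only the training split would be passed to `balance_with_augmentation`; the validation and test splits stay untouched so that reported scores reflect the original data distribution.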
Related papers
- LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z)
- eMotions: A Large-Scale Dataset for Emotion Recognition in Short Videos [7.011656298079659]
The prevailing use of short videos (SVs) leads to the necessity of emotion recognition in SVs.
Considering the lack of SVs emotion data, we introduce a large-scale dataset named eMotions, comprising 27,996 videos.
We present an end-to-end baseline method AV-CPNet that employs the video transformer to better learn semantically relevant representations.
arXiv Detail & Related papers (2023-11-29T03:24:30Z)
- Automatically Classifying Emotions based on Text: A Comparative Exploration of Different Datasets [0.0]
We focus on three datasets that were recently presented in the related literature.
We explore the performance of traditional as well as state-of-the-art deep learning models in the presence of different characteristics in the data.
Our experimental work shows that state-of-the-art models such as RoBERTa perform the best for all cases.
arXiv Detail & Related papers (2023-02-28T16:34:55Z)
- AugGPT: Leveraging ChatGPT for Text Data Augmentation [59.76140039943385]
We propose a text data augmentation approach based on ChatGPT (named AugGPT).
AugGPT rephrases each sentence in the training samples into multiple conceptually similar but semantically different samples.
Experiment results on few-shot learning text classification tasks show the superior performance of the proposed AugGPT approach.
arXiv Detail & Related papers (2023-02-25T06:58:16Z)
- Advanced Data Augmentation Approaches: A Comprehensive Survey and Future directions [57.30984060215482]
We provide a background of data augmentation, a novel and comprehensive taxonomy of reviewed data augmentation techniques, and the strengths and weaknesses (wherever possible) of each technique.
We also provide comprehensive results of the data augmentation effect on three popular computer vision tasks: image classification, object detection and semantic segmentation.
arXiv Detail & Related papers (2023-01-07T11:37:32Z)
- Persian Emotion Detection using ParsBERT and Imbalanced Data Handling Approaches [0.0]
EmoPars and ArmanEmo are two new human-labeled emotion datasets for the Persian language.
We evaluate EmoPars and compare it with ArmanEmo.
Our model reaches a Macro-averaged F1-score of 0.81 and 0.76 on ArmanEmo and EmoPars, respectively.
arXiv Detail & Related papers (2022-11-15T10:22:49Z)
- A Comparative Study of Data Augmentation Techniques for Deep Learning Based Emotion Recognition [11.928873764689458]
We conduct a comprehensive evaluation of popular deep learning approaches for emotion recognition.
We show that long-range dependencies in the speech signal are critical for emotion recognition.
Speed/rate augmentation offers the most robust performance gain across models.
arXiv Detail & Related papers (2022-11-09T17:27:03Z)
- ASDOT: Any-Shot Data-to-Text Generation with Pretrained Language Models [82.63962107729994]
Any-Shot Data-to-Text (ASDOT) is a new approach flexibly applicable to diverse settings.
It consists of two steps, data disambiguation and sentence fusion.
Experimental results show that ASDOT consistently achieves significant improvement over baselines.
arXiv Detail & Related papers (2022-10-09T19:17:43Z)
- Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
arXiv Detail & Related papers (2022-09-29T18:11:01Z)
- A cross-corpus study on speech emotion recognition [29.582678406878568]
This study investigates whether information learnt from acted emotions is useful for detecting natural emotions.
Four adult English datasets covering acted, elicited and natural emotions are considered.
A state-of-the-art model is proposed to accurately investigate the degradation of performance.
arXiv Detail & Related papers (2022-07-05T15:15:22Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)