Based on Data Balancing and Model Improvement for Multi-Label Sentiment Classification Performance Enhancement
- URL: http://arxiv.org/abs/2511.14073v2
- Date: Wed, 19 Nov 2025 14:56:35 GMT
- Title: Based on Data Balancing and Model Improvement for Multi-Label Sentiment Classification Performance Enhancement
- Authors: Zijin Su, Huanzhu Lyu, Yuren Niu, Yiming Liu
- Abstract summary: Multi-label sentiment classification plays a vital role in natural language processing by detecting multiple emotions within a single text. Existing datasets like GoEmotions often suffer from severe class imbalance, which hampers model performance. We constructed a balanced multi-label sentiment dataset using GoEmotions data, emotion-labeled samples from Sentiment140, and manually annotated texts. Experimental results demonstrate significant improvements in accuracy, precision, recall, F1-score, and AUC compared to models trained on imbalanced data.
- Score: 5.149011601951617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-label sentiment classification plays a vital role in natural language processing by detecting multiple emotions within a single text. However, existing datasets like GoEmotions often suffer from severe class imbalance, which hampers model performance, especially for underrepresented emotions. To address this, we constructed a balanced multi-label sentiment dataset by integrating the original GoEmotions data, emotion-labeled samples from Sentiment140 using a RoBERTa-base-GoEmotions model, and manually annotated texts generated by GPT-4 mini. Our data balancing strategy ensured an even distribution across 28 emotion categories. Based on this dataset, we developed an enhanced multi-label classification model that combines pre-trained FastText embeddings, convolutional layers for local feature extraction, bidirectional LSTM for contextual learning, and an attention mechanism to highlight sentiment-relevant words. A sigmoid-activated output layer enables multi-label prediction, and mixed precision training improves computational efficiency. Experimental results demonstrate significant improvements in accuracy, precision, recall, F1-score, and AUC compared to models trained on imbalanced data, highlighting the effectiveness of our approach.
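The abstract's key architectural choice for multi-label prediction is the sigmoid-activated output layer: each of the 28 emotion logits is squashed independently, so several emotions can fire for one text. A minimal sketch of that decision step, with illustrative logits and a conventional 0.5 threshold (neither taken from the paper):

```python
import math

def sigmoid(z):
    """Logistic function mapping a real-valued logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_emotions(logits, labels, threshold=0.5):
    """Apply a sigmoid to each per-emotion logit independently and keep
    every label whose probability clears the threshold. Unlike softmax,
    this allows zero, one, or several emotions per text."""
    probs = [sigmoid(z) for z in logits]
    return [lab for lab, p in zip(labels, probs) if p >= threshold]

# Hypothetical logits for a 4-label slice of the 28 GoEmotions categories.
labels = ["joy", "anger", "surprise", "neutral"]
logits = [2.1, -1.3, 0.4, -2.0]
print(predict_emotions(logits, labels))  # → ['joy', 'surprise']
```

Because each label is thresholded on its own, the model can express "joy and surprise together", which a single-label softmax cannot.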
Related papers
- Optimizing Small Transformer-Based Language Models for Multi-Label Sentiment Analysis in Short Texts [4.166512373146747]
We evaluate the effectiveness of small Transformer-based models for sentiment classification in short texts. We show that data augmentation improves classification performance, while continued pre-training on augmented datasets can introduce noise rather than boost accuracy.
arXiv Detail & Related papers (2025-09-05T10:08:14Z) - Improving Arabic Multi-Label Emotion Classification using Stacked Embeddings and Hybrid Loss Function [4.149971421068989]
This study uses stacked embeddings, meta-learning, and a hybrid loss function to enhance multi-label emotion classification for the Arabic language.
To further improve performance, a hybrid loss function is introduced, incorporating class weighting, label correlation, and contrastive learning.
Experiments validate the proposed model's performance across key metrics such as Precision, Recall, F1-Score, Jaccard Accuracy, and Hamming Loss.
arXiv Detail & Related papers (2024-10-04T23:37:21Z) - Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
arXiv Detail & Related papers (2024-06-26T05:30:21Z) - Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot
Text Classification Tasks [75.42002070547267]
We propose a self-evolution learning (SE) based mixup approach for data augmentation in text classification.
We introduce a novel instance-specific label smoothing approach, which linearly interpolates the model's output and the one-hot labels of the original samples to generate new soft labels for mixup.
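The interpolation described here can be sketched in a few lines; the mixing weight `alpha` and the example probabilities below are illustrative assumptions, not values from the paper:

```python
def instance_label_smooth(model_probs, one_hot, alpha=0.1):
    """Instance-specific label smoothing: linearly interpolate the model's
    predicted distribution with the one-hot label to produce a soft label,
    soft = (1 - alpha) * one_hot + alpha * model_probs."""
    return [(1 - alpha) * y + alpha * p for y, p in zip(one_hot, model_probs)]

one_hot = [0.0, 1.0, 0.0]
model_probs = [0.2, 0.7, 0.1]  # the model's current output for this instance
soft = instance_label_smooth(model_probs, one_hot, alpha=0.1)
# soft ≈ [0.02, 0.97, 0.01]: still sums to 1 and still favors the true class
```

Unlike uniform label smoothing, the soft target here depends on the model's own output for each sample, which is what makes it instance-specific.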
arXiv Detail & Related papers (2023-05-22T23:43:23Z) - REDAffectiveLM: Leveraging Affect Enriched Embedding and
Transformer-based Neural Language Model for Readers' Emotion Detection [3.6678641723285446]
We propose a novel approach for Readers' Emotion Detection from short-text documents using a deep learning model called REDAffectiveLM.
We leverage context-specific and affect enriched representations by using a transformer-based pre-trained language model in tandem with affect enriched Bi-LSTM+Attention.
arXiv Detail & Related papers (2023-01-21T19:28:25Z) - DoubleMix: Simple Interpolation-Based Data Augmentation for Text
Classification [56.817386699291305]
This paper proposes a simple yet effective data augmentation approach termed DoubleMix.
DoubleMix first generates several perturbed samples for each training sample.
It then uses the perturbed data and original data to carry out a two-step interpolation in the hidden space of neural models.
arXiv Detail & Related papers (2022-09-12T15:01:04Z) - DeepEmotex: Classifying Emotion in Text Messages using Deep Transfer
Learning [0.0]
We propose DeepEmotex, an effective sequential transfer learning method to detect emotion in text.
We conduct an experimental study using both curated Twitter data sets and benchmark data sets.
DeepEmotex models achieve over 91% accuracy for multi-class emotion classification on the test dataset.
arXiv Detail & Related papers (2022-06-12T03:23:40Z) - Active Learning by Feature Mixing [52.16150629234465]
We propose a novel method for batch active learning called ALFA-Mix.
We identify unlabelled instances with sufficiently-distinct features by seeking inconsistencies in predictions.
We show that inconsistencies in these predictions help discover features that the model is unable to recognise in the unlabelled instances.
arXiv Detail & Related papers (2022-03-14T12:20:54Z) - Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks [75.69896269357005]
Mixup is a data augmentation technique that linearly interpolates input examples and the corresponding labels.
In this paper, we explore how to apply mixup to natural language processing tasks.
We incorporate mixup into a transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks.
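Mixup's core operation is a single convex combination applied to both inputs and labels. A minimal sketch, assuming the standard Beta-distributed mixing coefficient (in mixup-transformer the interpolation is applied to hidden representations rather than raw inputs):

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Mixup: draw lambda ~ Beta(alpha, alpha) and linearly interpolate
    both the input vectors and their one-hot labels with the same lambda."""
    lam = random.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y

# Two hypothetical sentence representations with one-hot labels.
x_mix, y_mix = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
# y_mix is a convex combination of the two labels, so its entries sum to 1.
```

Because the same lambda weights both the inputs and the labels, the mixed target remains consistent with the mixed input, which is what lets the model train on the synthetic sample directly.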
arXiv Detail & Related papers (2020-10-05T23:37:30Z) - Revisiting LSTM Networks for Semi-Supervised Text Classification via
Mixed Objective Function [106.69643619725652]
We develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results.
We report state-of-the-art results for text classification task on several benchmark datasets.
arXiv Detail & Related papers (2020-09-08T21:55:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.