Ustnlp16 at SemEval-2025 Task 9: Improving Model Performance through Imbalance Handling and Focal Loss
- URL: http://arxiv.org/abs/2505.00021v1
- Date: Thu, 24 Apr 2025 16:35:44 GMT
- Title: Ustnlp16 at SemEval-2025 Task 9: Improving Model Performance through Imbalance Handling and Focal Loss
- Authors: Zhuoang Cai, Zhenghao Li, Yang Liu, Liyuan Guo, Yangqiu Song
- Abstract summary: Classification tasks often suffer from severe class imbalances, short and unstructured text, and overlapping semantic categories. We present our system for SemEval-2025 Task 9: Food Hazard Detection, which addresses these issues by applying data augmentation techniques to improve classification performance.
- Score: 38.70308073598037
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classification tasks often suffer from imbalanced data distribution, which presents challenges in food hazard detection due to severe class imbalances, short and unstructured text, and overlapping semantic categories. In this paper, we present our system for SemEval-2025 Task 9: Food Hazard Detection, which addresses these issues by applying data augmentation techniques to improve classification performance. We utilize transformer-based models, BERT and RoBERTa, as backbone classifiers and explore various data balancing strategies, including random oversampling, Easy Data Augmentation (EDA), and focal loss. Our experiments show that EDA effectively mitigates class imbalance, leading to significant improvements in accuracy and F1 scores. Furthermore, combining focal loss with oversampling and EDA further enhances model robustness, particularly for hard-to-classify examples. These findings contribute to the development of more effective NLP-based classification models for food hazard detection.
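The abstract names focal loss as a key ingredient but gives no implementation details. As a rough illustration only, here is a minimal PyTorch sketch of multi-class focal loss; the class name, the gamma=2.0 and alpha=0.25 defaults (taken from Lin et al., 2017), and the mean reduction are our assumptions, not settings reported by the authors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Multi-class focal loss: FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t).

    Down-weights well-classified examples so training focuses on hard,
    often minority-class, instances. Defaults follow Lin et al. (2017);
    the submission's exact hyperparameters are not stated in the abstract.
    """
    def __init__(self, gamma: float = 2.0, alpha: float = 0.25):
        super().__init__()
        self.gamma = gamma
        self.alpha = alpha

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        ce = F.cross_entropy(logits, targets, reduction="none")  # per-sample -log(p_t)
        p_t = torch.exp(-ce)                                     # probability of the true class
        return (self.alpha * (1.0 - p_t) ** self.gamma * ce).mean()
```

Easy Data Augmentation (Wei and Zou, 2019) consists of four token-level operations: synonym replacement, random insertion, random swap, and random deletion. A minimal sketch of two of these operations follows; the function names, default rates, and example sentence are illustrative, not taken from the paper. Random oversampling, by contrast, simply duplicates minority-class examples until class counts are balanced.

```python
import random

def random_swap(tokens: list[str], n_swaps: int = 2) -> list[str]:
    """EDA random swap: exchange the positions of two random tokens, n_swaps times."""
    tokens = tokens[:]
    for _ in range(n_swaps):
        if len(tokens) < 2:
            break
        i, j = random.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens: list[str], p: float = 0.1) -> list[str]:
    """EDA random deletion: drop each token with probability p, keeping at least one."""
    kept = [t for t in tokens if random.random() > p]
    return kept if kept else [random.choice(tokens)]

# Hypothetical usage: create extra minority-class training texts by perturbing originals.
print(random_swap("the product may contain undeclared peanuts".split()))
```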
Related papers
- An Enhanced Focal Loss Function to Mitigate Class Imbalance in Auto Insurance Fraud Detection with Explainable AI [0.0]
This paper presents a novel multistage focal loss function designed to enhance performance on imbalanced data by helping the model escape local minima. Our results demonstrate the efficacy of the multistage focal loss in boosting robustness on highly skewed classification tasks.
arXiv Detail & Related papers (2025-08-04T10:53:10Z) - Model-agnostic Mitigation Strategies of Data Imbalance for Regression [0.0]
Data imbalance persists as a pervasive challenge in regression tasks, introducing bias into model performance and undermining predictive reliability. We present advanced mitigation techniques, which build upon and improve existing sampling methods. We demonstrate that constructing an ensemble of models -- one trained with imbalance mitigation and another without -- can significantly reduce these negative effects.
arXiv Detail & Related papers (2025-06-02T09:46:08Z) - Wafer Map Defect Classification Using Autoencoder-Based Data Augmentation and Convolutional Neural Network [4.8748194765816955]
This study proposes a novel method combining an autoencoder-based data augmentation technique with a convolutional neural network (CNN).
The proposed method achieves a classification accuracy of 98.56%, surpassing Random Forest, SVM, and Logistic Regression by 19%, 21%, and 27%, respectively.
arXiv Detail & Related papers (2024-11-17T10:19:54Z) - Energy Score-based Pseudo-Label Filtering and Adaptive Loss for Imbalanced Semi-supervised SAR target recognition [1.2035771704626825]
Existing semi-supervised SAR ATR algorithms show low recognition accuracy in the case of class imbalance.
This work offers a semi-supervised SAR target recognition approach for class-imbalanced data, using dynamic energy scores and an adaptive loss.
arXiv Detail & Related papers (2024-11-06T14:45:16Z) - Systematic Evaluation of Synthetic Data Augmentation for Multi-class NetFlow Traffic [2.5182419298876857]
Multi-class classification models can identify specific types of attacks, allowing for more targeted and effective incident responses.
Recent advances suggest that generative models can assist in data augmentation, claiming to offer superior solutions for imbalanced datasets.
Our experiments indicate that resampling methods for balancing training data do not reliably improve classification performance.
arXiv Detail & Related papers (2024-08-28T12:44:07Z) - Implicit Counterfactual Data Augmentation for Robust Learning [24.795542869249154]
This study proposes an Implicit Counterfactual Data Augmentation method to remove spurious correlations and make stable predictions.
Experiments have been conducted across various biased learning scenarios covering both image and text datasets.
arXiv Detail & Related papers (2023-04-26T10:36:40Z) - Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z) - Diffusion Denoising Process for Perceptron Bias in Out-of-distribution Detection [67.49587673594276]
We introduce a new perceptron bias assumption that suggests discriminator models are more sensitive to certain features of the input, leading to the overconfidence problem.
We demonstrate that the diffusion denoising process (DDP) of DMs serves as a novel form of asymmetric interpolation, which is well-suited to enhance the input and mitigate the overconfidence problem.
Our experiments on CIFAR10, CIFAR100, and ImageNet show that our method outperforms SOTA approaches.
arXiv Detail & Related papers (2022-11-21T08:45:08Z) - Scale-Equivalent Distillation for Semi-Supervised Object Detection [57.59525453301374]
Recent Semi-Supervised Object Detection (SS-OD) methods are mainly based on self-training, generating hard pseudo-labels by a teacher model on unlabeled data as supervisory signals.
We analyze the challenges these methods meet with the empirical experiment results.
We introduce a novel approach, Scale-Equivalent Distillation (SED), which is a simple yet effective end-to-end knowledge distillation framework robust to large object size variance and class imbalance.
arXiv Detail & Related papers (2022-03-23T07:33:37Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection [10.851348154870852]
We argue that for anti-spoofing, indistinguishable samples deserve more attention than easily-classified ones during modeling.
We propose to leverage a balanced focal loss function as the training objective to dynamically scale the loss based on the traits of the sample itself.
By fusing complementary features, our system with only three kinds of features outperforms other systems by 22.5% for min-tDCF and 7% for EER.
arXiv Detail & Related papers (2020-06-25T17:06:47Z) - Data Augmentation Imbalance For Imbalanced Attribute Classification [60.71438625139922]
We propose a new re-sampling algorithm, data augmentation imbalance (DAI), to explicitly enhance the model's ability to discriminate under-represented attributes.
Our DAI algorithm achieves state-of-the-art results on pedestrian attribute datasets.
arXiv Detail & Related papers (2020-04-19T20:43:29Z)