A Comparison of Synthetic Oversampling Methods for Multi-class Text Classification
- URL: http://arxiv.org/abs/2008.04636v1
- Date: Tue, 11 Aug 2020 11:41:53 GMT
- Title: A Comparison of Synthetic Oversampling Methods for Multi-class Text Classification
- Authors: Anna Glazkova
- Abstract summary: The authors compare oversampling methods for the problem of multi-class topic classification.
The SMOTE algorithm underlies one of the most popular oversampling methods.
The authors conclude that for this task, the quality of the KNN and SVM algorithms is more affected by class imbalance than that of the neural networks.
- Score: 2.28438857884398
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The authors compare oversampling methods for the problem of multi-class
topic classification. The SMOTE algorithm underlies one of the most popular
oversampling methods. It consists of choosing two examples of a minority class
and generating a new example by interpolating between them. In the paper, the
authors compare the basic SMOTE method with two of its modifications (Borderline
SMOTE and ADASYN) and the random oversampling technique on a text classification
task. The paper discusses the k-nearest neighbors (KNN) algorithm, the support
vector machine (SVM) algorithm, and three types of neural networks (a feedforward
network, long short-term memory (LSTM), and bidirectional LSTM). The authors
combine these machine learning algorithms with different text representations and
compare the synthetic oversampling methods. In most cases, the use of oversampling
techniques significantly improves the quality of classification. The authors
conclude that for this task, the quality of the KNN and SVM algorithms is more
affected by class imbalance than that of the neural networks.
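The SMOTE interpolation step described in the abstract can be sketched as follows. This is a minimal illustrative implementation of the basic idea (the function name `smote_sample` and its parameters are assumptions for illustration, not the authors' code): for each synthetic point, pick a minority example, find one of its k nearest minority-class neighbors, and interpolate between the two.

```python
import numpy as np

def smote_sample(minority, k=5, n_new=10, rng=None):
    """Generate synthetic minority-class samples via SMOTE-style interpolation."""
    rng = np.random.default_rng(rng)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)                       # pick a minority example at random
        d = np.linalg.norm(minority - minority[i], axis=1)
        d[i] = np.inf                             # exclude the point itself
        neighbors = np.argsort(d)[:k]             # its k nearest minority neighbors
        j = rng.choice(neighbors)                 # pick one neighbor at random
        gap = rng.random()                        # interpolation factor in [0, 1)
        synthetic.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(synthetic)
```

Because each synthetic point lies on the segment between two existing minority examples, the generated samples stay inside the convex hull of the minority class; Borderline SMOTE and ADASYN modify which base points are chosen, not this interpolation step.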
Related papers
- A Quantum Approach to Synthetic Minority Oversampling Technique (SMOTE) [1.5186937600119894]
The paper proposes the Quantum-SMOTE method to solve the prevalent problem of class imbalance in machine learning datasets.
Quantum-SMOTE generates synthetic data points using quantum processes such as swap tests and quantum rotation.
The approach is tested on a public dataset of Telecom Churn to determine its impact along with varying proportions of synthetic data.
arXiv Detail & Related papers (2024-02-27T10:46:36Z)
- Intra-class Adaptive Augmentation with Neighbor Correction for Deep Metric Learning [99.14132861655223]
We propose a novel intra-class adaptive augmentation (IAA) framework for deep metric learning.
We reasonably estimate intra-class variations for every class and generate adaptive synthetic samples to support hard samples mining.
Our method significantly outperforms the state-of-the-art methods, improving retrieval performance by 3%-6%.
arXiv Detail & Related papers (2022-11-29T14:52:38Z)
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
- Large-Margin Representation Learning for Texture Classification [67.94823375350433]
This paper presents a novel approach combining convolutional layers (CLs) and large-margin metric learning for training supervised models on small datasets for texture classification.
The experimental results on texture and histopathologic image datasets have shown that the proposed approach achieves competitive accuracy with lower computational cost and faster convergence when compared to equivalent CNNs.
arXiv Detail & Related papers (2022-06-17T04:07:45Z)
- Gated recurrent units and temporal convolutional network for multilabel classification [122.84638446560663]
This work proposes a new ensemble method for managing multilabel classification.
The core of the proposed approach combines a set of gated recurrent units and temporal convolutional neural networks trained with variants of the Adam gradients optimization approach.
arXiv Detail & Related papers (2021-10-09T00:00:16Z)
- A multi-schematic classifier-independent oversampling approach for imbalanced datasets [0.0]
It is evident from previous studies that different oversampling algorithms have different degrees of efficiency with different classifiers.
Here, we overcome this problem with a multi-schematic and classifier-independent oversampling approach: ProWRAS.
ProWRAS integrates the Localized Random Affine Shadowsampling (LoRAS) algorithm and the Proximity Weighted Synthetic oversampling (ProWSyn) algorithm.
arXiv Detail & Related papers (2021-07-15T14:03:24Z)
- A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios [23.257891827728827]
Imbalance in the proportion of training samples belonging to different classes often causes performance degradation in conventional classifiers.
We propose a novel three step technique to address imbalanced data.
arXiv Detail & Related papers (2021-03-24T09:58:02Z)
- A Method for Handling Multi-class Imbalanced Data by Geometry based Information Sampling and Class Prioritized Synthetic Data Generation (GICaPS) [15.433936272310952]
This paper looks into the problem of handling imbalanced data in a multi-label classification problem.
Two novel methods are proposed that exploit the geometric relationship between the feature vectors.
The efficacy of the proposed methods is analyzed by solving a generic multi-class recognition problem.
arXiv Detail & Related papers (2020-10-11T04:04:26Z)
- A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation [71.31905141672529]
We study the widely adopted ancestral sampling algorithms for auto-regressive language models.
We identify three key properties that are shared among them: entropy reduction, order preservation, and slope preservation.
We find that the set of sampling algorithms that satisfies these properties performs on par with the existing sampling algorithms.
arXiv Detail & Related papers (2020-09-15T17:28:42Z)
- Ensemble Wrapper Subsampling for Deep Modulation Classification [70.91089216571035]
Subsampling of received wireless signals is important for relaxing hardware requirements as well as the computational cost of signal processing algorithms.
We propose a subsampling technique to facilitate the use of deep learning for automatic modulation classification in wireless communication systems.
arXiv Detail & Related papers (2020-05-10T06:11:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.