A Comparison of Synthetic Oversampling Methods for Multi-class Text Classification
- URL: http://arxiv.org/abs/2008.04636v1
- Date: Tue, 11 Aug 2020 11:41:53 GMT
- Title: A Comparison of Synthetic Oversampling Methods for Multi-class Text Classification
- Authors: Anna Glazkova
- Abstract summary: The authors compare oversampling methods for the problem of multi-class topic classification.
The SMOTE algorithm underlies one of the most popular oversampling methods.
The authors conclude that for this task, the quality of the KNN and SVM algorithms is more affected by class imbalance than that of the neural networks.
- Score: 2.28438857884398
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The authors compare oversampling methods for the problem of multi-class
topic classification. The SMOTE algorithm underlies one of the most popular
oversampling methods. It consists of choosing two examples of a minority class
and generating a new example by interpolating between them. In the paper, the
authors compare the basic SMOTE method with two of its modifications (Borderline
SMOTE and ADASYN) and the random oversampling technique on a text classification
task. The paper discusses the k-nearest neighbors (KNN) algorithm, the support
vector machine (SVM) algorithm, and three types of neural networks (a feedforward
network, long short-term memory (LSTM), and bidirectional LSTM). The authors
combine these machine learning algorithms with different text representations and
compare the synthetic oversampling methods. In most cases, the use of oversampling
techniques significantly improves the quality of classification. The authors
conclude that for this task, the quality of the KNN and SVM algorithms is more
affected by class imbalance than that of the neural networks.
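The SMOTE interpolation step described in the abstract can be sketched as follows. This is a minimal illustrative implementation of the basic idea (the function name `smote_sample` and its parameters are assumptions for illustration, not the authors' code): for each synthetic point, pick a minority example, find one of its k nearest minority-class neighbors, and interpolate between the two.

```python
import numpy as np

def smote_sample(minority, k=5, n_new=10, rng=None):
    """Generate synthetic minority-class samples via SMOTE-style interpolation."""
    rng = np.random.default_rng(rng)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)                       # pick a minority example at random
        d = np.linalg.norm(minority - minority[i], axis=1)
        d[i] = np.inf                             # exclude the point itself
        neighbors = np.argsort(d)[:k]             # its k nearest minority neighbors
        j = rng.choice(neighbors)                 # pick one neighbor at random
        gap = rng.random()                        # interpolation factor in [0, 1)
        synthetic.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(synthetic)
```

Because each synthetic point lies on the segment between two existing minority examples, the generated samples stay inside the convex hull of the minority class; Borderline SMOTE and ADASYN modify which base points are chosen, not this interpolation step.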
Related papers
- A Quantum Approach to Synthetic Minority Oversampling Technique (SMOTE) [1.5186937600119894]
The paper proposes the Quantum-SMOTE method to solve the prevalent problem of class imbalance in machine learning datasets.
Quantum-SMOTE generates synthetic data points using quantum processes such as swap tests and quantum rotation.
The approach is tested on a public dataset of Telecom Churn to determine its impact along with varying proportions of synthetic data.
arXiv Detail & Related papers (2024-02-27T10:46:36Z)
- Intra-class Adaptive Augmentation with Neighbor Correction for Deep Metric Learning [99.14132861655223]
We propose a novel intra-class adaptive augmentation (IAA) framework for deep metric learning.
We reasonably estimate intra-class variations for every class and generate adaptive synthetic samples to support hard samples mining.
Our method significantly outperforms the state-of-the-art methods, improving retrieval performance by 3%-6%.
arXiv Detail & Related papers (2022-11-29T14:52:38Z)
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
- Large-Margin Representation Learning for Texture Classification [67.94823375350433]
This paper presents a novel approach combining convolutional layers (CLs) and large-margin metric learning for training supervised models on small datasets for texture classification.
The experimental results on texture and histopathologic image datasets have shown that the proposed approach achieves competitive accuracy with lower computational cost and faster convergence when compared to equivalent CNNs.
arXiv Detail & Related papers (2022-06-17T04:07:45Z)
- Gated recurrent units and temporal convolutional network for multilabel classification [122.84638446560663]
This work proposes a new ensemble method for managing multilabel classification.
The core of the proposed approach combines a set of gated recurrent units and temporal convolutional neural networks trained with variants of the Adam gradients optimization approach.
arXiv Detail & Related papers (2021-10-09T00:00:16Z)
- A multi-schematic classifier-independent oversampling approach for imbalanced datasets [0.0]
It is evident from previous studies that different oversampling algorithms have different degrees of efficiency with different classifiers.
Here, we overcome this problem with a multi-schematic and classifier-independent oversampling approach: ProWRAS.
ProWRAS integrates the Localized Random Affine Shadowsampling (LoRAS) algorithm and the Proximity Weighted Synthetic oversampling (ProWSyn) algorithm.
arXiv Detail & Related papers (2021-07-15T14:03:24Z)
- A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios [23.257891827728827]
Imbalance in the proportion of training samples belonging to different classes often causes performance degradation in conventional classifiers.
We propose a novel three step technique to address imbalanced data.
arXiv Detail & Related papers (2021-03-24T09:58:02Z)
- A Method for Handling Multi-class Imbalanced Data by Geometry based Information Sampling and Class Prioritized Synthetic Data Generation (GICaPS) [15.433936272310952]
This paper looks into the problem of handling imbalanced data in a multi-label classification problem.
Two novel methods are proposed that exploit the geometric relationship between the feature vectors.
The efficacy of the proposed methods is analyzed by solving a generic multi-class recognition problem.
arXiv Detail & Related papers (2020-10-11T04:04:26Z)
- A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation [71.31905141672529]
We study the widely adopted ancestral sampling algorithms for auto-regressive language models.
We identify three key properties that are shared among them: entropy reduction, order preservation, and slope preservation.
We find that the set of sampling algorithms that satisfies these properties performs on par with the existing sampling algorithms.
arXiv Detail & Related papers (2020-09-15T17:28:42Z)
- Ensemble Wrapper Subsampling for Deep Modulation Classification [70.91089216571035]
Subsampling of received wireless signals is important for relaxing hardware requirements as well as the computational cost of signal processing algorithms.
We propose a subsampling technique to facilitate the use of deep learning for automatic modulation classification in wireless communication systems.
arXiv Detail & Related papers (2020-05-10T06:11:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.