Related papers: Imbalanced Classification via a Tabular Translation GAN

URL: http://arxiv.org/abs/2204.08683v1
Date: Tue, 19 Apr 2022 06:02:53 GMT
Title: Imbalanced Classification via a Tabular Translation GAN
Authors: Jonathan Gradstein, Moshe Salhov, Yoav Tulpan, Ofir Lindenbaum, Amir Averbuch
Abstract summary: We present a model based on Generative Adversarial Networks which uses additional regularization losses to map majority samples to corresponding synthetic minority samples. We show that the proposed method improves average precision when compared to alternative re-weighting and oversampling techniques.
Score: 4.864819846886142
License: http://creativecommons.org/licenses/by/4.0/
Abstract: When presented with a binary classification problem where the data exhibits severe class imbalance, most standard predictive methods may fail to accurately model the minority class. We present a model based on Generative Adversarial Networks which uses additional regularization losses to map majority samples to corresponding synthetic minority samples. This translation mechanism encourages the synthesized samples to be close to the class boundary. Furthermore, we explore a selection criterion to retain the most useful of the synthesized samples. Experimental results using several downstream classifiers on a variety of tabular class-imbalanced datasets show that the proposed method improves average precision when compared to alternative re-weighting and oversampling techniques.
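The abstract reports results in terms of average precision. As a rough illustration of that metric only (not the authors' evaluation code), the sketch below computes it in plain Python as the mean of the precision values at the rank of each positive sample. The function name `average_precision` and the toy labels and scores are assumptions made for this example, and tied scores are not given any special treatment.

```python
def average_precision(y_true, scores):
    """Mean of precision-at-rank over the positive (minority) samples.

    y_true: list of 0/1 labels, where 1 marks the minority class.
    scores: list of classifier scores, higher meaning "more likely minority".
    Assumption: ties in scores are broken by input order, not grouped.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive samples")

    # Rank sample indices from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            # Precision among the top `rank` samples, taken at each positive.
            total += hits / rank
    return total / n_pos


# Hypothetical imbalanced toy data: 2 minority samples out of 8.
labels = [0, 0, 1, 0, 1, 0, 0, 0]
scores = [0.1, 0.4, 0.8, 0.3, 0.35, 0.2, 0.05, 0.5]
ap = average_precision(labels, scores)
print(ap)
```

In this toy ranking the two minority samples land at ranks 1 and 4, so the precisions 1/1 and 2/4 average to 0.75. Unlike accuracy, the metric depends only on how highly the minority samples are ranked, which is why it is a common choice for severely imbalanced data.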

Related papers

Adaptive Cluster-Based Synthetic Minority Oversampling Technique for Traffic Mode Choice Prediction with Imbalanced Dataset [0.0]
Density-based spatial clustering is applied on minority classes to identify subgroups. The classes in each of these subgroups are then oversampled according to the ratio of data points of their local cluster to the largest majority class. When used in conjunction with machine learning models such as random forest and extreme gradient boosting, this oversampling method results in significantly higher F1 scores for the minority classes.
arXiv Detail & Related papers (2025-04-13T08:58:31Z)
Variational Classification [51.2541371924591]
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders. Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency. We induce a chosen latent distribution, instead of the implicit assumption found in a standard softmax layer.
arXiv Detail & Related papers (2023-05-17T17:47:19Z)
Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data. We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples. We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem. We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
Does Adversarial Oversampling Help us? [10.210871872870737]
We propose a three-player adversarial game-based end-to-end method to handle class imbalance in datasets. Rather than adversarial minority oversampling, we propose an adversarial oversampling (AO) and a data-space oversampling (DO) approach. The effectiveness of our proposed method has been validated with high-dimensional, highly imbalanced and large-scale multi-class datasets.
arXiv Detail & Related papers (2021-08-20T05:43:17Z)
A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios [23.257891827728827]
Imbalance in the proportion of training samples belonging to different classes often degrades the performance of conventional classifiers. We propose a novel three-step technique to address imbalanced data.
arXiv Detail & Related papers (2021-03-24T09:58:02Z)
Weakly Supervised-Based Oversampling for High Imbalance and High Dimensionality Data Classification [2.9283685972609494]
Oversampling is an effective method to solve imbalanced classification. Inaccurate labels of synthetic samples would distort the distribution of the dataset. This paper introduces the idea of weakly supervised learning to handle the inaccurate labeling of synthetic samples.
arXiv Detail & Related papers (2020-09-29T15:26:34Z)
Conditional Wasserstein GAN-based Oversampling of Tabular Data for Imbalanced Learning [10.051309746913512]
We propose an oversampling method based on a conditional Wasserstein GAN. We benchmark our method against standard oversampling methods and the imbalanced baseline on seven real-world datasets.
arXiv Detail & Related papers (2020-08-20T20:33:56Z)
Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers. We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model. Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
Compressing Large Sample Data for Discriminant Analysis [78.12073412066698]
We consider the computational issues due to large sample size within the discriminant analysis framework. We propose a new compression approach for reducing the number of training samples for linear and quadratic discriminant analysis.
arXiv Detail & Related papers (2020-05-08T05:09:08Z)
M2m: Imbalanced Classification via Major-to-minor Translation [79.09018382489506]
In most real-world scenarios, labeled training datasets are highly class-imbalanced, and deep neural networks trained on them struggle to generalize to a balanced testing criterion. In this paper, we explore a novel yet simple way to alleviate this issue by augmenting less-frequent classes via translating samples from more-frequent classes. Our experimental results on a variety of class-imbalanced datasets show that the proposed method improves the generalization on minority classes significantly compared to other existing re-sampling or re-weighting methods.
arXiv Detail & Related papers (2020-04-01T13:21:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.