Towards Stable Imbalanced Data Classification via Virtual Big Data
Projection
- URL: http://arxiv.org/abs/2009.08387v1
- Date: Sun, 23 Aug 2020 04:01:51 GMT
- Title: Towards Stable Imbalanced Data Classification via Virtual Big Data
Projection
- Authors: Hadi Mansourifar, Weidong Shi
- Abstract summary: We investigate the capability of VBD to address deep autoencoder training and imbalanced data classification.
First, we prove that VBD can significantly decrease the validation loss of autoencoders by providing them with large, diversified training data.
Second, we propose the first projection-based method, called cross-concatenation, to balance skewed class distributions without over-sampling.
- Score: 3.3707422585608953
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Virtual Big Data (VBD) has very recently proved effective at alleviating mode
collapse and vanishing generator gradients, two major problems of Generative
Adversarial Networks (GANs). In this paper, we investigate the capability of
VBD to address two other major challenges in machine learning: deep
autoencoder training and imbalanced data classification. First, we prove that
VBD can significantly decrease the validation loss of autoencoders by
providing them with large, diversified training data, which is key to better
generalization and to minimizing over-fitting. Second, we use VBD to propose
the first projection-based method, called cross-concatenation, to balance
skewed class distributions without over-sampling. We prove that
cross-concatenation resolves the uncertainty problem of data-driven methods
for imbalanced classification.
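The following is a minimal NumPy sketch of how cross-concatenation might balance a skewed dataset via Virtual Big Data. The exact pairing rules are not given in the abstract above, so the choice to pair every minority sample with every other minority sample and to subsample an equal number of majority pairs is an assumption made only for illustration; the function names `virtual_big_data` and `cross_concatenate_balance` are likewise hypothetical.

```python
import numpy as np

def virtual_big_data(X, max_pairs=None, seed=0):
    """Project n samples of dimension d into virtual samples of dimension 2d
    by concatenating ordered pairs of originals (assumed VBD construction)."""
    rng = np.random.default_rng(seed)
    pairs = [(i, j) for i in range(len(X)) for j in range(len(X))]
    if max_pairs is not None and len(pairs) > max_pairs:
        keep = rng.choice(len(pairs), size=max_pairs, replace=False)
        pairs = [pairs[k] for k in keep]
    return np.stack([np.concatenate([X[i], X[j]]) for i, j in pairs])

def cross_concatenate_balance(X_min, X_maj, seed=0):
    """Balance classes in the projected space without over-sampling:
    all minority-minority pairs vs. an equal number of majority-majority
    pairs (assumed interpretation of cross-concatenation)."""
    vbd_min = virtual_big_data(X_min)                            # n_min**2 points
    vbd_maj = virtual_big_data(X_maj, max_pairs=len(vbd_min), seed=seed)
    X_bal = np.vstack([vbd_min, vbd_maj])
    y_bal = np.concatenate([np.ones(len(vbd_min)), np.zeros(len(vbd_maj))])
    return X_bal, y_bal

# Toy usage: 20 minority vs. 200 majority samples in 5 dimensions.
rng = np.random.default_rng(0)
X_minority = rng.normal(size=(20, 5))
X_majority = rng.normal(loc=2.0, size=(200, 5))
X_bal, y_bal = cross_concatenate_balance(X_minority, X_majority)
print(X_bal.shape, y_bal.mean())   # (800, 10), class ratio 0.5 in 10-d space
```

For the autoencoder claim in the abstract, the same `virtual_big_data` construction would simply be applied to the whole training set to enlarge and diversify it before training; again, this is an illustrative reading rather than the paper's exact procedure.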
Related papers
- Wafer Map Defect Classification Using Autoencoder-Based Data Augmentation and Convolutional Neural Network [4.8748194765816955]
This study proposes a novel method combining an autoencoder-based data augmentation technique with a convolutional neural network (CNN).
The proposed method achieves a classification accuracy of 98.56%, surpassing Random Forest, SVM, and Logistic Regression by 19%, 21%, and 27%, respectively.
arXiv Detail & Related papers (2024-11-17T10:19:54Z) - Revisiting the Disequilibrium Issues in Tackling Heart Disease Classification Tasks [5.834731599084117]
Two primary obstacles arise in the field of heart disease classification.
Electrocardiogram (ECG) datasets consistently demonstrate imbalances and biases across various modalities.
We propose a Channel-wise Magnitude Equalizer (CME) on signal-encoded images.
We also propose the Inverted Weight Logarithmic Loss (IWL) to alleviate data imbalance.
arXiv Detail & Related papers (2024-07-19T09:50:49Z) - Simple Ingredients for Offline Reinforcement Learning [86.1988266277766]
Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task.
We show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline buffer.
We show that scale, more than algorithmic considerations, is the key factor influencing performance.
arXiv Detail & Related papers (2024-03-19T18:57:53Z) - P$^2$OT: Progressive Partial Optimal Transport for Deep Imbalanced
Clustering [16.723646401890495]
We propose a novel pseudo-labeling-based learning framework for deep clustering.
Our framework generates imbalance-aware pseudo-labels and learns from high-confidence samples.
Experiments on various datasets, including a human-curated long-tailed CIFAR100, demonstrate the superiority of our method.
arXiv Detail & Related papers (2024-01-17T15:15:46Z) - Diversity-enhancing Generative Network for Few-shot Hypothesis
Adaptation [135.80439360370556]
We propose a diversity-enhancing generative network (DEG-Net) for the FHA problem.
It can generate diverse unlabeled data with the help of a kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC); a minimal sketch of the HSIC estimator is given after this list.
arXiv Detail & Related papers (2023-07-12T06:29:02Z) - Enhancing Multiple Reliability Measures via Nuisance-extended
Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data may instead stem from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z) - Federated Causal Discovery [74.37739054932733]
This paper develops a gradient-based learning framework named DAG-Shared Federated Causal Discovery (DS-FCD).
It learns the causal graph without directly touching local data and naturally handles data heterogeneity.
Extensive experiments on both synthetic and real-world datasets verify the efficacy of the proposed method.
arXiv Detail & Related papers (2021-12-07T08:04:12Z) - Fine-grained Data Distribution Alignment for Post-Training Quantization [100.82928284439271]
We propose a fine-grained data distribution alignment (FDDA) method to boost the performance of post-training quantization.
Our method shows state-of-the-art performance on ImageNet, especially when the first and last layers are quantized to low bit-width.
arXiv Detail & Related papers (2021-09-09T11:45:52Z) - Semi-supervised Long-tailed Recognition using Alternate Sampling [95.93760490301395]
The main challenges in long-tailed recognition come from the imbalanced data distribution and sample scarcity in the tail classes.
We propose a new recognition setting, namely semi-supervised long-tailed recognition.
We demonstrate significant accuracy improvements over other competitive methods on two datasets.
arXiv Detail & Related papers (2021-05-01T00:43:38Z) - Mitigating Dataset Imbalance via Joint Generation and Classification [17.57577266707809]
Supervised deep learning methods are enjoying enormous success in many practical applications of computer vision.
However, their marked performance degradation under biases and imbalanced data calls the reliability of these methods into question.
We introduce a joint dataset repairment strategy by combining a neural network classifier with Generative Adversarial Networks (GANs).
We show that the combined training helps to improve the robustness of both the classifier and the GAN against severe class imbalance.
arXiv Detail & Related papers (2020-08-12T18:40:38Z) - Imbalanced Data Learning by Minority Class Augmentation using Capsule
Adversarial Networks [31.073558420480964]
We propose a method to restore balance in imbalanced image data by coalescing two concurrent methods.
In our model, generative and discriminative networks play a novel competitive game.
The coalescing of capsule-GAN is effective at recognizing highly overlapping classes with far fewer parameters than the convolutional GAN.
arXiv Detail & Related papers (2020-04-05T12:36:06Z)
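For reference, since the DEG-Net entry above relies on the Hilbert-Schmidt independence criterion (HSIC), here is a minimal NumPy sketch of the standard biased empirical HSIC estimator. How DEG-Net uses it as a diversity objective, and its kernel and bandwidth choices, are not described above; the RBF kernel and the helper names below are assumptions for illustration only.

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of X (bandwidth is assumed)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC estimate: trace(K H L H) / (n - 1)**2.
    Values near zero suggest independence between the two samples;
    larger values indicate stronger statistical dependence."""
    n = X.shape[0]
    K, L = rbf_kernel(X, sigma), rbf_kernel(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2

# Toy check: dependent inputs should score higher than independent ones.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 3))
print(hsic(A, A + 0.1 * rng.normal(size=A.shape)))   # relatively large
print(hsic(A, rng.normal(size=(200, 3))))            # close to zero
```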