Skew Probabilistic Neural Networks for Learning from Imbalanced Data
- URL: http://arxiv.org/abs/2312.05878v1
- Date: Sun, 10 Dec 2023 13:12:55 GMT
- Title: Skew Probabilistic Neural Networks for Learning from Imbalanced Data
- Authors: Shraddha M. Naik, Tanujit Chakraborty, Abdenour Hadid, Bibhas
Chakraborty
- Abstract summary: This paper introduces an imbalanced data-oriented approach using probabilistic neural networks (PNNs) with a skew normal probability kernel.
We show that SkewPNNs substantially outperform state-of-the-art machine learning methods for both balanced and imbalanced datasets in most experimental settings.
- Score: 3.7892198600060945
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-world datasets often exhibit imbalanced data distribution, where certain
class levels are severely underrepresented. In such cases, traditional pattern
classifiers have shown a bias towards the majority class, impeding accurate
predictions for the minority class. This paper introduces an imbalanced
data-oriented approach using probabilistic neural networks (PNNs) with a skew
normal probability kernel to address this major challenge. PNNs are known for
providing probabilistic outputs, enabling quantification of prediction
confidence and uncertainty handling. By leveraging the skew normal
distribution, which offers increased flexibility, particularly for imbalanced
and non-symmetric data, our proposed Skew Probabilistic Neural Networks
(SkewPNNs) can better represent underlying class densities. To optimize the
performance of the proposed approach on imbalanced datasets, hyperparameter
fine-tuning is imperative. To this end, we employ a population-based heuristic
algorithm, the Bat optimization algorithm, to explore the hyperparameter space
effectively. We also prove the statistical consistency of the density
estimates, which implies that the true distribution is approached smoothly as
the sample size increases. Experimental simulations were conducted on
different synthetic datasets, comparing against various benchmark imbalanced-data learners.
Our real-data analysis shows that SkewPNNs substantially outperform
state-of-the-art machine learning methods for both balanced and imbalanced
datasets in most experimental settings.
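For intuition, here is a minimal sketch of the PNN-with-skew-normal-kernel idea, using scipy.stats.skewnorm. The class name SkewPNNSketch and its sigma/alpha parameters are illustrative assumptions; the paper's exact kernel parameterization, multivariate treatment, and Bat-algorithm hyperparameter tuning are not reproduced here.

```python
# Minimal sketch of a PNN-style classifier with a skew-normal kernel.
# Illustrative only: the paper's exact kernel form and Bat-algorithm
# hyperparameter tuning are not reproduced.
import numpy as np
from scipy.stats import skewnorm

class SkewPNNSketch:
    def __init__(self, sigma=0.5, alpha=2.0):
        self.sigma = sigma      # kernel bandwidth (smoothing parameter)
        self.alpha = alpha      # skewness parameter of the skew-normal kernel
        self.class_patterns = {}

    def fit(self, X, y):
        # A PNN simply memorizes the training patterns, grouped by class.
        for c in np.unique(y):
            self.class_patterns[c] = X[y == c]
        return self

    def predict_proba(self, X):
        # Score each class by the average skew-normal kernel density placed
        # on its training patterns (product over feature dimensions).
        classes = sorted(self.class_patterns)
        scores = np.zeros((len(X), len(classes)))
        for j, c in enumerate(classes):
            P = self.class_patterns[c]                      # (n_c, d)
            for i, x in enumerate(X):
                k = skewnorm.pdf(x - P, a=self.alpha, scale=self.sigma)
                scores[i, j] = k.prod(axis=1).mean()
        return scores / (scores.sum(axis=1, keepdims=True) + 1e-12)

    def predict(self, X):
        classes = np.array(sorted(self.class_patterns))
        return classes[np.argmax(self.predict_proba(X), axis=1)]
```

Averaging (rather than summing) the kernel responses per class keeps a minority class from being outvoted purely by sample count; in the paper, the smoothing and skewness hyperparameters are tuned by the Bat algorithm rather than fixed by hand as above.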
Related papers
- Fair CoVariance Neural Networks [34.68621550644667]
We propose Fair coVariance Neural Networks (FVNNs), which perform graph convolutions on the covariance matrix for both fair and accurate predictions.
We prove that FVNNs are intrinsically fairer than analogous PCA approaches thanks to their stability in low sample regimes.
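As a rough illustration of the underlying coVariance-filter idea that FVNNs build on (not the fairness mechanism itself), the sketch below applies a polynomial in the sample covariance matrix to an input signal; the function name and coefficients are illustrative assumptions.

```python
# Sketch of a single coVariance filter: a polynomial in the sample
# covariance matrix C applied to an input signal x. This is the
# graph-convolution analogue FVNNs build on; fairness terms are omitted.
import numpy as np

def covariance_filter(X, x, coeffs):
    """Return y = sum_k coeffs[k] * C^k x, with C the covariance of X."""
    C = np.cov(X, rowvar=False)
    y = np.zeros_like(x, dtype=float)
    Ckx = x.astype(float)                 # C^0 x
    for h in coeffs:
        y += h * Ckx
        Ckx = C @ Ckx                     # advance to C^(k+1) x
    return y
```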
arXiv Detail & Related papers (2024-09-13T06:24:18Z)
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
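The per-class density-estimation step can be sketched as below. This is a deliberately simplified Gaussian stand-in: ProCo's actual distributional assumptions and its closed-form contrastive loss are derived in the paper.

```python
# Illustrative sketch: estimate a per-class distribution over features,
# then sample synthetic features to rebalance a contrastive objective.
# ProCo's actual distributional model and loss differ; this only
# illustrates the "estimate per-class feature distribution" step.
import numpy as np

def fit_class_gaussians(features, labels):
    """Per-class mean and diagonal variance in feature space."""
    stats = {}
    for c in np.unique(labels):
        F = features[labels == c]
        stats[c] = (F.mean(axis=0), F.var(axis=0) + 1e-6)
    return stats

def sample_features(stats, c, n, seed=0):
    """Draw n synthetic feature vectors for class c (e.g., a rare class)."""
    mu, var = stats[c]
    rng = np.random.default_rng(seed)
    return rng.normal(mu, np.sqrt(var), size=(n, mu.size))
```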
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
- Probabilistic Neural Networks (PNNs) for Modeling Aleatoric Uncertainty in Scientific Machine Learning [2.348041867134616]
This paper investigates the use of probabilistic neural networks (PNNs) to model aleatoric uncertainty.
PNNs generate probability distributions for the target variable, allowing the determination of both predicted means and intervals in regression scenarios.
In a real-world scientific machine learning context, PNNs yield remarkably accurate output mean estimates with R-squared scores approaching 0.97, and their predicted intervals exhibit a high correlation coefficient of nearly 0.80.
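A minimal sketch of such a probabilistic regressor, assuming a heteroscedastic Gaussian head trained with the negative log-likelihood; the architecture and training details are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of a probabilistic regressor that outputs a predictive
# distribution (mean + variance) rather than a point estimate.
import torch
import torch.nn as nn

class GaussianPNN(nn.Module):
    def __init__(self, d_in, d_hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.mean_head = nn.Linear(d_hidden, 1)     # predicted mean
        self.logvar_head = nn.Linear(d_hidden, 1)   # predicted log-variance

    def forward(self, x):
        h = self.body(x)
        return self.mean_head(h), self.logvar_head(h)

def nll_loss(mean, logvar, y):
    # Negative log-likelihood of y under N(mean, exp(logvar)):
    # the learned variance captures aleatoric (data) noise.
    return 0.5 * (logvar + (y - mean) ** 2 / logvar.exp()).mean()

# A 95% predictive interval then follows from the learned variance:
# mean +/- 1.96 * exp(0.5 * logvar).
```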
arXiv Detail & Related papers (2024-02-21T17:15:47Z)
- Effective Class-Imbalance learning based on SMOTE and Convolutional Neural Networks [0.1074267520911262]
Imbalanced Data (ID) is a problem that prevents Machine Learning (ML) models from achieving satisfactory results.
In this paper, we investigate the effectiveness of methods based on Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs) for class-imbalance learning.
In order to achieve reliable results, we conducted our experiments 100 times with randomly shuffled data distributions.
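The basic pipeline can be sketched with imbalanced-learn. The synthetic dataset, split, and seeds below are illustrative stand-ins for the paper's benchmarks and its 100-run shuffling protocol.

```python
# Sketch of the SMOTE-then-train pipeline on a toy imbalanced dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Synthetic 95/5 imbalanced dataset standing in for the paper's benchmarks.
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Oversample the minority class in the training split only, then fit
# any classifier (the paper uses DNNs/CNNs) on the balanced data.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
```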
arXiv Detail & Related papers (2022-09-01T07:42:16Z)
- Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators.
However, they are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions.
We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
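A toy sketch of the core move, pulling predictions toward the label prior to raise their entropy. How the paper actually identifies unjustifiably overconfident regions is more involved; the per-example trust weight here is a hypothetical stand-in.

```python
# Illustrative sketch: where confidence is not trusted, mix the model's
# prediction with the label prior, raising the prediction's entropy.
import numpy as np

def temper_toward_prior(probs, prior, trust):
    """probs: (n, k) softmax outputs; prior: (k,) label marginals;
    trust: (n,) in [0, 1], where 1 means fully trust the model."""
    trust = trust[:, None]
    return trust * probs + (1.0 - trust) * prior[None, :]

probs = np.array([[0.98, 0.01, 0.01]])    # overconfident prediction
prior = np.array([0.5, 0.3, 0.2])         # label prior
print(temper_toward_prior(probs, prior, np.array([0.2])))
```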
arXiv Detail & Related papers (2021-02-22T07:02:37Z)
- General stochastic separation theorems with optimal bounds [68.8204255655161]
The phenomenon of separability was revealed and used in machine learning to correct errors of Artificial Intelligence (AI) systems and analyze AI instabilities.
Errors or clusters of errors can be separated from the rest of the data.
The ability to correct an AI system also opens up the possibility of an attack on it, and the high dimensionality induces vulnerabilities caused by the same separability.
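A concrete instance of the kind of single-error correction these theorems bound: separating one point from a high-dimensional data cloud with a Fisher-type linear discriminant. The regularisation constant and thresholding rule are illustrative choices.

```python
# Sketch: separate a single "error" point from a data cloud with a
# Fisher-type linear functional, as stochastic separation results permit
# with high probability in high dimension.
import numpy as np

def separate_point(X, x_err, reg=1e-3):
    """Return (w, t) with w @ x_err > t, and w @ x <= t for most rows x."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + reg * np.eye(X.shape[1])
    w = np.linalg.solve(cov, x_err - mu)        # Fisher discriminant direction
    t = 0.5 * (w @ x_err + w @ mu)              # midpoint threshold
    return w, t

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))                  # high-dimensional data cloud
x_err = rng.normal(size=50) + 2.0               # the point to separate out
w, t = separate_point(X, x_err)
print((X @ w > t).mean())                       # fraction of cloud mis-separated
```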
arXiv Detail & Related papers (2020-10-11T13:12:41Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
- Balance-Subsampled Stable Prediction [55.13512328954456]
We propose a novel balance-subsampled stable prediction (BSSP) algorithm based on the theory of fractional factorial design.
A design-theoretic analysis shows that the proposed method can reduce the confounding effects among predictors induced by the distribution shift.
Numerical experiments on both synthetic and real-world data sets demonstrate that our BSSP algorithm significantly outperforms the baseline methods for stable prediction across unknown test data.
arXiv Detail & Related papers (2020-06-08T07:01:38Z)
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
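A generic sketch of penalizing agreement among ensemble members' predictions. The paper's actual objective is an adversarial, information-bottleneck-based loss; this pairwise-KL term only illustrates the "diversity in prediction" idea.

```python
# Sketch of a prediction-diversity penalty for an ensemble: minimising
# this term pushes members' predictive distributions apart.
import torch

def diversity_penalty(probs_list, eps=1e-8):
    """probs_list: list of (n, k) softmax tensors, one per ensemble member.
    Returns the negative mean pairwise KL between members' predictions."""
    total, pairs = 0.0, 0
    for i in range(len(probs_list)):
        for j in range(len(probs_list)):
            if i != j:
                p, q = probs_list[i], probs_list[j]
                kl = (p * ((p + eps).log() - (q + eps).log())).sum(-1).mean()
                total, pairs = total + kl, pairs + 1
    return -total / max(pairs, 1)
```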
arXiv Detail & Related papers (2020-03-10T03:10:41Z)