A method for classification of data with uncertainty using hypothesis testing
- URL: http://arxiv.org/abs/2502.08582v1
- Date: Wed, 12 Feb 2025 17:14:07 GMT
- Title: A method for classification of data with uncertainty using hypothesis testing
- Authors: Shoma Yokura, Akihisa Ichiki
- Abstract summary: Conventional classifiers make overconfident predictions for ambiguous or out-of-distribution data, so it is necessary to quantify uncertainty and adopt decision-making approaches that take it into account.
We propose a new decision-making approach using two types of hypothesis testing.
This method is capable of detecting ambiguous data that belong to the overlapping regions of two class distributions.
- Abstract: Binary classification is a task that involves the classification of data into one of two distinct classes, and it is widely utilized in various fields. However, conventional classifiers tend to make overconfident predictions for data that belong to overlapping regions of the two class distributions or for data outside the distributions (out-of-distribution data). Therefore, conventional classifiers should not be applied in high-risk fields where classification results can have significant consequences. In order to address this issue, it is necessary to quantify uncertainty and adopt decision-making approaches that take it into account. Many methods have been proposed for this purpose; however, implementing them often requires resampling, improving the structure or performance of models, or optimizing classifier thresholds. We propose a new decision-making approach using two types of hypothesis testing. This method can detect ambiguous data that belong to the overlapping regions of two class distributions, as well as out-of-distribution data that are not included in the training data distribution. In addition, we quantify uncertainty using the empirical distribution of feature values obtained by passing the training data through the trained model. The classification threshold is determined by the $\alpha$-quantile and $(1-\alpha)$-quantile, where the significance level $\alpha$ is set according to each specific situation.
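The core mechanism lends itself to a short sketch. The following is a minimal illustration of quantile-based acceptance regions with a reject option, assuming a one-dimensional feature score per class; the score distributions, choice of $\alpha$, and decision rule here are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def acceptance_region(scores, alpha):
    """Two-sided acceptance region from the empirical alpha- and (1-alpha)-quantiles."""
    return np.quantile(scores, alpha), np.quantile(scores, 1 - alpha)

def classify_with_rejection(x_score, region0, region1):
    """Accept x for a class iff its score lies inside that class's region.
    Inside both regions -> ambiguous (overlap); inside neither -> out-of-distribution."""
    in0 = region0[0] <= x_score <= region0[1]
    in1 = region1[0] <= x_score <= region1[1]
    if in0 and in1:
        return "ambiguous"
    if in0:
        return "class 0"
    if in1:
        return "class 1"
    return "out-of-distribution"

# Toy example: class-conditional score distributions from training data.
rng = np.random.default_rng(0)
scores0 = rng.normal(-1.0, 1.0, 5000)   # feature scores of training class 0
scores1 = rng.normal(+1.0, 1.0, 5000)   # feature scores of training class 1
r0 = acceptance_region(scores0, alpha=0.05)
r1 = acceptance_region(scores1, alpha=0.05)
for x in (-1.0, 0.0, 1.0, 8.0):
    print(x, "->", classify_with_rejection(x, r0, r1))
```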
Related papers
- Anomaly Detection Under Uncertainty Using Distributionally Robust Optimization Approach
Anomaly detection is defined as the problem of finding data points that do not follow the patterns of the majority.
The one-class Support Vector Machine (SVM) method aims to find a decision boundary that distinguishes normal data points from anomalies.
A distributionally robust chance-constrained model is proposed in which the probability of misclassification is low.
arXiv Detail & Related papers (2023-12-03T06:13:22Z)
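A minimal, hedged illustration of the one-class SVM idea mentioned above, using standard scikit-learn usage rather than the paper's distributionally robust variant:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(500, 2))       # "normal" data only
X_test = np.vstack([rng.normal(0.0, 1.0, size=(5, 2)),
                    np.array([[6.0, 6.0]])])         # last point is an anomaly

# nu upper-bounds the fraction of training points treated as outliers.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)
print(clf.predict(X_test))  # +1 = normal, -1 = anomaly
```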
- Adaptive Negative Evidential Deep Learning for Open-set Semi-supervised Learning
Open-set semi-supervised learning (Open-set SSL) considers a more practical scenario, where unlabeled data and test data contain new categories (outliers) not observed in the labeled data (inliers).
We introduce evidential deep learning (EDL) as an outlier detector to quantify different types of uncertainty, and design different uncertainty metrics for self-training and inference.
We propose a novel adaptive negative optimization strategy, making EDL more tailored to the unlabeled dataset containing both inliers and outliers.
arXiv Detail & Related papers (2023-03-21T09:07:15Z)
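A hedged sketch of the core evidential deep learning quantity, following the common EDL formulation (Dirichlet parameters from non-negative evidence); this is not the paper's adaptive negative optimization:

```python
import numpy as np

def edl_uncertainty(logits):
    """Common EDL recipe: evidence e = relu(logits), Dirichlet alpha = e + 1.
    Total uncertainty u = K / sum(alpha); per-class belief b_k = e_k / sum(alpha)."""
    evidence = np.maximum(logits, 0.0)
    alpha = evidence + 1.0
    strength = alpha.sum()
    k = len(alpha)
    belief = evidence / strength
    return belief, k / strength

# Confident sample (large evidence) vs. ambiguous sample (little evidence).
for logits in (np.array([9.0, 0.2, 0.1]), np.array([0.3, 0.2, 0.1])):
    b, u = edl_uncertainty(logits)
    print("belief:", b.round(2), "uncertainty:", round(u, 2))
```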
- Benchmarking common uncertainty estimation methods with histopathological images under domain shift and label noise
In high-risk environments, deep learning models need to be able to judge their uncertainty and reject inputs when there is a significant chance of misclassification.
We conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of Whole Slide Images.
We observe that ensembles of methods generally lead to better uncertainty estimates as well as an increased robustness towards domain shifts and label noise.
arXiv Detail & Related papers (2023-01-03T11:34:36Z)
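A minimal sketch of ensemble-based uncertainty of the kind benchmarked above: average the members' predicted probabilities and use predictive entropy to decide whether to reject. The models, data, and threshold are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
members = [LogisticRegression(max_iter=1000).fit(X, y),
           RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)]

# Ensemble predictive distribution = mean of the members' probabilities.
p = np.mean([m.predict_proba(X[:5]) for m in members], axis=0)
entropy = -(p * np.log(p + 1e-12)).sum(axis=1)   # high entropy -> uncertain
reject = entropy > 0.5                            # illustrative threshold
print(p.round(2), entropy.round(3), reject)
```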
- An Upper Bound for the Distribution Overlap Index and Its Applications
This paper proposes an easy-to-compute upper bound for the overlap index between two probability distributions.
The proposed bound shows its value in one-class classification and domain shift analysis.
Our work shows significant promise toward broadening the applications of overlap-based metrics.
arXiv Detail & Related papers (2022-12-16T20:02:03Z)
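For context, a hedged sketch of the overlap index itself (OVL, the integral of min(p, q)), estimated from samples with a shared histogram; the paper's contribution is an upper bound on this quantity, which is not reproduced here:

```python
import numpy as np

def empirical_overlap(x, y, bins=50):
    """Histogram estimate of OVL = integral of min(p, q) over a shared range."""
    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    p, edges = np.histogram(x, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(y, bins=bins, range=(lo, hi), density=True)
    width = edges[1] - edges[0]
    return np.minimum(p, q).sum() * width

rng = np.random.default_rng(0)
print(empirical_overlap(rng.normal(0, 1, 10000), rng.normal(1, 1, 10000)))
# True overlap of N(0,1) and N(1,1) is about 0.62.
```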
- Learning Acceptance Regions for Many Classes with Anomaly Detection
Many existing set-valued classification methods do not consider the possibility that a new class that never appeared in the training data appears in the test data.
We propose a Generalized Prediction Set (GPS) approach to estimate the acceptance regions while considering the possibility of a new class in the test data.
Unlike previous methods, the proposed method achieves a good balance between accuracy, efficiency, and anomaly detection rate.
arXiv Detail & Related papers (2022-09-20T19:40:33Z)
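A hedged sketch of set-valued classification in the spirit of the GPS approach above: accept every class whose score clears a per-class threshold, and flag an empty set as a possible new class. The density-based scores and 2-sigma thresholds are illustrative, not the paper's estimator:

```python
import numpy as np
from scipy.stats import norm

def prediction_set(x, class_scores, thresholds):
    """Return all classes whose score at x exceeds that class's threshold.
    Empty set -> x may belong to a class unseen in training."""
    accepted = [k for k, score in class_scores.items() if score(x) >= thresholds[k]]
    return accepted if accepted else ["possible new class"]

# Toy class-conditional Gaussian log-densities as scores.
class_scores = {0: lambda x: norm(-1, 1).logpdf(x),
                1: lambda x: norm(+1, 1).logpdf(x)}
thresholds = {0: norm(-1, 1).logpdf(-1 + 2.0),   # accept within ~2 sigma
              1: norm(+1, 1).logpdf(+1 + 2.0)}
for x in (-2.5, 0.0, 9.0):
    # Middle point lands in both acceptance regions (a set-valued answer).
    print(x, "->", prediction_set(x, class_scores, thresholds))
```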
- Determination of class-specific variables in nonparametric multiple-class classification
We propose a probability-based nonparametric multiple-class classification method and integrate it with the ability to identify high-impact variables for each individual class.
We report the properties of the proposed method, and use both synthesized and real data sets to illustrate its properties under different classification situations.
arXiv Detail & Related papers (2022-05-07T10:08:58Z)
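A hedged sketch of probability-based nonparametric classification (per-class kernel density estimates combined through Bayes' rule); the variable-importance component of the paper above is not reproduced:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X0 = rng.normal(-1, 1, size=(300, 2))
X1 = rng.normal(+1, 1, size=(300, 2))

kdes = [KernelDensity(bandwidth=0.5).fit(X) for X in (X0, X1)]
priors = np.array([0.5, 0.5])

def posterior(x):
    """p(class | x) from class-conditional KDEs and class priors."""
    log_lik = np.array([kde.score_samples(x.reshape(1, -1))[0] for kde in kdes])
    joint = np.exp(log_lik) * priors
    return joint / joint.sum()

print(posterior(np.array([-1.0, -1.0])).round(3))  # favors class 0
print(posterior(np.array([0.0, 0.0])).round(3))    # near 50/50
```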
- Self-Certifying Classification by Linearized Deep Assignment
We propose a novel class of deep predictors for classifying metric data on graphs within the PAC-Bayes risk certification paradigm.
Building on the recent PAC-Bayes literature and data-dependent priors, this approach enables learning posterior distributions on the hypothesis space.
arXiv Detail & Related papers (2022-01-26T19:59:14Z)
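A hedged sketch of the generic PAC-Bayes-kl risk certificate that self-certifying approaches like the one above build on (the standard bound, inverted numerically; not this paper's specific construction): with probability at least $1-\delta$, $\mathrm{kl}(\hat{R} \,\|\, R) \le (\mathrm{KL}(Q\|P) + \ln(2\sqrt{n}/\delta))/n$.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def pac_bayes_kl_bound(emp_risk, kl_qp, n, delta):
    """Upper-bound the true risk by inverting kl(emp_risk || r) <= rhs via bisection."""
    rhs = (kl_qp + math.log(2 * math.sqrt(n) / delta)) / n
    lo, hi = emp_risk, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if kl_bernoulli(emp_risk, mid) > rhs:
            hi = mid
        else:
            lo = mid
    return hi  # conservative side of the bisection

# 5% empirical risk, KL(Q||P) = 10 nats, 50k samples, 95% confidence.
print(pac_bayes_kl_bound(0.05, 10.0, 50000, 0.05))
```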
- Positive-Unlabeled Classification under Class-Prior Shift: A Prior-invariant Approach Based on Density Ratio Estimation
We propose a novel PU classification method based on density ratio estimation.
A notable advantage of our proposed method is that it does not require the class-priors in the training phase.
arXiv Detail & Related papers (2021-07-11T13:36:53Z)
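A hedged sketch of the classifier-based route to density ratio estimation that PU methods like the one above rely on: train a probabilistic classifier to separate samples from the two distributions, then convert its probabilities into a ratio. This is the generic recipe, not the paper's estimator:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
Xp = rng.normal(0.0, 1.0, size=(1000, 1))   # samples from p(x)
Xq = rng.normal(1.0, 1.0, size=(1000, 1))   # samples from q(x)

X = np.vstack([Xp, Xq])
z = np.r_[np.ones(len(Xp)), np.zeros(len(Xq))]  # z = 1 iff drawn from p
clf = LogisticRegression().fit(X, z)

def density_ratio(x):
    """r(x) = p(x)/q(x) ~= (n_q/n_p) * P(z=1|x) / P(z=0|x)."""
    s = clf.predict_proba(np.atleast_2d(x))[:, 1]
    return (len(Xq) / len(Xp)) * s / (1 - s)

print(density_ratio([[0.0]]), density_ratio([[1.0]]))
# True ratios: p(0)/q(0) = e^{0.5} ~ 1.65, p(1)/q(1) = e^{-0.5} ~ 0.61.
```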
- Performance-Agnostic Fusion of Probabilistic Classifier Outputs
We propose a method for combining probabilistic outputs of classifiers to make a single consensus class prediction.
Our proposed method works well in situations where accuracy is the performance metric.
It does not output calibrated probabilities, so it is not suitable in situations where such probabilities are required for further processing.
arXiv Detail & Related papers (2020-09-01T16:53:29Z)
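A hedged sketch of two classic baselines for fusing probabilistic classifier outputs (linear and logarithmic opinion pools); the paper above proposes a more refined consensus, not shown here:

```python
import numpy as np

def linear_pool(probs):
    """Average the classifiers' probability vectors."""
    return probs.mean(axis=0)

def log_pool(probs):
    """Normalized geometric mean (logarithmic opinion pool)."""
    g = np.exp(np.log(probs + 1e-12).mean(axis=0))
    return g / g.sum()

# Three classifiers' predicted distributions over three classes for one input.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.5, 0.3]])
print(linear_pool(probs).round(3), log_pool(probs).round(3))
print("consensus class:", linear_pool(probs).argmax())
```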
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
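A hedged sketch of the randomized smoothing primitive that the paper above generalizes, in its familiar test-time form: the smoothed classifier returns the class a base classifier predicts most often under Gaussian input noise (the paper's label-flipping certificate itself is not reproduced):

```python
import numpy as np

def smoothed_predict(base_classifier, x, sigma=0.5, n_samples=1000, rng=None):
    """Majority vote of base_classifier over Gaussian perturbations of x."""
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(0.0, sigma, size=(n_samples, x.shape[0]))
    votes = np.array([base_classifier(x + eps) for eps in noise])
    classes, counts = np.unique(votes, return_counts=True)
    return classes[counts.argmax()], counts.max() / n_samples

# Toy base classifier: class 1 iff the first coordinate is positive.
base = lambda x: int(x[0] > 0)
label, vote_share = smoothed_predict(base, np.array([0.3, -0.2]))
print(label, vote_share)  # a high vote share yields a larger certifiable radius
```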