Invariance assumptions for class distribution estimation
- URL: http://arxiv.org/abs/2311.17225v1
- Date: Tue, 28 Nov 2023 20:57:10 GMT
- Title: Invariance assumptions for class distribution estimation
- Authors: Dirk Tasche
- Abstract summary: We study the problem of class distribution estimation under dataset shift.
On the training dataset, both features and class labels are observed while on the test dataset only the features can be observed.
- Score: 1.3053649021965603
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of class distribution estimation under dataset shift. On
the training dataset, both features and class labels are observed while on the
test dataset only the features can be observed. The task then is the estimation
of the distribution of the class labels, i.e. the estimation of the class prior
probabilities, in the test dataset. Assumptions of invariance between the
training joint distribution of features and labels and the test distribution
can considerably facilitate this task. We discuss the assumptions of covariate
shift, factorizable joint shift, and sparse joint shift and their implications
for class distribution estimation.
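As context for the invariance assumptions discussed in the abstract, the following is a minimal sketch, not taken from the paper, of two standard class prior estimators (function names and the use of a generic probabilistic classifier are illustrative): probabilistic classify-and-count, which is consistent when the posterior p(y|x) is invariant (covariate shift), and EM-style maximum-likelihood re-estimation in the spirit of Saerens et al. (2002), which targets prior probability shift, where the class-conditional feature distribution p(x|y) is assumed invariant.

```python
import numpy as np

def classify_and_count(test_posteriors):
    """Probabilistic classify-and-count: average the training-fitted posterior
    probabilities over the test features. Consistent under covariate shift,
    where p(y|x) is assumed to be the same in training and test."""
    return np.asarray(test_posteriors, dtype=float).mean(axis=0)

def em_prior_estimate(test_posteriors, train_priors, n_iter=200, tol=1e-8):
    """Maximum-likelihood re-estimation of the test class priors (EM in the
    style of Saerens et al., 2002). Suited to prior probability shift, where
    p(x|y) is assumed invariant while the class priors p(y) may change.

    test_posteriors: (n_test, n_classes) posteriors from a classifier fitted
                     on the training data.
    train_priors:    (n_classes,) class proportions observed in training.
    """
    test_posteriors = np.asarray(test_posteriors, dtype=float)
    train_priors = np.asarray(train_priors, dtype=float)
    priors = train_priors.copy()
    for _ in range(n_iter):
        # E-step: reweight each posterior by the ratio of current to training priors.
        weighted = test_posteriors * (priors / train_priors)
        weighted /= weighted.sum(axis=1, keepdims=True)
        # M-step: the new prior estimate is the average reweighted posterior.
        new_priors = weighted.mean(axis=0)
        if np.max(np.abs(new_priors - priors)) < tol:
            return new_priors
        priors = new_priors
    return priors
```

Which of the two estimators is appropriate depends on which invariance assumption, covariate shift or prior probability shift, is credible for the data at hand; choosing and justifying such assumptions is exactly the question the abstract above addresses.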
Related papers
- A generalized approach to label shift: the Conditional Probability Shift Model [0.8594140167290099]
Conditional Probability Shift (CPS) captures the case when the conditional distribution of the class variable given some specific features changes.
We present CPSM, which models the class variable's conditional probabilities using multinomial regression.
The effectiveness of CPSM is demonstrated through experiments on synthetic datasets and a case study using the MIMIC medical database.
arXiv Detail & Related papers (2025-03-04T13:07:20Z)
- Label Distribution Learning using the Squared Neural Family on the Probability Simplex [15.680835401104247]
We estimate a probability distribution over all possible label distributions on the probability simplex.
With the modeled distribution, label distribution prediction can be achieved by performing the expectation operation.
More information about the label distribution can be inferred, such as the prediction reliability and uncertainties.
arXiv Detail & Related papers (2024-12-10T09:12:02Z)
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
- Distribution Shift Inversion for Out-of-Distribution Prediction [57.22301285120695]
We propose a portable Distribution Shift Inversion algorithm for Out-of-Distribution (OoD) prediction.
We show that our method provides a general performance gain when plugged into a wide range of commonly used OoD algorithms.
arXiv Detail & Related papers (2023-06-14T08:00:49Z)
- Revisiting Long-tailed Image Classification: Survey and Benchmarks with New Evaluation Metrics [88.39382177059747]
A corpus of metrics is designed for measuring the accuracy, robustness, and bounds of algorithms for learning with long-tailed distribution.
Based on our benchmarks, we re-evaluate the performance of existing methods on CIFAR10 and CIFAR100 datasets.
arXiv Detail & Related papers (2023-02-03T02:40:54Z)
- Factorizable Joint Shift in Multinomial Classification [3.3504365823045035]
We derive a representation of factorizable joint shift in terms of the source (training) distribution, the target (test) prior class probabilities and the target marginal distribution of the features.
Other results of the paper include correction formulae for the posterior class probabilities under both general dataset shift and factorizable joint shift (a sketch of the simpler correction under pure prior probability shift appears after this list).
arXiv Detail & Related papers (2022-07-29T07:21:44Z)
- Fairness Transferability Subject to Bounded Distribution Shift [5.62716254065607]
Given an algorithmic predictor that is "fair" on some source distribution, will it still be fair on an unknown target distribution that differs from the source within some bound?
We study the transferability of statistical group fairness for machine learning predictors subject to bounded distribution shifts.
arXiv Detail & Related papers (2022-05-31T22:16:44Z)
- Shift Happens: Adjusting Classifiers [2.8682942808330703]
Minimizing expected loss measured by a proper scoring rule, such as Brier score or log-loss (cross-entropy), is a common objective while training a probabilistic classifier.
We propose methods that transform all predictions to (re)equalize the average prediction and the class distribution.
We demonstrate experimentally that, even when the class distribution is known only approximately, the loss is often still reduced, with the size of the reduction depending on the amount of shift and on how precisely the class distribution is known (a generic re-equalization sketch appears after this list).
arXiv Detail & Related papers (2021-11-03T21:27:27Z)
- When in Doubt: Improving Classification Performance with Alternating Normalization [57.39356691967766]
We introduce Classification with Alternating Normalization (CAN), a non-parametric post-processing step for classification.
CAN improves classification accuracy for challenging examples by re-adjusting their predicted class probability distribution.
We empirically demonstrate its effectiveness across a diverse set of classification tasks.
arXiv Detail & Related papers (2021-09-28T02:55:42Z)
- Minimising quantifier variance under prior probability shift [2.1320960069210475]
We find that the variance of the quantifier is a function of the Brier score for the regression of the class label against the features under the test dataset distribution.
This observation suggests that optimising the accuracy of a base classifier on the training dataset helps to reduce the variance of the related quantifier on the test dataset.
arXiv Detail & Related papers (2021-07-17T09:28:06Z)
- Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization [89.73665256847858]
We show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts.
Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet.
We also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS.
arXiv Detail & Related papers (2021-07-09T19:48:23Z)
- Predicting with Confidence on Unseen Distributions [90.68414180153897]
We connect domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts.
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that, despite its simplicity, DoC consistently outperforms other quantifications of distributional difference (a minimal DoC sketch appears after this list).
arXiv Detail & Related papers (2021-07-07T15:50:18Z)
- Robust Fairness under Covariate Shift [11.151913007808927]
Making predictions that are fair with regard to protected group membership has become an important requirement for classification algorithms.
We propose an approach that obtains the predictor that is robust to the worst-case in terms of target performance.
arXiv Detail & Related papers (2020-10-11T04:42:01Z)
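For the "Factorizable Joint Shift in Multinomial Classification" entry above: the paper's correction formulae for general dataset shift and factorizable joint shift are more involved and are not reproduced here. As a reference point, the sketch below shows only the well-known correction under pure prior probability shift (label shift), where p(x|y) is invariant and only the class priors change; the function name and signature are illustrative.

```python
import numpy as np

def correct_posteriors_prior_shift(train_posteriors, train_priors, test_priors):
    """Adjust training-fitted posteriors p_train(y|x) to target posteriors
    p_test(y|x) under prior probability shift (invariant p(x|y)):

        p_test(y|x)  is proportional to  p_train(y|x) * p_test(y) / p_train(y),

    renormalized over classes for each example."""
    train_posteriors = np.asarray(train_posteriors, dtype=float)
    train_priors = np.asarray(train_priors, dtype=float)
    test_priors = np.asarray(test_priors, dtype=float)
    adjusted = train_posteriors * (test_priors / train_priors)
    return adjusted / adjusted.sum(axis=1, keepdims=True)
```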
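For the "Shift Happens: Adjusting Classifiers" entry: the summary describes transforming predictions so that their average matches a (possibly approximate) class distribution. The paper proposes several such transformations; the sketch below is only one generic rescale-and-renormalize scheme and is not claimed to be the paper's method.

```python
import numpy as np

def reequalize_average_prediction(probs, target_dist, n_iter=50, tol=1e-9):
    """Iteratively rescale predicted probabilities so that their column-wise
    average approaches target_dist, renormalizing each row to stay on the
    probability simplex. A generic scheme for illustration only."""
    probs = np.asarray(probs, dtype=float).copy()
    target_dist = np.asarray(target_dist, dtype=float)
    for _ in range(n_iter):
        avg = probs.mean(axis=0)
        if np.max(np.abs(avg - target_dist)) < tol:
            break
        probs *= target_dist / avg                  # push the average toward the target
        probs /= probs.sum(axis=1, keepdims=True)   # keep each row a distribution
    return probs
```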
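For the "Predicting with Confidence on Unseen Distributions" entry: difference of confidences (DoC) is described as an estimator of the classifier's performance change under shift. A minimal sketch of that idea, assuming "confidence" means the classifier's maximum predicted probability, is given below; the exact protocol in the paper may differ.

```python
import numpy as np

def difference_of_confidences(source_probs, target_probs):
    """DoC: drop in average maximum predicted probability when moving from
    source (in-distribution) data to shifted target data."""
    source_conf = np.max(np.asarray(source_probs, dtype=float), axis=1).mean()
    target_conf = np.max(np.asarray(target_probs, dtype=float), axis=1).mean()
    return source_conf - target_conf

def estimate_target_accuracy(source_accuracy, source_probs, target_probs):
    """Estimate accuracy on the shifted data by subtracting DoC from the
    accuracy measured on held-out source data (a sketch of the DoC idea,
    not the paper's full evaluation protocol)."""
    return source_accuracy - difference_of_confidences(source_probs, target_probs)
```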