Binary Quantification and Dataset Shift: An Experimental Investigation
- URL: http://arxiv.org/abs/2310.04565v1
- Date: Fri, 6 Oct 2023 20:11:27 GMT
- Title: Binary Quantification and Dataset Shift: An Experimental Investigation
- Authors: Pablo González, Alejandro Moreo, and Fabrizio Sebastiani
- Abstract summary: Quantification is the supervised learning task that consists of training predictors of the class prevalence values of sets of unlabelled data.
The relationship between quantification and other types of dataset shift remains, by and large, unexplored.
We propose a fine-grained taxonomy of types of dataset shift and establish protocols for generating datasets affected by each of these types of shift.
- Score: 54.14283123210872
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quantification is the supervised learning task that consists of training
predictors of the class prevalence values of sets of unlabelled data, and is of
special interest when the labelled data on which the predictor has been trained
and the unlabelled data are not IID, i.e., suffer from dataset shift. To date,
quantification methods have mostly been tested only on a special case of
dataset shift, i.e., prior probability shift; the relationship between
quantification and other types of dataset shift remains, by and large,
unexplored. In this work we carry out an experimental analysis of how current
quantification algorithms behave under different types of dataset shift, in
order to identify limitations of current approaches and hopefully pave the way
for the development of more broadly applicable methods. We do this by proposing
a fine-grained taxonomy of types of dataset shift, by establishing protocols
for the generation of datasets affected by these types of shift, and by testing
existing quantification methods on the datasets thus generated. One finding
that results from this investigation is that many existing quantification
methods that had been found robust to prior probability shift are not
necessarily robust to other types of dataset shift. A second finding is that no
existing quantification method seems to be robust enough to deal with all
the types of dataset shift we simulate in our experiments. The code needed to
reproduce all our experiments is publicly available at
https://github.com/pglez82/quant_datasetshift.
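For readers unfamiliar with the task, the following is a minimal sketch (not the authors' released code; see the repository above for that) of binary quantification under prior probability shift: a classifier is trained on data with a roughly 50/50 class prevalence, and the prevalence of test samples drawn at deliberately different prevalences is then estimated with the standard Classify & Count (CC) and Adjusted Classify & Count (ACC) estimators. All dataset sizes, prevalence values, and helper names are illustrative assumptions.

```python
# Minimal sketch: binary quantification (CC and ACC) under simulated
# prior probability shift. Illustrative only; not the paper's code.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Labelled data with an (approximately) balanced class prevalence.
X, y = make_classification(n_samples=4000, n_features=10, random_state=0)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=1)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Estimate tpr/fpr on held-out validation data; ACC uses them to correct CC.
val_pred = clf.predict(X_val)
tpr = np.mean(val_pred[y_val == 1] == 1)
fpr = np.mean(val_pred[y_val == 0] == 1)

def cc(clf, X):
    """Classify & Count: fraction of unlabelled items predicted positive."""
    return np.mean(clf.predict(X) == 1)

def acc(clf, X, tpr, fpr):
    """Adjusted Classify & Count: corrects CC using validation tpr/fpr."""
    return np.clip((cc(clf, X) - fpr) / (tpr - fpr), 0.0, 1.0)

# Simulate prior probability shift: test samples whose positive prevalence
# differs from the prevalence seen at training time.
pos, neg = X_te[y_te == 1], X_te[y_te == 0]
n = 500
for true_prev in (0.1, 0.3, 0.7, 0.9):
    n_pos = int(round(true_prev * n))
    idx_p = rng.choice(len(pos), size=n_pos, replace=True)
    idx_n = rng.choice(len(neg), size=n - n_pos, replace=True)
    sample = np.vstack([pos[idx_p], neg[idx_n]])
    print(f"true={true_prev:.2f}  CC={cc(clf, sample):.2f}  "
          f"ACC={acc(clf, sample, tpr, fpr):.2f}")
```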
Related papers
- Automatic dataset shift identification to support root cause analysis of AI performance drift [13.996602963045387]
Shifts in data distribution can substantially harm the performance of clinical AI models.
We propose the first unsupervised dataset shift identification framework.
We report promising results for the proposed framework on five types of real-world dataset shifts.
arXiv Detail & Related papers (2024-11-12T17:09:20Z)
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z)
- Adversarial Learning for Feature Shift Detection and Correction [45.65548560695731]
Feature shifts can occur in many datasets, including in multi-sensor data, where some sensors are malfunctioning, or in structured data, where faulty standardization and data processing pipelines can lead to erroneous features.
In this work, we explore using the principles of adversarial learning, where information from several discriminators trained to distinguish between two distributions is used both to detect the corrupted features and to fix them, thereby removing the distribution shift between the datasets.
arXiv Detail & Related papers (2023-12-07T18:58:40Z)
- Time-Varying Propensity Score to Bridge the Gap between the Past and Present [104.46387765330142]
We introduce a time-varying propensity score that can detect gradual shifts in the distribution of data.
We demonstrate different ways of implementing it and evaluate it on a variety of problems.
arXiv Detail & Related papers (2022-10-04T07:21:49Z)
- A unified framework for dataset shift diagnostics [2.449909275410288]
Supervised learning techniques typically assume training data originates from the target population.
Yet dataset shift frequently arises and, if not adequately taken into account, can degrade the performance of the resulting predictors.
We propose a novel and flexible framework called DetectShift that quantifies and tests for multiple dataset shifts.
arXiv Detail & Related papers (2022-05-17T13:34:45Z)
- Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks [44.61070965407907]
Given the current state of the field, a standardized large-scale dataset of tasks across a range of modalities affected by distributional shifts is necessary.
We propose the Shifts dataset for evaluation of uncertainty estimates and robustness to distributional shift.
arXiv Detail & Related papers (2021-07-15T16:59:34Z)
- Multi-Source Causal Inference Using Control Variates [81.57072928775509]
We propose a general algorithm to estimate causal effects from multiple data sources.
We show theoretically that this reduces the variance of the average treatment effect (ATE) estimate.
We apply this framework to inference from observational data under an outcome selection bias.
arXiv Detail & Related papers (2021-03-30T21:20:51Z)
- Robust Classification under Class-Dependent Domain Shift [29.54336432319199]
In this paper we explore a special type of dataset shift which we call class-dependent domain shift.
It is characterized by the following features: the input data causally depends on the label; the shift in the data is fully explained by a known variable; the variable which controls the shift can depend on the label; and there is no shift in the label distribution.
arXiv Detail & Related papers (2020-07-10T12:26:57Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate a simple method, which we call prediction-time batch normalization, that significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
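To make the last entry concrete, here is a minimal sketch of prediction-time batch normalization, assuming a PyTorch model containing BatchNorm layers; it is not the paper's implementation, and the toy model and crudely shifted batch are purely illustrative. The idea is to normalize test inputs with statistics computed on the prediction batch itself, rather than with the running statistics accumulated during training.

```python
# Minimal sketch: prediction-time batch normalization (illustrative, PyTorch).
import torch
import torch.nn as nn

_BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)

def predict_with_batch_stats(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Forward pass in which BatchNorm layers use the current batch's statistics."""
    model.eval()                        # dropout etc. stay in inference mode
    bn_layers = [m for m in model.modules() if isinstance(m, _BN_TYPES)]
    saved_momentum = [m.momentum for m in bn_layers]
    for m in bn_layers:
        m.train()                       # train mode -> normalize with batch statistics
        m.momentum = 0.0                # keep the stored running statistics untouched
    with torch.no_grad():
        out = model(x)
    for m, mom in zip(bn_layers, saved_momentum):
        m.momentum = mom                # restore original momentum values
    model.eval()                        # restore ordinary inference behaviour
    return out

# Illustrative usage: a toy classifier and a batch of shifted inputs.
model = nn.Sequential(nn.Linear(16, 32), nn.BatchNorm1d(32), nn.ReLU(), nn.Linear(32, 2))
shifted_batch = torch.randn(64, 16) * 3.0 + 1.0   # crude stand-in for covariate shift
logits = predict_with_batch_stats(model, shifted_batch)
```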
This list is automatically generated from the titles and abstracts of the papers on this site.