A unified framework for dataset shift diagnostics
- URL: http://arxiv.org/abs/2205.08340v4
- Date: Tue, 12 Sep 2023 23:36:23 GMT
- Title: A unified framework for dataset shift diagnostics
- Authors: Felipe Maia Polo, Rafael Izbicki, Evanildo Gomes Lacerda Jr, Juan
Pablo Ibieta-Jimenez, Renato Vicente
- Abstract summary: Supervised learning techniques typically assume training data originates from the target population.
Yet, dataset shift frequently arises and, if not adequately taken into account, may degrade the performance of the resulting predictors.
We propose a novel and flexible framework called DetectShift that quantifies and tests for multiple dataset shifts.
- Score: 2.449909275410288
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Supervised learning techniques typically assume training data originates from
the target population. Yet, in reality, dataset shift frequently arises and,
if not adequately taken into account, may degrade the performance of the
resulting predictors. In this work, we propose a novel and flexible framework called
DetectShift that quantifies and tests for multiple dataset shifts, encompassing
shifts in the distributions of $(X, Y)$, $X$, $Y$, $X|Y$, and $Y|X$.
DetectShift equips practitioners with insights into data shifts, facilitating
the adaptation or retraining of predictors using both source and target data.
This proves extremely valuable when labeled samples in the target domain are
limited. The framework uses test statistics of the same nature to quantify the
magnitude of the various shifts, making the results more interpretable. It is
versatile, suitable for both regression and classification tasks, and
accommodates diverse data forms: tabular, text, or image.
Experimental results demonstrate the effectiveness of DetectShift in detecting
dataset shifts even in higher dimensions.
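To make the kind of diagnostic DetectShift performs concrete, the sketch below illustrates one common recipe of this flavor for a single shift type, covariate shift in $X$: estimate a KL-style divergence between source and target with a probabilistic classifier that discriminates the two samples, then calibrate it with a permutation test. This is a hedged illustration of the general idea (a divergence-like statistic plus a resampling null), not the authors' implementation; the function names and the logistic-regression choice are ours.

```python
# Hedged sketch: quantify covariate shift P_s(X) vs P_t(X) with a
# classifier-based KL-divergence estimate and a permutation test.
# Illustrative only; not the DetectShift reference implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression

def shift_statistic(X_src, X_tgt):
    """KL(P_t || P_s) estimated from the log-odds of a source-vs-target classifier."""
    X = np.vstack([X_src, X_tgt])
    z = np.r_[np.zeros(len(X_src)), np.ones(len(X_tgt))]  # 0 = source, 1 = target
    clf = LogisticRegression(max_iter=1000).fit(X, z)
    # NOTE: for simplicity the classifier is evaluated on its own training data;
    # a careful implementation would use sample splitting to avoid optimistic bias.
    p = np.clip(clf.predict_proba(X_tgt)[:, 1], 1e-6, 1 - 1e-6)
    prior_logit = np.log(len(X_tgt) / len(X_src))           # correct for sample-size imbalance
    log_density_ratio = np.log(p / (1 - p)) - prior_logit   # approx. log p_t(x)/p_s(x)
    return log_density_ratio.mean()                          # near 0 when there is no shift

def permutation_pvalue(X_src, X_tgt, n_perm=200, seed=0):
    """P-value for H0: P_s(X) = P_t(X), obtained by refitting on shuffled labels."""
    rng = np.random.default_rng(seed)
    observed = shift_statistic(X_src, X_tgt)
    pooled, n_src = np.vstack([X_src, X_tgt]), len(X_src)
    null = []
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        null.append(shift_statistic(pooled[idx[:n_src]], pooled[idx[n_src:]]))
    return (1 + sum(s >= observed for s in null)) / (1 + n_perm)

# Example: a mean shift in 10 dimensions should give a small p-value.
rng = np.random.default_rng(1)
X_src = rng.normal(size=(500, 10))
X_tgt = rng.normal(size=(500, 10)) + 0.5
print(shift_statistic(X_src, X_tgt), permutation_pvalue(X_src, X_tgt, n_perm=50))
```

Comparing shifts in $(X, Y)$, $X$, or $Y$ then amounts to changing what is fed to such a statistic, which is one way test statistics for different shift types can be kept of the same nature.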
Related papers
- Adversarial Learning for Feature Shift Detection and Correction [45.65548560695731] (arXiv, 2023-12-07)
  Feature shifts can occur in many datasets, including in multi-sensor data, where some sensors are malfunctioning, or in structured data, where faulty standardization and data processing pipelines can lead to erroneous features.
  In this work, we explore using the principles of adversarial learning, where the information from several discriminators trained to distinguish between two distributions is used to both detect the corrupted features and fix them in order to remove the distribution shift between datasets.
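As a rough, hedged illustration of the discriminator intuition above (not the paper's adversarial detection-and-correction procedure), one can score each feature by how well a simple source-vs-target classifier separates the two datasets using that feature alone; strongly separable features are candidates for corruption. Function names and the logistic-regression choice are ours.

```python
# Hedged sketch: flag individually shifted features via per-feature
# source-vs-target classifiers (a simplification of the discriminator idea;
# the paper's adversarial detection-and-correction procedure is more involved).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def feature_shift_scores(X_src, X_tgt):
    """Return one AUC per feature: ~0.5 means indistinguishable, ~1.0 means shifted."""
    z = np.r_[np.zeros(len(X_src)), np.ones(len(X_tgt))]
    scores = []
    for j in range(X_src.shape[1]):
        x = np.r_[X_src[:, j], X_tgt[:, j]].reshape(-1, 1)
        clf = LogisticRegression(max_iter=1000).fit(x, z)
        scores.append(roc_auc_score(z, clf.predict_proba(x)[:, 1]))
    return np.array(scores)

# Example: corrupt feature 3 of the target data and see its score stand out.
rng = np.random.default_rng(0)
X_src, X_tgt = rng.normal(size=(400, 5)), rng.normal(size=(400, 5))
X_tgt[:, 3] += 2.0
print(feature_shift_scores(X_src, X_tgt).round(2))
```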
- Binary Quantification and Dataset Shift: An Experimental Investigation [54.14283123210872] (arXiv, 2023-10-06)
  Quantification is the supervised learning task of training predictors of the class prevalence values of sets of unlabelled data.
  The relationship between quantification and other types of dataset shift remains, by and large, unexplored.
  We propose a fine-grained taxonomy of types of dataset shift and establish protocols for the generation of datasets affected by these types of shift.
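For background on the quantification task mentioned above, here is a minimal sketch of two classic baselines, classify-and-count and adjusted classify-and-count; this is standard material, not the protocols proposed in the paper, and the function names are ours.

```python
# Hedged sketch: two classic binary quantification baselines, CC and ACC.
# Standard background for the task, not the paper's experimental protocols.
import numpy as np

def classify_and_count(y_pred_unlabeled):
    """Naive prevalence estimate: fraction of unlabeled items predicted positive."""
    return float(np.mean(y_pred_unlabeled))

def adjusted_classify_and_count(y_pred_unlabeled, y_true_val, y_pred_val):
    """Correct CC using the TPR/FPR measured on a labeled validation set."""
    tpr = np.mean(y_pred_val[y_true_val == 1])
    fpr = np.mean(y_pred_val[y_true_val == 0])
    cc = classify_and_count(y_pred_unlabeled)
    # Under prior-probability shift: cc = prevalence * tpr + (1 - prevalence) * fpr
    return float(np.clip((cc - fpr) / (tpr - fpr + 1e-12), 0.0, 1.0))
```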
- Efficient and Multiply Robust Risk Estimation under General Forms of Dataset Shift [22.708984813519155] (arXiv, 2023-06-28)
  We study the general problem of efficiently estimating target population risk under various dataset shift conditions.
  We develop efficient and multiply robust estimators along with a straightforward specification test.
  We also derive efficiency bounds for two other dataset shift conditions, posterior drift and location-scale shift.
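As a simple point of reference for the target-risk estimation problem (the paper's efficient, multiply robust estimators go well beyond this), the textbook baseline under covariate shift reweights source losses by an estimated density ratio w(x) ≈ p_target(x)/p_source(x). The sketch below estimates that ratio with a source-vs-target classifier; names are ours.

```python
# Hedged sketch: importance-weighted target-risk estimate under covariate shift.
# A simple baseline, not the multiply robust estimators developed in the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_density_ratio(X_src, X_tgt):
    """w(x) ~= p_tgt(x) / p_src(x), from a source-vs-target classifier, on source points."""
    X = np.vstack([X_src, X_tgt])
    z = np.r_[np.zeros(len(X_src)), np.ones(len(X_tgt))]
    clf = LogisticRegression(max_iter=1000).fit(X, z)
    p = np.clip(clf.predict_proba(X_src)[:, 1], 1e-6, 1 - 1e-6)
    return (p / (1 - p)) * (len(X_src) / len(X_tgt))

def importance_weighted_risk(losses_src, weights):
    """Average source losses reweighted toward the target distribution."""
    return float(np.sum(weights * losses_src) / np.sum(weights))
```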
- MAGDiff: Covariate Data Set Shift Detection via Activation Graphs of Deep Neural Networks [8.887179103071388] (arXiv, 2023-05-22)
  We propose a new family of representations, called MAGDiff, that we extract from any given neural network classifier.
  These representations are computed by comparing the activation graphs of the neural network for samples belonging to the training distribution and to the target distribution.
  We show that our novel representations induce significant improvements over a state-of-the-art baseline relying on the network output.
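MAGDiff's activation-graph representations are specific to that paper; as a much cruder stand-in for the underlying intuition (internal activations drift when the input distribution drifts), one can compare per-layer activation statistics between a reference batch and a target batch. The helper below is our own illustrative code, assuming a PyTorch model with Linear layers.

```python
# Hedged sketch: compare per-layer mean activations between a reference batch
# and a target batch. A crude stand-in for activation-based shift detection;
# it is not the MAGDiff construction.
import torch
import torch.nn as nn

def layer_activation_means(model, x):
    """Collect the mean output activation of each Linear layer on a batch."""
    means, hooks = {}, []
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            hooks.append(module.register_forward_hook(
                lambda m, inp, out, name=name: means.__setitem__(name, out.detach().mean(0))))
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()
    return means

def activation_drift(model, x_reference, x_target):
    """Per-layer L2 distance between mean activations on the two batches."""
    ref = layer_activation_means(model, x_reference)
    tgt = layer_activation_means(model, x_target)
    return {name: torch.norm(ref[name] - tgt[name]).item() for name in ref}
```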
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395] (arXiv, 2023-04-07)
  Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
  Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
  In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
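A bare-bones sketch of the two ingredients being combined, in their simplest forms (confidence-thresholded abstention and uncertainty-based querying); this is not the ASPEST algorithm itself, and the names are ours.

```python
# Hedged sketch: the two ingredients of active selective prediction in their
# simplest form. Not the ASPEST method itself.
import numpy as np

def selective_predictions(probs, threshold=0.8):
    """Return the predicted class, or -1 (abstain) when the max probability is low."""
    conf = probs.max(axis=1)
    preds = probs.argmax(axis=1)
    return np.where(conf >= threshold, preds, -1)

def query_indices(probs, budget=10):
    """Indices of the least confident target examples to send for labeling."""
    conf = probs.max(axis=1)
    return np.argsort(conf)[:budget]
```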
- Dataset Interfaces: Diagnosing Model Failures Using Controllable Counterfactual Generation [85.13934713535527] (arXiv, 2023-02-15)
  Distribution shift is a major source of failure for machine learning models.
  We introduce the notion of a dataset interface: a framework that, given an input dataset and a user-specified shift, returns instances that exhibit the desired shift.
  We demonstrate how applying this dataset interface to the ImageNet dataset enables studying model behavior across a diverse array of distribution shifts.
- Project and Probe: Sample-Efficient Domain Adaptation by Interpolating Orthogonal Features [119.22672589020394] (arXiv, 2023-02-10)
  We propose a lightweight, sample-efficient approach that learns a diverse set of features and adapts to a target distribution by interpolating these features.
  Our experiments on four datasets, with multiple distribution shift settings for each, show that Pro$^2$ improves performance by 5-15% when given limited target data.
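A heavily simplified sketch of the project-then-probe recipe: here the small orthogonal feature basis is plain PCA on source features (Pro$^2$ learns its projection differently), and a linear probe is then fit on the few labeled target examples. The names and the PCA choice are our assumptions.

```python
# Hedged sketch: project features onto a small orthogonal basis, then fit a
# linear probe on a handful of labeled target examples. PCA is a stand-in for
# how Pro$^2$ actually learns its projection.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

def project_and_probe(feats_src, feats_tgt_labeled, y_tgt_labeled, n_dirs=8):
    projector = PCA(n_components=n_dirs).fit(feats_src)                   # "project"
    z_tgt = projector.transform(feats_tgt_labeled)
    probe = LogisticRegression(max_iter=1000).fit(z_tgt, y_tgt_labeled)   # "probe"
    return projector, probe
```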
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306] (arXiv, 2022-01-11)
  Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
  In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
  We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled target examples whose confidence exceeds that threshold.
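The ATC recipe as summarized above fits in a few lines; the sketch below (our own naming) picks the threshold on labeled source validation data so that the fraction of points above it matches validation accuracy, then applies it to the unlabeled target set.

```python
# Hedged sketch of Average Thresholded Confidence (ATC), based on the summary
# above: fit a confidence threshold on labeled source validation data, then
# predict target accuracy as the fraction of target points above it.
import numpy as np

def fit_atc_threshold(val_probs, val_labels):
    """Pick t so that P(confidence > t) on validation equals validation accuracy."""
    conf = val_probs.max(axis=1)
    acc = np.mean(val_probs.argmax(axis=1) == val_labels)
    return np.quantile(conf, 1.0 - acc)   # the top `acc` fraction sits above the threshold

def predict_target_accuracy(target_probs, threshold):
    """Estimated accuracy on the (unlabeled) target set."""
    return float(np.mean(target_probs.max(axis=1) > threshold))
```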
- Robust Classification under Class-Dependent Domain Shift [29.54336432319199] (arXiv, 2020-07-10)
  In this paper we explore a special type of dataset shift which we call class-dependent domain shift.
  It is characterized by the following features: the input data causally depends on the label; the shift in the data is fully explained by a known variable; the variable which controls the shift can depend on the label; and there is no shift in the label distribution.
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712] (arXiv, 2020-06-19)
  We evaluate a method we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
  We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
  The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
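The core trick described above can be reproduced in a few lines of PyTorch; the helper below is a sketch under our own naming, not the paper's code: keep batch-normalization layers in training mode at inference so they normalize with the current test batch's statistics instead of the running averages accumulated during training.

```python
# Hedged sketch: prediction-time batch normalization in PyTorch. Keep BN layers
# in train mode at inference so they use the test batch's own statistics.
import torch
import torch.nn as nn

def enable_prediction_time_bn(model: nn.Module) -> nn.Module:
    model.eval()                                   # everything else stays in eval mode
    for module in model.modules():
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            module.train()                         # BN normalizes with current-batch statistics
    return model

# Usage: run predictions on a reasonably large batch of shifted test inputs.
# model = enable_prediction_time_bn(model)
# with torch.no_grad():
#     logits = model(test_batch)
```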