Ensembling Shift Detectors: an Extensive Empirical Evaluation
- URL: http://arxiv.org/abs/2106.14608v1
- Date: Mon, 28 Jun 2021 12:21:16 GMT
- Title: Ensembling Shift Detectors: an Extensive Empirical Evaluation
- Authors: Simona Maggio and L\'eo Dreyfus-Schmidt
- Abstract summary: The term dataset shift refers to the situation where the data used to train a machine learning model is different from where the model operates.
We propose a simple yet powerful technique to ensemble complementary shift detectors, while tuning the significance level of each detector's statistical test to the dataset.
- Score: 0.2538209532048867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The term dataset shift refers to the situation where the data used to train a
machine learning model is different from where the model operates. While
several types of shifts naturally occur, existing shift detectors are usually
designed to address only a specific type of shift. We propose a simple yet
powerful technique to ensemble complementary shift detectors, while tuning the
significance level of each detector's statistical test to the dataset. This
enables a more robust shift detection, capable of addressing all different
types of shift, which is essential in real-life settings where the precise
shift type is often unknown. This approach is validated by a large-scale
statistically sound benchmark study over various synthetic shifts applied to
real-world structured datasets.
Related papers
- Automatic dataset shift identification to support root cause analysis of AI performance drift [13.996602963045387]
Shifts in data distribution can substantially harm the performance of clinical AI models.
We propose the first unsupervised dataset shift identification framework.
We report promising results for the proposed framework on five types of real-world dataset shifts.
arXiv Detail & Related papers (2024-11-12T17:09:20Z) - Adversarial Learning for Feature Shift Detection and Correction [45.65548560695731]
Feature shifts can occur in many datasets, including in multi-sensor data, where some sensors are malfunctioning, or in structured data, where faulty standardization and data processing pipelines can lead to erroneous features.
In this work, we explore using the principles of adversarial learning, where the information from several discriminators trained to distinguish between two distributions is used to both detect the corrupted features and fix them in order to remove the distribution shift between datasets.
arXiv Detail & Related papers (2023-12-07T18:58:40Z) - Binary Quantification and Dataset Shift: An Experimental Investigation [54.14283123210872]
Quantification is the supervised learning task that consists of training predictors of the class prevalence values of sets of unlabelled data.
The relationship between quantification and other types of dataset shift remains, by and large, unexplored.
We propose a fine-grained taxonomy of types of dataset shift, by establishing protocols for the generation of datasets affected by these types of shift.
arXiv Detail & Related papers (2023-10-06T20:11:27Z) - Leveraging sparse and shared feature activations for disentangled
representation learning [112.22699167017471]
We propose to leverage knowledge extracted from a diversified set of supervised tasks to learn a common disentangled representation.
We validate our approach on six real world distribution shift benchmarks, and different data modalities.
arXiv Detail & Related papers (2023-04-17T01:33:24Z) - Dataset Interfaces: Diagnosing Model Failures Using Controllable
Counterfactual Generation [85.13934713535527]
Distribution shift is a major source of failure for machine learning models.
We introduce the notion of a dataset interface: a framework that, given an input dataset and a user-specified shift, returns instances that exhibit the desired shift.
We demonstrate how applying this dataset interface to the ImageNet dataset enables studying model behavior across a diverse array of distribution shifts.
arXiv Detail & Related papers (2023-02-15T18:56:26Z) - A unified framework for dataset shift diagnostics [2.449909275410288]
Supervised learning techniques typically assume training data originates from the target population.
Yet, dataset shift frequently arises, which, if not adequately taken into account, may decrease the performance of their predictors.
We propose a novel and flexible framework called DetectShift that quantifies and tests for multiple dataset shifts.
arXiv Detail & Related papers (2022-05-17T13:34:45Z) - Towards Data-Efficient Detection Transformers [77.43470797296906]
We show most detection transformers suffer from significant performance drops on small-size datasets.
We empirically analyze the factors that affect data efficiency, through a step-by-step transition from a data-efficient RCNN variant to the representative DETR.
We introduce a simple yet effective label augmentation method to provide richer supervision and improve data efficiency.
arXiv Detail & Related papers (2022-03-17T17:56:34Z) - Exploring Covariate and Concept Shift for Detection and Calibration of
Out-of-Distribution Data [77.27338842609153]
characterization reveals that sensitivity to each type of shift is important to the detection and confidence calibration of OOD data.
We propose a geometrically-inspired method to improve OOD detection under both shifts with only in-distribution data.
We are the first to propose a method that works well across both OOD detection and calibration and under different types of shifts.
arXiv Detail & Related papers (2021-10-28T15:42:55Z) - Detection of Dataset Shifts in Learning-Enabled Cyber-Physical Systems
using Variational Autoencoder for Regression [1.5039745292757671]
We propose an approach to detect the dataset shifts effectively for regression problems.
Our approach is based on the inductive conformal anomaly detection and utilizes a variational autoencoder for regression model.
We demonstrate our approach by using an advanced emergency braking system implemented in an open-source simulator for self-driving cars.
arXiv Detail & Related papers (2021-04-14T03:46:37Z) - Robust Classification under Class-Dependent Domain Shift [29.54336432319199]
In this paper we explore a special type of dataset shift which we call class-dependent domain shift.
It is characterized by the following features: the input data causally depends on the label, the shift in the data is fully explained by a known variable, the variable which controls the shift can depend on the label, there is no shift in the label distribution.
arXiv Detail & Related papers (2020-07-10T12:26:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.