Dataset Interfaces: Diagnosing Model Failures Using Controllable
Counterfactual Generation
- URL: http://arxiv.org/abs/2302.07865v2
- Date: Mon, 19 Jun 2023 15:41:07 GMT
- Title: Dataset Interfaces: Diagnosing Model Failures Using Controllable
Counterfactual Generation
- Authors: Joshua Vendrow, Saachi Jain, Logan Engstrom, Aleksander Madry
- Abstract summary: Distribution shift is a major source of failure for machine learning models.
We introduce the notion of a dataset interface: a framework that, given an input dataset and a user-specified shift, returns instances that exhibit the desired shift.
We demonstrate how applying this dataset interface to the ImageNet dataset enables studying model behavior across a diverse array of distribution shifts.
- Score: 85.13934713535527
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distribution shift is a major source of failure for machine learning models.
However, evaluating model reliability under distribution shift can be
challenging, especially since it may be difficult to acquire counterfactual
examples that exhibit a specified shift. In this work, we introduce the notion
of a dataset interface: a framework that, given an input dataset and a
user-specified shift, returns instances from that input distribution that
exhibit the desired shift. We study a number of natural implementations for
such an interface, and find that they often introduce confounding shifts that
complicate model evaluation. Motivated by this, we propose a dataset interface
implementation that leverages Textual Inversion to tailor generation to the
input distribution. We then demonstrate how applying this dataset interface to
the ImageNet dataset enables studying model behavior across a diverse array of
distribution shifts, including variations in background, lighting, and
attributes of the objects. Code available at
https://github.com/MadryLab/dataset-interfaces.
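As a concrete illustration of the idea, the following is a minimal sketch of Textual-Inversion-based counterfactual generation using the Hugging Face diffusers library. It is not the authors' released interface (see the repository above for that); the model checkpoint, embedding path, token name, and prompts are illustrative assumptions.
```python
# Sketch: generate counterfactual examples for one ImageNet class under a
# user-specified shift, using Stable Diffusion plus a Textual Inversion token
# fit to that class's training images. Paths, token name, and prompts are
# hypothetical; the authors' actual interface lives in their repository.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load a learned Textual Inversion embedding that ties "<golden-retriever>"
# to the class's appearance in the input dataset.
pipe.load_textual_inversion("path/to/learned_embeds.bin", token="<golden-retriever>")

# User-specified shifts are expressed as prompt edits around the learned token.
shifts = ["in the snow", "at night", "on a beach"]
for shift in shifts:
    images = pipe(
        prompt=f"a photo of a <golden-retriever> {shift}",
        num_images_per_prompt=4,
        num_inference_steps=30,
    ).images
    for i, img in enumerate(images):
        img.save(f"golden_retriever_{shift.replace(' ', '_')}_{i}.png")
```
Tailoring the token to the input distribution is what lets the prompt edit change only the specified attribute (background, lighting, object attributes) without introducing the confounding shifts that naive prompting tends to produce.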
Related papers
- Fairness Hub Technical Briefs: Definition and Detection of Distribution Shift [0.5825410941577593]
Distribution shift is a common situation in machine learning tasks, where the data used for training a model is different from the data the model is applied to in the real world.
This brief focuses on the definition and detection of distribution shifts in educational settings.
arXiv Detail & Related papers (2024-05-23T05:29:36Z)
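Since the brief above concerns detecting shift between training data and deployment data, a generic per-feature baseline is sketched below with SciPy; it is a common starting point, not necessarily the procedure the brief recommends.
```python
# Sketch: flag features whose training and production distributions differ,
# using a per-feature Kolmogorov-Smirnov two-sample test. A generic baseline
# for shift detection, not the specific procedure from the brief.
import numpy as np
from scipy.stats import ks_2samp

def detect_shift(train: np.ndarray, prod: np.ndarray, alpha: float = 0.01):
    """Return (feature index, KS statistic, p-value) for features that appear shifted."""
    shifted = []
    for j in range(train.shape[1]):
        stat, p_value = ks_2samp(train[:, j], prod[:, j])
        if p_value < alpha:  # reject "same distribution" at level alpha
            shifted.append((j, stat, p_value))
    return shifted

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 5))
X_prod = rng.normal(size=(800, 5))
X_prod[:, 2] += 1.5  # simulate a shift in feature 2
print(detect_shift(X_train, X_prod))
```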
- Binary Quantification and Dataset Shift: An Experimental Investigation [54.14283123210872]
Quantification is the supervised learning task that consists of training predictors of the class prevalence values of sets of unlabelled data.
The relationship between quantification and other types of dataset shift remains, by and large, unexplored.
We propose a fine-grained taxonomy of types of dataset shift, by establishing protocols for the generation of datasets affected by these types of shift.
arXiv Detail & Related papers (2023-10-06T20:11:27Z)
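To make the quantification task above concrete, here is a sketch of Adjusted Classify & Count, a textbook baseline quantifier; the synthetic data and the simulated prior-probability shift are illustrative, not the paper's experimental protocol.
```python
# Sketch: Adjusted Classify & Count (ACC). The raw positive rate on unlabelled
# "deployment" data is corrected by the classifier's true/false positive rates
# measured on held-out labelled data. Illustrative baseline only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=6000, weights=[0.7, 0.3], random_state=0)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_val, X_dep, y_val, y_dep = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Estimate tpr/fpr on labelled validation data.
val_pred = clf.predict(X_val)
tpr = val_pred[y_val == 1].mean()
fpr = val_pred[y_val == 0].mean()

# Simulate prior-probability shift by subsampling negatives in the deployment set.
keep = (y_dep == 1) | (np.random.default_rng(0).random(len(y_dep)) < 0.3)
X_dep, y_dep = X_dep[keep], y_dep[keep]

raw_rate = clf.predict(X_dep).mean()                  # Classify & Count
acc = np.clip((raw_rate - fpr) / (tpr - fpr), 0, 1)   # Adjusted Classify & Count
print(f"true={y_dep.mean():.3f}  CC={raw_rate:.3f}  ACC={acc:.3f}")
```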
- Explanation Shift: How Did the Distribution Shift Impact the Model? [23.403838118256907]
We study how explanation characteristics shift when affected by distribution shifts.
We analyze different types of distribution shifts using synthetic examples and real-world data sets.
We release our methods in an open-source Python package, as well as the code used to reproduce our experiments.
arXiv Detail & Related papers (2023-03-14T17:13:01Z)
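A rough sketch of the idea of comparing explanation distributions: compute per-example attributions on reference and new data, then check whether a classifier can tell them apart. For simplicity it uses the closed-form attribution of a linear model (coefficient times centered feature) rather than a SHAP library, so it only approximates the paper's setup.
```python
# Sketch: detect "explanation shift" by checking whether a domain classifier
# can distinguish attributions computed on reference data from attributions
# computed on new data. Uses a linear model's closed-form attribution to
# avoid extra dependencies; illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_ref = rng.normal(size=(2000, 4))
X_new = X_ref.copy()
X_new[:, 0] = X_new[:, 0] * 2.0 + 1.0  # covariate shift in feature 0

y_ref = (X_ref @ np.array([1.0, -2.0, 0.5, 0.0]) > 0).astype(int)
model = LogisticRegression().fit(X_ref, y_ref)

def linear_attribution(model, X, X_background):
    # Attribution of a linear (log-odds) model: coefficient * centered feature.
    return model.coef_[0] * (X - X_background.mean(axis=0))

S_ref = linear_attribution(model, X_ref, X_ref)
S_new = linear_attribution(model, X_new, X_ref)

# Domain classifier on the attributions: AUC near 0.5 means no explanation shift.
S = np.vstack([S_ref, S_new])
d = np.r_[np.zeros(len(S_ref)), np.ones(len(S_new))].astype(int)
auc = cross_val_score(LogisticRegression(), S, d, cv=5, scoring="roc_auc").mean()
print(f"explanation-shift detector AUC: {auc:.2f}")
```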
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Time-Varying Propensity Score to Bridge the Gap between the Past and Present [104.46387765330142]
We introduce a time-varying propensity score that can detect gradual shifts in the distribution of data.
We demonstrate different ways of implementing it and evaluate it on a variety of problems.
arXiv Detail & Related papers (2022-10-04T07:21:49Z)
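The following is a simplified stand-in for the time-varying propensity idea: for each past window, a discriminator is fit against the most recent window, and its AUC traces how far that period has drifted from the present. The drifting synthetic stream is assumed for illustration; the paper's estimator differs in detail.
```python
# Sketch: a propensity-score-style check for gradual shift. Each past window
# is discriminated against the most recent window; rising AUC over time
# indicates gradual drift. Simplified illustration, not the paper's estimator.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
T, n, d = 10, 500, 5
# Synthetic stream whose mean drifts slowly over T periods.
data = [rng.normal(loc=0.15 * t, size=(n, d)) for t in range(T)]
present = data[-1]

for t, past in enumerate(data[:-1]):
    X = np.vstack([past, present])
    y = np.r_[np.zeros(n), np.ones(n)].astype(int)
    auc = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                          cv=5, scoring="roc_auc").mean()
    print(f"period {t} vs. present: discriminator AUC = {auc:.2f}")
```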
- Estimating and Explaining Model Performance When Both Covariates and Labels Shift [36.94826820536239]
We propose a new distribution shift model, Sparse Joint Shift (SJS), which considers the joint shift of both labels and a few features.
We also propose SEES, an algorithmic framework to characterize the distribution shift under SJS and to estimate a model's performance on new data without any labels.
arXiv Detail & Related papers (2022-09-18T01:16:16Z)
- Identifying the Context Shift between Test Benchmarks and Production Data [1.2259552039796024]
There exists a performance gap between machine learning models' accuracy on dataset benchmarks and their accuracy on real-world production data.
We outline two methods for identifying changes in context that lead to distribution shifts and model prediction errors.
We present two case studies to highlight implicit assumptions underlying applied machine learning models that tend to lead to errors.
arXiv Detail & Related papers (2022-07-03T14:54:54Z)
- AnoShift: A Distribution Shift Benchmark for Unsupervised Anomaly Detection [7.829710051617368]
We introduce an unsupervised anomaly detection benchmark with data that shifts over time, built over Kyoto-2006+, a traffic dataset for network intrusion detection.
We first highlight the non-stationary nature of the data, using a basic per-feature analysis, t-SNE, and an Optimal Transport approach for measuring the overall distribution distances between years.
We validate the performance degradation over time with diverse models, ranging from classical approaches to deep learning.
arXiv Detail & Related papers (2022-06-30T17:59:22Z)
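The entry above measures distribution distances between years with Optimal Transport; a minimal version using the POT library on synthetic stand-in features (not AnoShift's actual preprocessing) might look like this.
```python
# Sketch: measure how far two years of (featurized) traffic data are apart
# with an exact optimal-transport cost, via the POT library (pip install pot).
# Synthetic features stand in for AnoShift's preprocessed Kyoto-2006+ data.
import numpy as np
import ot

rng = np.random.default_rng(0)
year_a = rng.normal(loc=0.0, size=(500, 8))   # stand-in for an early year
year_b = rng.normal(loc=0.8, size=(500, 8))   # stand-in for a later year

a = ot.unif(len(year_a))        # uniform weights over samples
b = ot.unif(len(year_b))
M = ot.dist(year_a, year_b)     # pairwise squared Euclidean cost matrix
cost = ot.emd2(a, b, M)         # exact OT cost between the two empirical distributions
print(f"OT cost between years: {cost:.3f}")
```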
- CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance.
In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
arXiv Detail & Related papers (2022-05-30T13:34:46Z)
- BREEDS: Benchmarks for Subpopulation Shift [98.90314444545204]
We develop a methodology for assessing the robustness of models to subpopulation shift.
We leverage the class structure underlying existing datasets to control the data subpopulations that comprise the training and test distributions.
Applying this methodology to the ImageNet dataset, we create a suite of subpopulation shift benchmarks of varying granularity.
arXiv Detail & Related papers (2020-08-11T17:04:47Z)
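In the same spirit, here is a toy sketch of a BREEDS-style subpopulation split: superclasses serve as labels, while the subclasses realizing each superclass are disjoint between training and test. The hierarchy below is invented for illustration; BREEDS derives its superclasses from the ImageNet/WordNet hierarchy.
```python
# Sketch: construct a subpopulation-shift split in the spirit of BREEDS.
# Each superclass becomes a label; the subclasses seen in training are
# disjoint from those used at test time. Toy hierarchy for illustration only.
import random

hierarchy = {
    "dog":  ["beagle", "husky", "poodle", "dalmatian"],
    "cat":  ["siamese", "persian", "tabby", "sphynx"],
    "bird": ["sparrow", "eagle", "parrot", "penguin"],
}

def breeds_style_split(hierarchy, seed=0):
    rng = random.Random(seed)
    train_map, test_map = {}, {}
    for superclass, subclasses in hierarchy.items():
        subs = subclasses[:]
        rng.shuffle(subs)
        half = len(subs) // 2
        train_map[superclass] = subs[:half]   # source subpopulations
        test_map[superclass] = subs[half:]    # held-out subpopulations
    return train_map, test_map

train_map, test_map = breeds_style_split(hierarchy)
print("train:", train_map)
print("test: ", test_map)
```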