Benchmarking Distribution Shift in Tabular Data with TableShift
- URL: http://arxiv.org/abs/2312.07577v3
- Date: Thu, 8 Feb 2024 21:28:23 GMT
- Title: Benchmarking Distribution Shift in Tabular Data with TableShift
- Authors: Josh Gardner, Zoran Popovic, Ludwig Schmidt
- Abstract summary: TableShift is a distribution shift benchmark for tabular data.
It covers domains including finance, education, public policy, healthcare, and civic participation.
We conduct a large-scale study comparing several state-of-the-art tabular data models alongside robust learning and domain generalization methods.
- Score: 32.071534049494076
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robustness to distribution shift has become a growing concern for text and
image models as they transition from research subjects to deployment in the
real world. However, high-quality benchmarks for distribution shift in tabular
machine learning tasks are still lacking despite the widespread real-world use
of tabular data and differences in the models used for tabular data in
comparison to text and images. As a consequence, the robustness of tabular
models to distribution shift is poorly understood. To address this issue, we
introduce TableShift, a distribution shift benchmark for tabular data.
TableShift contains 15 binary classification tasks in total, each with an
associated shift, and includes a diverse set of data sources, prediction
targets, and distribution shifts. The benchmark covers domains including
finance, education, public policy, healthcare, and civic participation, and is
accessible using only a few lines of Python code via the TableShift API. We
conduct a large-scale study comparing several state-of-the-art tabular data
models alongside robust learning and domain generalization methods on the
benchmark tasks. Our study demonstrates (1) a linear trend between
in-distribution (ID) and out-of-distribution (OOD) accuracy; (2) domain
robustness methods can reduce shift gaps but at the cost of reduced ID
accuracy; (3) a strong relationship between shift gap (difference between ID
and OOD performance) and shifts in the label distribution.
The benchmark data, Python package, model implementations, and more
information about TableShift are available at
https://github.com/mlfoundations/tableshift and https://tableshift.org .
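As a concrete illustration of the access pattern the abstract describes ("a few lines of Python code via the TableShift API") and of the ID/OOD comparison behind the shift gap, here is a minimal sketch. The entry point get_dataset, the get_pandas accessor, the split names, and the "diabetes_readmission" task identifier follow my reading of the project README and should be treated as assumptions; the classifier is an arbitrary stand-in, not the paper's model suite. Consult https://github.com/mlfoundations/tableshift for the authoritative API.

```python
# Hedged sketch only: get_dataset, get_pandas, the split names, and the task id
# below are assumptions based on the TableShift README; check the repository
# (https://github.com/mlfoundations/tableshift) for the authoritative API.
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score

from tableshift import get_dataset  # assumed entry point

dset = get_dataset("diabetes_readmission")        # one of the 15 benchmark tasks (assumed id)
X_tr, y_tr, _, _ = dset.get_pandas("train")
X_id, y_id, _, _ = dset.get_pandas("id_test")     # in-distribution test split
X_ood, y_ood, _, _ = dset.get_pandas("ood_test")  # out-of-distribution test split

# Any tabular classifier can stand in here; the benchmark is model-agnostic.
clf = HistGradientBoostingClassifier().fit(X_tr, y_tr)

id_acc = accuracy_score(y_id, clf.predict(X_id))
ood_acc = accuracy_score(y_ood, clf.predict(X_ood))
shift_gap = id_acc - ood_acc  # "shift gap": ID minus OOD performance
print(f"ID acc={id_acc:.3f}  OOD acc={ood_acc:.3f}  shift gap={shift_gap:.3f}")
```

The printed gap is the quantity behind findings (1) and (3): how much accuracy degrades when moving from the in-distribution to the out-of-distribution test split of the same task.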
Related papers
- TabDiff: a Multi-Modal Diffusion Model for Tabular Data Generation [91.50296404732902]
We introduce TabDiff, a joint diffusion framework that models all multi-modal distributions of tabular data in one model.
Our key innovation is the development of a joint continuous-time diffusion process for numerical and categorical data.
TabDiff achieves superior average performance over existing competitive baselines, with up to 22.5% improvement over the state-of-the-art model on pair-wise column correlation estimations.
arXiv Detail & Related papers (2024-10-27T22:58:47Z)
- AdapTable: Test-Time Adaptation for Tabular Data via Shift-Aware Uncertainty Calibrator and Label Distribution Handler [29.395855812763617]
We propose AdapTable, a framework for adapting machine learning models to target data without accessing source data.
AdapTable operates in two stages: 1) calibrating model predictions using a shift-aware uncertainty calibrator, and 2) adjusting these predictions to match the target label distribution with a label distribution handler.
Our results demonstrate AdapTable's ability to handle various real-world distribution shifts, achieving up to a 16% improvement (a generic sketch of the two-stage idea appears after this entry).
arXiv Detail & Related papers (2024-07-15T15:02:53Z)
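The two-stage recipe summarized in the AdapTable entry above (calibrate predictions, then align them with the target label distribution) can be illustrated with a generic, hypothetical sketch. This is not AdapTable's implementation or API; it combines plain temperature scaling with a label-prior reweighting, and every name and constant below is invented for illustration.

```python
import numpy as np

def calibrate_and_rebalance(logits, temperature, source_prior, target_prior):
    """Generic two-stage adjustment in the spirit of the entry above:
    (1) soften predictions with a calibration temperature, then
    (2) reweight class probabilities from the source label prior toward an
    estimated target label prior and renormalize.
    Names, signature, and the temperature value are illustrative only."""
    # Stage 1: temperature-scaled softmax (uncertainty calibration).
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)

    # Stage 2: label-distribution correction via prior reweighting.
    weights = np.asarray(target_prior) / np.asarray(source_prior)
    adjusted = probs * weights
    return adjusted / adjusted.sum(axis=1, keepdims=True)

# Toy usage: binary task, balanced source labels, skewed target labels.
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 2))
print(calibrate_and_rebalance(logits, temperature=2.0,
                              source_prior=[0.5, 0.5],
                              target_prior=[0.8, 0.2]).round(3))
```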
- TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks [30.922069185335246]
We find two common characteristics of tabular data in typical industrial applications that are underrepresented in the datasets usually used for evaluation in the literature.
A considerable portion of datasets in production settings stems from extensive data acquisition and feature engineering pipelines.
This can have an impact on the absolute and relative number of predictive, uninformative, and correlated features compared to academic datasets.
arXiv Detail & Related papers (2024-06-27T17:55:31Z)
- Making Pre-trained Language Models Great on Tabular Prediction [50.70574370855663]
Transfer learning with deep neural networks (DNNs) has made significant progress in image and language processing.
We present TP-BERTa, a specifically pre-trained LM for tabular data prediction.
A novel relative magnitude tokenization converts scalar numerical feature values to finely discrete, high-dimensional tokens, and an intra-feature attention approach integrates feature values with the corresponding feature names.
arXiv Detail & Related papers (2024-03-04T08:38:56Z)
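As a much-simplified illustration of the discretization idea in the TP-BERTa entry above (scalar feature values mapped to discrete magnitude tokens), here is a generic quantile-binning sketch. It is not TP-BERTa's relative magnitude tokenization, which is defined in that paper; the function name and bin count are illustrative.

```python
# Generic quantile-binning sketch of "scalar value -> discrete magnitude token".
# This only illustrates the discretization idea; it is not TP-BERTa's actual
# relative magnitude tokenization or intra-feature attention.
import numpy as np

def magnitude_tokenize(values, n_bins=4):
    """Map each scalar to an integer token in [0, n_bins) via its quantile bin,
    so larger magnitudes get higher token ids."""
    values = np.asarray(values, dtype=float)
    edges = np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.digitize(values, edges)

ages = [23, 35, 47, 62, 18, 90, 41]
print(magnitude_tokenize(ages))  # [0 1 2 3 0 3 2]
```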
- Dataset Interfaces: Diagnosing Model Failures Using Controllable Counterfactual Generation [85.13934713535527]
Distribution shift is a major source of failure for machine learning models.
We introduce the notion of a dataset interface: a framework that, given an input dataset and a user-specified shift, returns instances that exhibit the desired shift.
We demonstrate how applying this dataset interface to the ImageNet dataset enables studying model behavior across a diverse array of distribution shifts.
arXiv Detail & Related papers (2023-02-15T18:56:26Z)
- Estimating and Explaining Model Performance When Both Covariates and Labels Shift [36.94826820536239]
We propose a new distribution shift model, Sparse Joint Shift (SJS), which considers the joint shift of both labels and a few features.
We also propose SEES, an algorithmic framework to characterize the distribution shift under SJS and to estimate a model's performance on new data without any labels.
arXiv Detail & Related papers (2022-09-18T01:16:16Z)
- MetaShift: A Dataset of Datasets for Evaluating Contextual Distribution Shifts and Training Conflicts [20.09404891618634]
We present MetaShift, a collection of 12,868 sets of natural images across 410 classes.
It provides explicit explanations of what is unique about each of its data sets and a distance score that measures the amount of distribution shift between any two of its data sets.
We show how MetaShift can help to visualize conflicts between data subsets during model training.
arXiv Detail & Related papers (2022-02-14T07:40:03Z)
- Extending the WILDS Benchmark for Unsupervised Adaptation [186.90399201508953]
We present the WILDS 2.0 update, which extends 8 of the 10 datasets in the WILDS benchmark of distribution shifts to include curated unlabeled data.
These datasets span a wide range of applications (from histology to wildlife conservation), tasks (classification, regression, and detection), and modalities.
We systematically benchmark state-of-the-art methods that leverage unlabeled data, including domain-invariant, self-training, and self-supervised methods.
arXiv Detail & Related papers (2021-12-09T18:32:38Z)
- WILDS: A Benchmark of in-the-Wild Distribution Shifts [157.53410583509924]
Distribution shifts can substantially degrade the accuracy of machine learning systems deployed in the wild.
We present WILDS, a curated collection of 8 benchmark datasets that reflect a diverse range of distribution shifts.
We show that standard training results in substantially lower out-of-distribution than in-distribution performance.
arXiv Detail & Related papers (2020-12-14T11:14:56Z)
- BREEDS: Benchmarks for Subpopulation Shift [98.90314444545204]
We develop a methodology for assessing the robustness of models to subpopulation shift.
We leverage the class structure underlying existing datasets to control the data subpopulations that comprise the training and test distributions.
Applying this methodology to the ImageNet dataset, we create a suite of subpopulation shift benchmarks of varying granularity.
arXiv Detail & Related papers (2020-08-11T17:04:47Z)
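As a toy illustration of the BREEDS-style construction described in the last entry (using class structure to control which subpopulations appear in training versus test), here is a hedged sketch with an invented two-superclass hierarchy; BREEDS itself derives its splits from the ImageNet class hierarchy.

```python
# Toy sketch of a subpopulation-shift split: train and test share the same
# superclass labels but draw on disjoint subclasses. The hierarchy here is
# invented; BREEDS builds its splits from the ImageNet class hierarchy.
import random

hierarchy = {
    "dog": ["beagle", "husky", "corgi", "poodle"],
    "cat": ["siamese", "persian", "sphynx", "tabby"],
}

def make_subpopulation_split(hierarchy, n_source=2, seed=0):
    """For each superclass, route some subclasses to the source (training)
    distribution and the rest to the target (test) distribution."""
    rng = random.Random(seed)
    source, target = {}, {}
    for superclass, subclasses in hierarchy.items():
        subs = list(subclasses)
        rng.shuffle(subs)
        source[superclass] = sorted(subs[:n_source])
        target[superclass] = sorted(subs[n_source:])
    return source, target

source, target = make_subpopulation_split(hierarchy)
print(source)  # e.g. two subclasses per superclass for training
print(target)  # the remaining, disjoint subclasses for testing
```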
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.