Wild-Tab: A Benchmark For Out-Of-Distribution Generalization In Tabular Regression
- URL: http://arxiv.org/abs/2312.01792v1
- Date: Mon, 4 Dec 2023 10:27:38 GMT
- Title: Wild-Tab: A Benchmark For Out-Of-Distribution Generalization In Tabular Regression
- Authors: Sergey Kolesnikov
- Abstract summary: Out-of-Distribution (OOD) generalization is an ongoing challenge in deep learning.
We present Wild-Tab, a benchmark tailored for OOD generalization in tabular regression tasks.
The benchmark incorporates 3 industrial datasets sourced from fields like weather prediction and power consumption estimation.
We observe that many of these methods struggle to maintain high performance on unseen data, with OOD performance dropping markedly relative to in-distribution performance.
- Score: 4.532517021515834
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Out-of-Distribution (OOD) generalization, a cornerstone for building robust
machine learning models capable of handling data diverging from the training
set's distribution, is an ongoing challenge in deep learning. While significant
progress has been observed in computer vision and natural language processing,
its exploration in tabular data, ubiquitous in many industrial applications,
remains nascent. To bridge this gap, we present Wild-Tab, a large-scale
benchmark tailored for OOD generalization in tabular regression tasks. The
benchmark incorporates 3 industrial datasets sourced from fields like weather
prediction and power consumption estimation, providing a challenging testbed
for evaluating OOD performance under real-world conditions. Our extensive
experiments, evaluating 10 distinct OOD generalization methods on Wild-Tab,
reveal nuanced insights. We observe that many of these methods struggle to
maintain high performance on unseen data, with OOD performance dropping
markedly relative to in-distribution performance. At the same
time, Empirical Risk Minimization (ERM), despite its simplicity, delivers
robust performance across all evaluations, rivaling the results of
state-of-the-art methods. Looking forward, we hope that the release of Wild-Tab
will facilitate further research on OOD generalization and aid in the
deployment of machine learning models in various real-world contexts where
handling distribution shifts is a crucial requirement.
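To make the abstract's central baseline concrete, below is a minimal sketch of ERM on a tabular regression task, together with the ID-versus-OOD gap the paper reports. Wild-Tab's actual loaders, splits, and hyperparameters are not given in the abstract, so the file name, the "domain" and "target" columns, and the model here are illustrative assumptions, not the benchmark's reference code.

```python
# Minimal ERM sketch for tabular regression with an ID/OOD gap check.
# The CSV name, "domain"/"target" columns, and hyperparameters are
# illustrative assumptions, not Wild-Tab's reference implementation.
import pandas as pd
import torch
import torch.nn as nn

df = pd.read_csv("tabular_data.csv")        # hypothetical dataset file
ood_mask = df["domain"] == "unseen_domain"  # hypothetical shift column
feature_cols = [c for c in df.columns if c not in ("target", "domain")]

def to_tensors(frame):
    X = torch.tensor(frame[feature_cols].to_numpy(), dtype=torch.float32)
    y = torch.tensor(frame["target"].to_numpy(), dtype=torch.float32).unsqueeze(1)
    return X, y

X_id, y_id = to_tensors(df[~ood_mask])
X_ood, y_ood = to_tensors(df[ood_mask])

# Hold out part of the in-distribution data so the ID score is not a
# training-set score.
perm = torch.randperm(len(X_id))
cut = int(0.8 * len(X_id))
tr, va = perm[:cut], perm[cut:]

model = nn.Sequential(
    nn.Linear(len(feature_cols), 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
mse = nn.MSELoss()  # ERM: minimize the average training loss, nothing more

for _ in range(200):  # full-batch for brevity; real code would mini-batch
    opt.zero_grad()
    loss = mse(model(X_id[tr]), y_id[tr])
    loss.backward()
    opt.step()

with torch.no_grad():
    id_mse = mse(model(X_id[va]), y_id[va]).item()
    ood_mse = mse(model(X_ood), y_ood).item()
print(f"ID MSE {id_mse:.4f} | OOD MSE {ood_mse:.4f} | gap {ood_mse - id_mse:.4f}")
```

The paper's finding is that for most specialized OOD methods the printed gap stays large, while plain ERM is already a competitive reference point.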
Related papers
- Towards Robust Universal Information Extraction: Benchmark, Evaluation, and Solution [66.11004226578771]
Existing robust benchmark datasets have two key limitations.
They generate only a limited range of perturbations for a single Information Extraction (IE) task.
Considering the powerful generation capabilities of Large Language Models (LLMs), we introduce a new benchmark dataset for Robust UIE, called RUIE-Bench.
We show that training with only 15% of the data leads to an average 7.5% relative performance improvement across three IE tasks.
arXiv Detail & Related papers (2025-03-05T05:39:29Z)
- Out-of-Distribution Learning with Human Feedback [26.398598663165636]
This paper presents a novel framework for OOD learning with human feedback.
Our framework capitalizes on the freely available unlabeled data in the wild.
By exploiting human feedback, we enhance the robustness and reliability of machine learning models.
arXiv Detail & Related papers (2024-08-14T18:49:27Z)
- Reliability in Semantic Segmentation: Can We Use Synthetic Data? [69.28268603137546]
We show for the first time how synthetic data can be generated specifically to comprehensively assess the real-world reliability of semantic segmentation models.
This synthetic data is employed to evaluate the robustness of pretrained segmenters.
We demonstrate how our approach can be utilized to enhance the calibration and OOD detection capabilities of segmenters.
arXiv Detail & Related papers (2023-12-14T18:56:07Z)
- EAT: Towards Long-Tailed Out-of-Distribution Detection [55.380390767978554]
This paper addresses the challenging task of long-tailed OOD detection.
The main difficulty lies in distinguishing OOD data from samples belonging to the tail classes.
We propose two simple ideas: (1) Expanding the in-distribution class space by introducing multiple abstention classes, and (2) Augmenting the context-limited tail classes by overlaying images onto the context-rich OOD data.
arXiv Detail & Related papers (2023-12-14T13:47:13Z)
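Since the EAT entry above names two concrete mechanisms, here is a rough sketch of both under stated assumptions: the feature width, image shapes, paste box, and mixing rule are illustrative guesses, not the authors' code.

```python
# Rough sketch of EAT's two ideas as the summary describes them;
# shapes, names, and the mixing rule are illustrative assumptions.
import torch
import torch.nn as nn

num_id_classes, num_abstention = 10, 3
# Idea 1: widen the classifier head with extra "abstention" classes so
# OOD inputs have somewhere to go besides the in-distribution classes.
head = nn.Linear(512, num_id_classes + num_abstention)

def overlay_tail_on_ood(tail_img, ood_img, box=(8, 8, 24, 24)):
    """Idea 2: paste a tail-class region onto a context-rich OOD image,
    giving the rare class new, diverse backgrounds."""
    x1, y1, x2, y2 = box
    mixed = ood_img.clone()
    mixed[:, y1:y2, x1:x2] = tail_img[:, y1:y2, x1:x2]
    return mixed  # the label stays the tail class

tail = torch.rand(3, 32, 32)  # a tail-class training image (CHW)
ood = torch.rand(3, 32, 32)   # an auxiliary OOD image (CHW)
augmented = overlay_tail_on_ood(tail, ood)
```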
- DIVERSIFY: A General Framework for Time Series Out-of-distribution Detection and Generalization [58.704753031608625]
Time series is one of the most challenging modalities in machine learning research.
OOD detection and generalization on time series tend to suffer because of their non-stationary nature.
We propose DIVERSIFY, a framework for OOD detection and generalization on dynamic distributions of time series.
arXiv Detail & Related papers (2023-08-04T12:27:11Z)
- Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations [111.88727295707454]
This paper reexamines the research on out-of-distribution (OOD) robustness in the field of NLP.
We propose a benchmark construction protocol that ensures clear differentiation and challenging distribution shifts.
We conduct experiments on pre-trained language models for analysis and evaluation of OOD robustness.
arXiv Detail & Related papers (2023-06-07T17:47:03Z)
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
Under the resulting robustness metric, a model is judged robust if its performance is consistently accurate across the cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
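The clique-based robustness criterion in the Open IE entry above reduces to a simple aggregate: a model counts as robust on a clique only if it is correct on every example in it. The sketch below assumes a mapping from clique ids to per-example correctness flags, a hypothetical layout rather than the paper's exact metric.

```python
# Hedged sketch of a clique-style robustness check; the data layout
# and metric name are illustrative, not the paper's definition.
from typing import Dict, List

def clique_robustness(correct: Dict[str, List[bool]]) -> float:
    """correct maps each clique id to per-example correctness flags."""
    robust = sum(all(flags) for flags in correct.values())
    return robust / len(correct)

results = {
    "clique_a": [True, True, True],   # consistently correct -> robust
    "clique_b": [True, False, True],  # one failure breaks the clique
}
print(clique_robustness(results))     # 0.5
```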
- Pseudo-OOD training for robust language models [78.15712542481859]
OOD detection is a key component of a reliable machine-learning model for any industry-scale application.
We propose POORE - POsthoc pseudo-Ood REgularization, that generates pseudo-OOD samples using in-distribution (IND) data.
We extensively evaluate our framework on three real-world dialogue systems, achieving new state-of-the-art in OOD detection.
arXiv Detail & Related papers (2022-10-17T14:32:02Z)
- WOODS: Benchmarks for Out-of-Distribution Generalization in Time Series [9.181035389003759]
We present WOODS: eight challenging open-source time series benchmarks covering a diverse range of data modalities.
We revise the existing OOD generalization algorithms for time series tasks and evaluate them using our systematic framework.
Our experiments show substantial room for improvement for empirical risk minimization and OOD generalization algorithms on our datasets.
arXiv Detail & Related papers (2022-03-18T14:12:54Z)
- Training OOD Detectors in their Natural Habitats [31.565635192716712]
Out-of-distribution (OOD) detection is important for machine learning models deployed in the wild.
Recent methods use auxiliary outlier data to regularize the model for improved OOD detection.
We propose a novel framework that leverages wild mixture data -- that naturally consists of both ID and OOD samples.
arXiv Detail & Related papers (2022-02-07T15:38:39Z)
- BEDS-Bench: Behavior of EHR-models under Distributional Shift--A Benchmark [21.040754460129854]
We release BEDS-Bench, a benchmark for quantifying the behavior of ML models over EHR data under OOD settings.
We evaluate several learning algorithms under BEDS-Bench and find that all of them show poor generalization under distributional shift.
arXiv Detail & Related papers (2021-07-17T05:53:24Z)