A Survey on Evaluation of Out-of-Distribution Generalization
- URL: http://arxiv.org/abs/2403.01874v1
- Date: Mon, 4 Mar 2024 09:30:35 GMT
- Title: A Survey on Evaluation of Out-of-Distribution Generalization
- Authors: Han Yu, Jiashuo Liu, Xingxuan Zhang, Jiayun Wu, Peng Cui
- Abstract summary: Out-of-Distribution (OOD) generalization is a complex and fundamental problem.
This paper serves as the first effort to conduct a comprehensive review of OOD evaluation.
We categorize existing research into three paradigms: OOD performance testing, OOD performance prediction, and OOD intrinsic property characterization.
- Score: 41.39827887375374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning models, while progressively advanced, rely heavily on the
IID assumption, which is often unfulfilled in practice due to inevitable
distribution shifts. This renders them susceptible and untrustworthy for
deployment in risk-sensitive applications. Such a significant problem has
consequently spawned various branches of works dedicated to developing
algorithms capable of Out-of-Distribution (OOD) generalization. Despite these
efforts, much less attention has been paid to the evaluation of OOD
generalization, which is also a complex and fundamental problem. Its goal is
not only to assess whether a model's OOD generalization capability is strong or
not, but also to evaluate where a model generalizes well or poorly. This
entails characterizing the types of distribution shifts that a model can
effectively address, and identifying the safe and risky input regions given a
model. This paper serves as the first effort to conduct a comprehensive review
of OOD evaluation. We categorize existing research into three paradigms: OOD
performance testing, OOD performance prediction, and OOD intrinsic property
characterization, according to the availability of test data. Additionally, we
briefly discuss OOD evaluation in the context of pretrained models. In closing,
we propose several promising directions for future research in OOD evaluation.
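As a rough illustration of how the first two paradigms differ in the test data they assume, the sketch below contrasts OOD performance testing (labeled shifted data is available, so accuracy is measured directly) with OOD performance prediction (only unlabeled shifted inputs are available, so accuracy must be estimated). The classifier interface and the average-confidence estimator are illustrative assumptions, not methods proposed by the survey.

```python
# Minimal sketch, assuming a trained classifier exposing a predict_proba-style
# interface; the average-confidence estimator below is a simple illustrative
# proxy, not a method from the survey.
import numpy as np

def ood_performance_testing(model, x_shifted, y_shifted):
    """OOD performance testing: labels on the shifted test set are available,
    so OOD generalization is measured directly as accuracy."""
    probs = np.asarray(model.predict_proba(x_shifted))   # shape (n, num_classes)
    preds = probs.argmax(axis=1)
    return float((preds == np.asarray(y_shifted)).mean())

def ood_performance_prediction(model, x_shifted_unlabeled):
    """OOD performance prediction: only unlabeled shifted inputs are available,
    so accuracy must be estimated; here via mean maximum softmax probability."""
    probs = np.asarray(model.predict_proba(x_shifted_unlabeled))
    return float(probs.max(axis=1).mean())               # crude accuracy estimate
```

The third paradigm, OOD intrinsic property characterization, covers the setting where, as the categorization by test-data availability suggests, shifted test data is not assumed; it examines properties of the model itself rather than computing a test-set metric like the two functions above.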
Related papers
- The Best of Both Worlds: On the Dilemma of Out-of-distribution Detection [75.65876949930258]
Out-of-distribution (OOD) detection is essential for model trustworthiness.
We show that the superior OOD detection performance of state-of-the-art methods is achieved by secretly sacrificing the OOD generalization ability.
arXiv Detail & Related papers (2024-10-12T07:02:04Z)
- Out-of-Distribution Learning with Human Feedback [26.398598663165636]
This paper presents a novel framework for OOD learning with human feedback.
Our framework capitalizes on the freely available unlabeled data in the wild.
By exploiting human feedback, we enhance the robustness and reliability of machine learning models.
arXiv Detail & Related papers (2024-08-14T18:49:27Z)
- Rethinking Out-of-distribution (OOD) Detection: Masked Image Modeling is All You Need [52.88953913542445]
We find, surprisingly, that simply using reconstruction-based methods can significantly boost OOD detection performance (a generic reconstruction-error scoring sketch appears after this list).
We take Masked Image Modeling as the pretext task for our OOD detection framework (MOOD).
arXiv Detail & Related papers (2023-02-06T08:24:41Z)
- Towards Realistic Out-of-Distribution Detection: A Novel Evaluation Framework for Improving Generalization in OOD Detection [14.541761912174799]
This paper presents a novel evaluation framework for Out-of-Distribution (OOD) detection.
It aims to assess the performance of machine learning models in more realistic settings.
arXiv Detail & Related papers (2022-11-20T07:30:15Z)
- Pseudo-OOD training for robust language models [78.15712542481859]
OOD detection is a key component of a reliable machine-learning model for any industry-scale application.
We propose POORE - POsthoc pseudo-Ood REgularization, that generates pseudo-OOD samples using in-distribution (IND) data.
We extensively evaluate our framework on three real-world dialogue systems, achieving new state-of-the-art in OOD detection.
arXiv Detail & Related papers (2022-10-17T14:32:02Z)
- Understanding and Testing Generalization of Deep Networks on Out-of-Distribution Data [30.471871571256198]
Deep network models perform well on In-Distribution (ID) data but can fail significantly on Out-of-Distribution (OOD) data.
This study analyzes the problems with experimental ID testing and designs an OOD test paradigm.
arXiv Detail & Related papers (2021-11-17T15:29:07Z)
- Improved OOD Generalization via Adversarial Training and Pre-training [49.08683910076778]
In this paper, we theoretically show that a model robust to input perturbations generalizes well on OOD data.
Inspired by previous findings that adversarial training helps improve input-robustness, we show that adversarially trained models have converged excess risk on OOD data.
arXiv Detail & Related papers (2021-05-24T08:06:35Z)
- ATOM: Robustifying Out-of-distribution Detection Using Outlier Mining [51.19164318924997]
Adversarial Training with informative Outlier Mining improves the robustness of OOD detection.
ATOM achieves state-of-the-art performance under a broad family of classic and adversarial OOD evaluation tasks.
arXiv Detail & Related papers (2020-06-26T20:58:05Z)
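For the reconstruction-based idea mentioned in the "Rethinking Out-of-distribution (OOD) Detection: Masked Image Modeling is All You Need" entry above, the following is a minimal, generic sketch of scoring inputs by reconstruction error. It is not the MOOD pipeline from that paper; the autoencoder interface and the threshold rule are illustrative assumptions.

```python
# Generic reconstruction-error OOD scoring sketch (illustrative; not the MOOD
# method itself). Assumes `autoencoder` maps a batch of inputs to
# reconstructions of the same shape, e.g. a model trained with a masked or
# denoising reconstruction objective on in-distribution data.
import numpy as np

def reconstruction_ood_scores(autoencoder, x):
    """Higher score = larger reconstruction error = more likely OOD."""
    x = np.asarray(x, dtype=np.float32)
    x_hat = np.asarray(autoencoder(x))                  # same shape as x
    return ((x - x_hat) ** 2).reshape(len(x), -1).mean(axis=1)

def flag_ood(scores_in_dist, scores_test, fpr=0.05):
    """Choose a threshold so roughly `fpr` of in-distribution samples are
    flagged, then flag test samples whose error exceeds it."""
    threshold = np.quantile(scores_in_dist, 1.0 - fpr)
    return scores_test > threshold
```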