Understanding and Testing Generalization of Deep Networks on
Out-of-Distribution Data
- URL: http://arxiv.org/abs/2111.09190v2
- Date: Fri, 19 Nov 2021 05:18:13 GMT
- Title: Understanding and Testing Generalization of Deep Networks on
Out-of-Distribution Data
- Authors: Rui Hu, Jitao Sang, Jinqiang Wang, Rui Hu, Chaoquan Jiang
- Abstract summary: Deep network models perform excellently on In-Distribution data, but can significantly fail on Out-Of-Distribution data.
This study is devoted to analyzing the problems of the experimental ID test and designing an OOD test paradigm.
- Score: 30.471871571256198
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep network models perform excellently on In-Distribution (ID) data, but can
significantly fail on Out-Of-Distribution (OOD) data. While method development
focuses on improving OOD generalization, little attention has been paid to
evaluating the capability of models to handle OOD data. This study is devoted
to analyzing the problems of the experimental ID test and designing an OOD test
paradigm to accurately evaluate practical performance. Our analysis is
based on an introduced categorization of three types of distribution shifts used to
generate OOD data. Main observations include: (1) The ID test fails both at
reflecting the actual performance of a single model and at comparing
different models under OOD data. (2) The ID test failure can be ascribed to the
learned marginal and conditional spurious correlations resulting from the
corresponding distribution shifts. Based on this, we propose novel OOD test
paradigms to evaluate the generalization capacity of models to unseen data, and
discuss how to use OOD test results to find bugs in models and guide model
debugging.
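The abstract's first observation (an ID test can misjudge both a single model and the ranking of two models) can be illustrated with a toy sketch. This is not the paper's test paradigm or its categorization of shifts: the synthetic binary task, the `core` and `spurious` features, the agreement rates, and the two hand-written models are all assumptions made for illustration only.

```python
import random


def make_split(n, spurious_agreement, rng, core_noise=0.2):
    """Build n samples of (core, spurious, label) for a binary task.

    The core feature matches the label with probability 1 - core_noise in
    every split. The spurious feature matches the label with probability
    spurious_agreement, which is the quantity that shifts between splits.
    """
    samples = []
    for _ in range(n):
        label = rng.randint(0, 1)
        core = label if rng.random() >= core_noise else 1 - label
        spurious = label if rng.random() < spurious_agreement else 1 - label
        samples.append((core, spurious, label))
    return samples


def accuracy(predict, samples):
    """Fraction of samples where predict(core, spurious) equals the label."""
    return sum(predict(c, s) == y for c, s, y in samples) / len(samples)


def shortcut_model(core, spurious):
    # Hypothetical model that has latched onto the spurious feature.
    return spurious


def core_model(core, spurious):
    # Hypothetical model that relies only on the stable core feature.
    return core


rng = random.Random(0)  # fixed seed so the comparison is repeatable
# Assumed setup: the ID test shares the training-time spurious correlation,
# while the OOD test reverses it.
id_test = make_split(5000, spurious_agreement=0.95, rng=rng)
ood_test = make_split(5000, spurious_agreement=0.05, rng=rng)

results = {
    name: {"id": accuracy(model, id_test), "ood": accuracy(model, ood_test)}
    for name, model in [("shortcut", shortcut_model), ("core", core_model)]
}

for name, scores in results.items():
    print(f"{name:8s} ID acc = {scores['id']:.3f}  OOD acc = {scores['ood']:.3f}")
```

Under these assumed rates, the shortcut model should score higher than the core model on the ID test and far lower on the OOD test, so the ID test alone would both overstate its performance and rank the two models in the wrong order.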
Related papers
- Out-of-Distribution Learning with Human Feedback [26.398598663165636]
This paper presents a novel framework for OOD learning with human feedback.
Our framework capitalizes on the freely available unlabeled data in the wild.
By exploiting human feedback, we enhance the robustness and reliability of machine learning models.
arXiv Detail & Related papers (2024-08-14T18:49:27Z)
- Rethinking Out-of-Distribution Detection on Imbalanced Data Distribution [38.844580833635725]
We present a training-time regularization technique to mitigate the bias and boost imbalanced OOD detectors across architecture designs.
Our method translates into consistent improvements on the representative CIFAR10-LT, CIFAR100-LT, and ImageNet-LT benchmarks.
arXiv Detail & Related papers (2024-07-23T12:28:59Z)
- A Survey on Evaluation of Out-of-Distribution Generalization [41.39827887375374]
Out-of-Distribution (OOD) generalization is a complex and fundamental problem.
This paper serves as the first effort to conduct a comprehensive review of OOD evaluation.
We categorize existing research into three paradigms: OOD performance testing, OOD performance prediction, and OOD intrinsic property characterization.
arXiv Detail & Related papers (2024-03-04T09:30:35Z)
- Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations [111.88727295707454]
This paper reexamines the research on out-of-distribution (OOD) robustness in the field of NLP.
We propose a benchmark construction protocol that ensures clear differentiation and challenging distribution shifts.
We conduct experiments on pre-trained language models for analysis and evaluation of OOD robustness.
arXiv Detail & Related papers (2023-06-07T17:47:03Z)
- Out-of-distribution Detection with Implicit Outlier Transformation [72.73711947366377]
Outlier exposure (OE) is powerful in out-of-distribution (OOD) detection.
We propose a novel OE-based approach that makes the model perform well for unseen OOD situations.
arXiv Detail & Related papers (2023-03-09T04:36:38Z)
- Unsupervised Evaluation of Out-of-distribution Detection: A Data-centric Perspective [55.45202687256175]
Out-of-distribution (OOD) detection methods assume that they have test ground truths, i.e., whether individual test samples are in-distribution (IND) or OOD.
In this paper, we are the first to introduce the unsupervised evaluation problem in OOD detection.
We propose three methods to compute Gscore as an unsupervised indicator of OOD detection performance.
arXiv Detail & Related papers (2023-02-16T13:34:35Z)
- Towards Realistic Out-of-Distribution Detection: A Novel Evaluation Framework for Improving Generalization in OOD Detection [14.541761912174799]
This paper presents a novel evaluation framework for Out-of-Distribution (OOD) detection.
It aims to assess the performance of machine learning models in more realistic settings.
arXiv Detail & Related papers (2022-11-20T07:30:15Z)
- Are Sample-Efficient NLP Models More Robust? [90.54786862811183]
We investigate the relationship between sample efficiency (amount of data needed to reach a given ID accuracy) and robustness (how models fare on OOD evaluation).
We find that higher sample efficiency is only correlated with better average OOD robustness on some modeling interventions and tasks, but not others.
These results suggest that general-purpose methods for improving sample efficiency are unlikely to yield universal OOD robustness improvements, since such improvements are highly dataset- and task-dependent.
arXiv Detail & Related papers (2022-10-12T17:54:59Z)
- Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning [54.61762276179205]
We propose a novel contrastive learning approach, MMBS, for building robust VQA models by Making the Most of Biased Samples.
Specifically, we construct positive samples for contrastive learning by eliminating the information related to spurious correlation from the original training samples.
We validate our contributions by achieving competitive performance on the OOD dataset VQA-CP v2 while preserving robust performance on the ID dataset VQA v2.
arXiv Detail & Related papers (2022-10-10T11:05:21Z)
- Training OOD Detectors in their Natural Habitats [31.565635192716712]
Out-of-distribution (OOD) detection is important for machine learning models deployed in the wild.
Recent methods use auxiliary outlier data to regularize the model for improved OOD detection.
We propose a novel framework that leverages wild mixture data -- that naturally consists of both ID and OOD samples.
arXiv Detail & Related papers (2022-02-07T15:38:39Z)
- General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space.
GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.