Related papers: Understanding and Testing Generalization of Deep Networks on Out-of-Distribution Data

Understanding and Testing Generalization of Deep Networks on Out-of-Distribution Data

URL: http://arxiv.org/abs/2111.09190v2
Date: Fri, 19 Nov 2021 05:18:13 GMT
Title: Understanding and Testing Generalization of Deep Networks on Out-of-Distribution Data
Authors: Rui Hu, Jitao Sang, Jinqiang Wang, Rui Hu, Chaoquan Jiang
Abstract summary: Deep network models perform excellently on In-Distribution data, but can significantly fail on Out-Of-Distribution data. This study is devoted to analyzing the problem of experimental ID test and designing OOD test paradigm.
Score: 30.471871571256198
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep network models perform excellently on In-Distribution (ID) data, but can significantly fail on Out-Of-Distribution (OOD) data. While developing methods focus on improving OOD generalization, few attention has been paid to evaluating the capability of models to handle OOD data. This study is devoted to analyzing the problem of experimental ID test and designing OOD test paradigm to accurately evaluate the practical performance. Our analysis is based on an introduced categorization of three types of distribution shifts to generate OOD data. Main observations include: (1) ID test fails in neither reflecting the actual performance of a single model nor comparing between different models under OOD data. (2) The ID test failure can be ascribed to the learned marginal and conditional spurious correlations resulted from the corresponding distribution shifts. Based on this, we propose novel OOD test paradigms to evaluate the generalization capacity of models to unseen data, and discuss how to use OOD test results to find bugs of models to guide model debugging.

Related papers

Towards More Trustworthy Deep Code Models by Enabling Out-of-Distribution Detection [12.141246816152288]
We develop two types of SE-specific OOD detection models, unsupervised and weakly-supervised OOD detection for code. Our proposed methods significantly outperform the baselines in detecting OOD samples from four different scenarios simultaneously and also positively impact a main code understanding task.
arXiv Detail & Related papers (2025-02-26T06:59:53Z)
Self-Rationalization in the Wild: A Large Scale Out-of-Distribution Evaluation on NLI-related tasks [59.47851630504264]
Free-text explanations are expressive and easy to understand, but many datasets lack annotated explanation data. We fine-tune T5-Large and OLMo-7B models and assess the impact of fine-tuning data quality, the number of fine-tuning samples, and few-shot selection methods. The models are evaluated on 19 diverse OOD datasets across three tasks: natural language inference (NLI), fact-checking, and hallucination detection in abstractive summarization.
arXiv Detail & Related papers (2025-02-07T10:01:32Z)
Out-of-Distribution Learning with Human Feedback [26.398598663165636]
This paper presents a novel framework for OOD learning with human feedback. Our framework capitalizes on the freely available unlabeled data in the wild. By exploiting human feedback, we enhance the robustness and reliability of machine learning models.
arXiv Detail & Related papers (2024-08-14T18:49:27Z)
Rethinking Out-of-Distribution Detection on Imbalanced Data Distribution [38.844580833635725]
We present a training-time regularization technique to mitigate the bias and boost imbalanced OOD detectors across architecture designs. Our method translates into consistent improvements on the representative CIFAR10-LT, CIFAR100-LT, and ImageNet-LT benchmarks.
arXiv Detail & Related papers (2024-07-23T12:28:59Z)
A Survey on Evaluation of Out-of-Distribution Generalization [41.39827887375374]
Out-of-Distribution (OOD) generalization is a complex and fundamental problem. This paper serves as the first effort to conduct a comprehensive review of OOD evaluation. We categorize existing research into three paradigms: OOD performance testing, OOD performance prediction, and OOD intrinsic property characterization.
arXiv Detail & Related papers (2024-03-04T09:30:35Z)
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations [111.88727295707454]
This paper reexamines the research on out-of-distribution (OOD) robustness in the field of NLP. We propose a benchmark construction protocol that ensures clear differentiation and challenging distribution shifts. We conduct experiments on pre-trained language models for analysis and evaluation of OOD robustness.
arXiv Detail & Related papers (2023-06-07T17:47:03Z)
Out-of-distribution Detection with Implicit Outlier Transformation [72.73711947366377]
Outlier exposure (OE) is powerful in out-of-distribution (OOD) detection. We propose a novel OE-based approach that makes the model perform well for unseen OOD situations.
arXiv Detail & Related papers (2023-03-09T04:36:38Z)
Unsupervised Evaluation of Out-of-distribution Detection: A Data-centric Perspective [55.45202687256175]
Out-of-distribution (OOD) detection methods assume that they have test ground truths, i.e., whether individual test samples are in-distribution (IND) or OOD. In this paper, we are the first to introduce the unsupervised evaluation problem in OOD detection. We propose three methods to compute Gscore as an unsupervised indicator of OOD detection performance.
arXiv Detail & Related papers (2023-02-16T13:34:35Z)
Towards Realistic Out-of-Distribution Detection: A Novel Evaluation Framework for Improving Generalization in OOD Detection [14.541761912174799]
This paper presents a novel evaluation framework for Out-of-Distribution (OOD) detection. It aims to assess the performance of machine learning models in more realistic settings.
arXiv Detail & Related papers (2022-11-20T07:30:15Z)
Are Sample-Efficient NLP Models More Robust? [90.54786862811183]
We investigate the relationship between sample efficiency (amount of data needed to reach a given ID accuracy) and robustness (how models fare on OOD evaluation) We find that higher sample efficiency is only correlated with better average OOD robustness on some modeling interventions and tasks, but not others. These results suggest that general-purpose methods for improving sample efficiency are unlikely to yield universal OOD robustness improvements, since such improvements are highly dataset- and task-dependent.
arXiv Detail & Related papers (2022-10-12T17:54:59Z)
Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning [54.61762276179205]
We propose a novel contrastive learning approach, MMBS, for building robust VQA models by Making the Most of Biased Samples. Specifically, we construct positive samples for contrastive learning by eliminating the information related to spurious correlation from the original training samples. We validate our contributions by achieving competitive performance on the OOD dataset VQA-CP v2 while preserving robust performance on the ID dataset VQA v2.
arXiv Detail & Related papers (2022-10-10T11:05:21Z)
Training OOD Detectors in their Natural Habitats [31.565635192716712]
Out-of-distribution (OOD) detection is important for machine learning models deployed in the wild. Recent methods use auxiliary outlier data to regularize the model for improved OOD detection. We propose a novel framework that leverages wild mixture data -- that naturally consists of both ID and OOD samples.
arXiv Detail & Related papers (2022-02-07T15:38:39Z)
General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space. GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.