Towards a Theoretical Framework of Out-of-Distribution Generalization
- URL: http://arxiv.org/abs/2106.04496v1
- Date: Tue, 8 Jun 2021 16:32:23 GMT
- Title: Towards a Theoretical Framework of Out-of-Distribution Generalization
- Authors: Haotian Ye, Chuanlong Xie, Tianle Cai, Ruichen Li, Zhenguo Li, Liwei
Wang
- Abstract summary: Generalization to out-of-distribution (OOD) data, or domain generalization, is one of the central problems in modern machine learning.
In this work, we take the first step towards rigorous and quantitative definitions of what OOD is, and what it means for an OOD problem to be learnable.
- Score: 28.490842160921805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalization to out-of-distribution (OOD) data, or domain generalization,
is one of the central problems in modern machine learning. Recently, there has
been a surge of attempts to propose algorithms for OOD that mainly build upon
the idea of extracting invariant features. Although intuitively reasonable,
theoretical understanding of what kind of invariance can guarantee OOD
generalization is still limited, and generalization to arbitrary
out-of-distribution data is clearly impossible. In this work, we take the first
step towards rigorous and quantitative definitions of 1) what OOD is; and 2)
what it means to say an OOD problem is learnable. We also introduce a new
concept, the expansion function, which characterizes to what extent the
variance is amplified in the test domains over the training domains, thereby
giving quantitative meaning to invariant features. Based on these, we prove OOD generalization
error bounds. It turns out that OOD generalization largely depends on the
expansion function. As recently pointed out by Gulrajani and Lopez-Paz (2020),
any OOD learning algorithm without a model selection module is incomplete. Our
theory naturally induces a model selection criterion. Extensive experiments on
benchmark OOD datasets demonstrate that our model selection criterion has a
significant advantage over baselines.
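The expansion-function idea admits a compact informal statement. The notation below is illustrative, not the paper's exact formalism: V_train(phi) and V_test(phi) denote the variation of a feature phi across the training and test domains, and s is the expansion function that controls how much that variation can be amplified:

```latex
% Illustrative notation (ours, not the paper's exact formalism):
% V_train(\phi), V_test(\phi) = variation of feature \phi across the
% training / test domains; s = expansion function.
V_{\mathrm{test}}(\phi) \;\le\; s\big(V_{\mathrm{train}}(\phi)\big)
```

Under this reading, a feature whose training-domain variation is small (an invariant feature) has provably limited test-domain variation, and the resulting generalization error bound grows with s, which is why the abstract says OOD generalization "largely depends on the expansion function."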
Related papers
- The Best of Both Worlds: On the Dilemma of Out-of-distribution Detection [75.65876949930258]
Out-of-distribution (OOD) detection is essential for model trustworthiness.
We show that the superior OOD detection performance of state-of-the-art methods is achieved by secretly sacrificing the OOD generalization ability.
arXiv Detail & Related papers (2024-10-12T07:02:04Z)
- A Practical Theory of Generalization in Selectivity Learning [8.268822578361824]
Query-driven machine learning models have emerged as a promising estimation technique for query selectivities.
We bridge gaps in state-of-the-art (SOTA) theory based on the Probably Approximately Correct (PAC) learning framework.
We show that selectivity predictors induced by signed measures are learnable, which relaxes the reliance on probability measures in SOTA theory.
arXiv Detail & Related papers (2024-09-11T05:10:32Z)
- On the Benefits of Over-parameterization for Out-of-Distribution Generalization [28.961538657831788]
We investigate the performance of a machine learning model in terms of Out-of-Distribution (OOD) loss under benign overfitting conditions.
We show that further increasing the model's parameterization can significantly reduce the OOD loss.
These insights explain the empirical phenomenon of enhanced OOD generalization through model ensembles.
arXiv Detail & Related papers (2024-03-26T11:01:53Z)
- Towards Robust Out-of-Distribution Generalization Bounds via Sharpness [41.65692353665847]
We study the effect of sharpness on how a model tolerates data change in domain shift.
We propose a sharpness-based OOD generalization bound by taking robustness into consideration.
arXiv Detail & Related papers (2024-03-11T02:57:27Z)
- Invariant Random Forest: Tree-Based Model Solution for OOD Generalization [13.259844672078552]
This paper introduces a novel and effective solution for OOD generalization of decision tree models, named Invariant Decision Tree (IDT).
IDT enforces a penalty term with regard to the unstable/varying behavior of a split across different environments during the growth of the tree.
Our proposed method is motivated by a theoretical result under mild conditions, and validated by numerical tests with both synthetic and real datasets.
arXiv Detail & Related papers (2023-12-07T12:53:05Z)
- DIVERSIFY: A General Framework for Time Series Out-of-distribution Detection and Generalization [58.704753031608625]
Time series is one of the most challenging modalities in machine learning research.
OOD detection and generalization on time series tend to suffer due to their non-stationary nature.
We propose DIVERSIFY, a framework for OOD detection and generalization on dynamic distributions of time series.
arXiv Detail & Related papers (2023-08-04T12:27:11Z)
- RankFeat: Rank-1 Feature Removal for Out-of-distribution Detection [65.67315418971688]
RankFeat is a simple yet effective post hoc approach for OOD detection.
RankFeat achieves state-of-the-art performance and reduces the average false positive rate (FPR95) by 17.90% compared with the previous best method.
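FPR95, the metric cited above, is the false positive rate on OOD samples measured at the score threshold where 95% of in-distribution samples are correctly accepted. A minimal sketch of how it is computed (the helper name and data here are ours, not from the paper):

```python
import numpy as np

def fpr_at_95_tpr(id_scores, ood_scores):
    """FPR on OOD samples at the threshold where ~95% of
    in-distribution (ID) samples are accepted as ID.
    Convention: higher score = more likely in-distribution."""
    # Threshold that keeps ~95% of ID scores above it (5th percentile).
    threshold = np.percentile(id_scores, 5)
    # False positives: OOD samples whose score clears the ID threshold.
    return float(np.mean(np.asarray(ood_scores) >= threshold))

# Toy example: ID scores cluster high, OOD scores cluster low.
id_scores = np.array([0.9, 0.8, 0.85, 0.95, 0.7])
ood_scores = np.array([0.1, 0.2, 0.75, 0.3])
print(fpr_at_95_tpr(id_scores, ood_scores))  # one OOD sample of four leaks through
```

With only a handful of samples the 95% acceptance rate is approximate; in practice the metric is computed over full test sets, where the percentile threshold is tight.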
arXiv Detail & Related papers (2022-09-18T16:01:31Z)
- Can Subnetwork Structure be the Key to Out-of-Distribution Generalization? [21.037720934987487]
In this paper, we use a functional modular probing method to analyze deep model structures under OOD setting.
We demonstrate that even in biased models (which focus on spurious correlations) there still exist unbiased functional subnetworks.
arXiv Detail & Related papers (2021-06-05T13:19:27Z)
- Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization [93.8373619657239]
Neural networks trained with SGD were recently shown to rely preferentially on linearly-predictive features.
This simplicity bias can explain their lack of robustness out of distribution (OOD).
We demonstrate that the simplicity bias can be mitigated and OOD generalization improved.
arXiv Detail & Related papers (2021-05-12T12:12:24Z)
- Learning Causal Semantic Representation for Out-of-Distribution Prediction [125.38836464226092]
We propose a Causal Semantic Generative model (CSG) based on causal reasoning, so that the two factors are modeled separately.
We show that CSG can identify the semantic factor by fitting training data, and this semantic-identification guarantees the boundedness of OOD generalization error.
arXiv Detail & Related papers (2020-11-03T13:16:05Z)
- On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law [78.10523907729642]
VQA-CP has become the standard OOD benchmark for visual question answering.
Most published methods rely on explicit knowledge of the construction of the OOD splits.
We show that embarrassingly-simple methods, including one that generates answers at random, surpass the state of the art on some question types.
arXiv Detail & Related papers (2020-05-19T06:45:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.