Benign Overfitting in Out-of-Distribution Generalization of Linear Models
- URL: http://arxiv.org/abs/2412.14474v1
- Date: Thu, 19 Dec 2024 02:47:39 GMT
- Title: Benign Overfitting in Out-of-Distribution Generalization of Linear Models
- Authors: Shange Tang, Jiayun Wu, Jianqing Fan, Chi Jin
- Abstract summary: We take an initial step towards understanding benign overfitting in the Out-of-Distribution (OOD) regime.
We provide non-asymptotic guarantees proving that benign overfitting occurs in standard ridge regression.
We also present theoretical results for a more general family of target covariance matrices.
- Score: 19.203753135860016
- Abstract: Benign overfitting refers to the phenomenon where an over-parameterized model fits the training data perfectly, including noise in the data, but still generalizes well to unseen test data. While prior work provides some theoretical understanding of this phenomenon under the in-distribution setup, modern machine learning often operates in a more challenging Out-of-Distribution (OOD) regime, where the target (test) distribution can be rather different from the source (training) distribution. In this work, we take an initial step towards understanding benign overfitting in the OOD regime by focusing on the basic setup of over-parameterized linear models under covariate shift. We provide non-asymptotic guarantees proving that benign overfitting occurs in standard ridge regression, even under the OOD regime, when the target covariance satisfies certain structural conditions. We identify several vital quantities relating the source and target covariances, which govern the performance of OOD generalization. Our result is sharp: it provably recovers the prior in-distribution benign overfitting guarantee [Tsigler and Bartlett, 2023] as well as the under-parameterized OOD guarantee [Ge et al., 2024] when specialized to each setup. Moreover, we present theoretical results for a more general family of target covariance matrices, where standard ridge regression only achieves a slow statistical rate of $O(1/\sqrt{n})$ for the excess risk, while Principal Component Regression (PCR) is guaranteed to achieve the fast rate $O(1/n)$, where $n$ is the number of samples.
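The covariate-shift setup above is easy to simulate. Below is a minimal, illustrative sketch (not the authors' code; the dimensions, spectra, noise level, and ridge penalty are arbitrary choices) that fits over-parameterized ridge regression on source data and compares its excess risk under a shifted target covariance against Principal Component Regression on the top source directions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 500, 10          # samples, ambient dimension, PCR components (all arbitrary)

# Spiked diagonal source covariance; the target re-weights the same eigendirections.
src_eigs = np.concatenate([np.full(k, 10.0), np.full(d - k, 0.05)])
tgt_eigs = np.concatenate([np.full(k, 20.0), np.full(d - k, 0.01)])
beta = rng.normal(size=d) / np.sqrt(d)                # ground-truth parameter

X = rng.normal(size=(n, d)) * np.sqrt(src_eigs)       # source (training) covariates
y = X @ beta + 0.5 * rng.normal(size=n)               # noisy labels

# Over-parameterized ridge regression (d >> n), written in its dual/kernel form.
lam = 1e-3
w_ridge = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

# Principal Component Regression: regress on the top-k source principal directions.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
Vk = Vt[:k].T
w_pcr = Vk @ np.linalg.lstsq(X @ Vk, y, rcond=None)[0]

# Excess risk under the *target* covariance: (w - beta)^T Sigma_target (w - beta).
def target_excess_risk(w):
    diff = w - beta
    return float(diff @ (tgt_eigs * diff))

print("ridge OOD excess risk:", target_excess_risk(w_ridge))
print("PCR   OOD excess risk:", target_excess_risk(w_pcr))
```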
Related papers
- MANO: Exploiting Matrix Norm for Unsupervised Accuracy Estimation Under Distribution Shifts [25.643876327918544]
Leveraging the models' outputs, specifically the logits, is a common approach to estimating the test accuracy of a pre-trained neural network on out-of-distribution samples.
Despite their ease of implementation and computational efficiency, current logit-based methods are vulnerable to overconfidence issues, leading to prediction bias.
We propose MaNo, which applies a data-dependent normalization on the logits to reduce prediction bias and takes the $L_p$ norm of the matrix of normalized logits as the estimation score.
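A rough sketch of that scoring idea follows, with softmax standing in for the paper's data-dependent normalization and a simple scaled entrywise $L_p$ norm; MaNo's exact normalization and scaling may differ:

```python
import numpy as np

def mano_style_score(logits, p=4):
    """Illustrative MaNo-style score: normalize logits row-wise (softmax is a
    stand-in for the paper's data-dependent normalization), then take a scaled
    entrywise L_p norm of the resulting matrix."""
    z = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    n, k = probs.shape
    return (np.abs(probs) ** p).sum() ** (1.0 / p) / (n * k) ** (1.0 / p)

# Usage: a higher score is expected to track higher accuracy on the shifted test set.
logits = np.random.default_rng(0).normal(size=(1000, 10))
print(mano_style_score(logits))
```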
arXiv Detail & Related papers (2024-05-29T10:45:06Z) - On the Benefits of Over-parameterization for Out-of-Distribution Generalization [28.961538657831788]
We investigate the performance of a machine learning model in terms of Out-of-Distribution (OOD) loss under benign overfitting conditions.
We show that further increasing the model's parameterization can significantly reduce the OOD loss.
These insights explain the empirical phenomenon of enhanced OOD generalization through model ensembles.
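One toy experiment (not from the paper) that probes this claim: fit minimum-norm random-feature regressors of increasing width on fixed source data and compare their losses on a covariate-shifted test set.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, n_test = 20, 200, 2000

def sample(n_pts, scale):
    """Toy task with a nonlinear target; covariate shift is induced by rescaling."""
    X = scale * rng.normal(size=(n_pts, d))
    y = np.sin(X @ np.ones(d) / np.sqrt(d)) + 0.1 * rng.normal(size=n_pts)
    return X, y

Xtr, ytr = sample(n, 1.0)            # source distribution
Xte, yte = sample(n_test, 1.5)       # shifted target distribution

def ood_loss(width):
    W = rng.normal(size=(d, width)) / np.sqrt(d)      # random ReLU features
    features = lambda Z: np.maximum(Z @ W, 0.0)
    # Minimum-norm least-squares fit in the (over-parameterized) feature space.
    coef, *_ = np.linalg.lstsq(features(Xtr), ytr, rcond=None)
    return float(np.mean((features(Xte) @ coef - yte) ** 2))

for width in [100, 400, 1600, 6400]:
    print(width, ood_loss(width))
```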
arXiv Detail & Related papers (2024-03-26T11:01:53Z) - Analysis of Bootstrap and Subsampling in High-dimensional Regularized Regression [29.57766164934947]
We investigate popular resampling methods for estimating the uncertainty of statistical models.
We provide a tight description of the biases and variances estimated by these methods in the context of generalized linear models.
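As a concrete instance of the setup, the sketch below (illustrative, not the paper's code) runs a nonparametric pairs bootstrap for a ridge-regularized logistic regression, one member of the generalized linear model family; the paper characterizes when such variance estimates become biased in high dimensions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 500, 5

# Toy logistic-regression data (one generalized linear model).
X = rng.normal(size=(n, d))
beta = np.array([1.0, -0.5, 0.0, 0.25, 2.0])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ beta))).astype(float)

def fit_logistic(X, y, lam=1e-2, iters=25):
    """Ridge-regularized logistic regression fitted by Newton's method."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y) + lam * w
        H = X.T @ (X * (p * (1 - p))[:, None]) + lam * np.eye(X.shape[1])
        w -= np.linalg.solve(H, grad)
    return w

# Nonparametric (pairs) bootstrap of the coefficient estimates.
B = 200
boot = np.empty((B, d))
for b in range(B):
    idx = rng.integers(0, n, size=n)                  # resample rows with replacement
    boot[b] = fit_logistic(X[idx], y[idx])

print("point estimate:      ", fit_logistic(X, y))
print("bootstrap std errors:", boot.std(axis=0))
```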
arXiv Detail & Related papers (2024-02-21T08:50:33Z) - Robust Linear Regression: Phase-Transitions and Precise Tradeoffs for General Norms [29.936005822346054]
We investigate the impact of test-time adversarial attacks on linear regression models.
We determine the optimal level of robustness that any model can reach while maintaining a given level of standard predictive performance (accuracy).
We obtain a precise characterization which distinguishes between regimes where robustness is achievable without hurting standard accuracy and regimes where a tradeoff might be unavoidable.
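For a linear predictor, the worst case over norm-bounded test-time perturbations has a simple closed form (via Hölder's inequality), which makes the attack model concrete; the sketch below is a generic illustration of that fact, not the paper's phase-transition analysis.

```python
import numpy as np

def worst_case_sq_loss(w, x, y, eps, q):
    """Exact worst-case squared loss of a linear predictor under a test-time
    perturbation with ||delta||_p <= eps; q is the dual exponent (1/p + 1/q = 1)."""
    r = y - w @ x
    return (np.abs(r) + eps * np.linalg.norm(w, ord=q)) ** 2

def worst_case_perturbation(w, x, y, eps):
    """Loss-maximizing perturbation for the l2-ball case (p = q = 2)."""
    r = y - w @ x
    return -eps * np.sign(r) * w / np.linalg.norm(w)

rng = np.random.default_rng(3)
w, x, y = rng.normal(size=10), rng.normal(size=10), 1.0
delta = worst_case_perturbation(w, x, y, eps=0.1)
# The attacked loss matches the closed-form worst case.
print((y - w @ (x + delta)) ** 2, worst_case_sq_loss(w, x, y, 0.1, q=2))
```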
arXiv Detail & Related papers (2023-08-01T13:55:45Z) - A Robustness Analysis of Blind Source Separation [91.3755431537592]
Blind source separation (BSS) aims to recover an unobserved signal from its mixture $X=f(S)$ under the condition that the transformation $f$ is invertible but unknown.
We present a general framework for analysing such violations and quantifying their impact on the blind recovery of $S$ from $X$.
We show that a generic BSS solution's response to general deviations from its defining structural assumptions can be profitably analysed in the form of explicit continuity guarantees.
arXiv Detail & Related papers (2023-03-17T16:30:51Z) - Exploring Optimal Substructure for Out-of-distribution Generalization via Feature-targeted Model Pruning [23.938392334438582]
We propose a novel Spurious Feature-targeted model Pruning framework, dubbed SFP, to automatically explore invariant substructures.
SFP can significantly outperform both structure-based and non-structure OOD generalization SOTAs, with accuracy improvements of up to 4.72% and 23.35%, respectively.
arXiv Detail & Related papers (2022-12-19T13:51:06Z) - Regularization Guarantees Generalization in Bayesian Reinforcement Learning through Algorithmic Stability [48.62272919754204]
We study generalization in Bayesian RL under the probably approximately correct (PAC) framework.
Our main contribution is showing that by adding regularization, the optimal policy becomes stable in an appropriate sense.
arXiv Detail & Related papers (2021-09-24T07:48:34Z) - Posterior Differential Regularization with f-divergence for Improving Model Robustness [95.05725916287376]
We focus on methods that regularize the model posterior difference between clean and noisy inputs.
We generalize the posterior differential regularization to the family of $f$-divergences.
Our experiments show that regularizing the posterior differential with an $f$-divergence can substantially improve model robustness.
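A minimal sketch of the regularizer, assuming softmax posteriors and using KL and symmetric KL as representative members of the $f$-divergence family (other choices plug in the same way; the exact choices here are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def posterior_differential_penalty(logits_clean, logits_noisy, divergence="sym_kl"):
    """Penalty on the difference between the model's posteriors for clean and
    noisy versions of the same input; KL and symmetric (Jeffreys) KL are two
    members of the f-divergence family used here for illustration."""
    p, q = softmax(logits_clean), softmax(logits_noisy)
    if divergence == "kl":
        return kl(p, q).mean()
    return 0.5 * (kl(p, q) + kl(q, p)).mean()          # symmetric KL

# The training objective would then be: task_loss + lambda * posterior_differential_penalty(...)
```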
arXiv Detail & Related papers (2020-10-23T19:58:01Z) - The Risks of Invariant Risk Minimization [52.7137956951533]
Invariant Risk Minimization (IRM) is an objective based on the idea of learning deep, invariant features of data.
We present the first analysis of classification under the IRM objective, as well as its recently proposed alternatives, under a fairly natural and general model.
We show that IRM can fail catastrophically unless the test data are sufficiently similar to the training distribution; this is precisely the issue it was intended to solve.
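For concreteness, the sketch below writes out the IRMv1 relaxation of the objective (due to Arjovsky et al.) for a linear model with squared loss, where the per-environment gradient penalty has a closed form; the paper's contribution is the failure analysis of this kind of objective, not code.

```python
import numpy as np

def irmv1_penalty(phi, envs):
    """IRMv1-style penalty for a linear featurizer phi and squared loss:
    the squared gradient of each environment's risk with respect to a frozen
    scalar classifier w = 1.0 on top of the features (closed form for a linear
    model; deep models compute the same quantity with autograd)."""
    penalty = 0.0
    for X, y in envs:
        f = X @ phi                                     # features times frozen classifier
        grad_w = np.mean(-2.0 * f * (y - f))            # d/dw mean((y - w*f)^2) at w = 1
        penalty += grad_w ** 2
    return penalty

def erm_risk(phi, envs):
    return sum(np.mean((y - X @ phi) ** 2) for X, y in envs)

# IRM-style objective: minimize erm_risk(phi, envs) + lam * irmv1_penalty(phi, envs)
# over phi; the paper analyses when such objectives fail to recover invariant features.
```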
arXiv Detail & Related papers (2020-10-12T14:54:32Z) - Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with external uncertainty in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z) - GenDICE: Generalized Offline Estimation of Stationary Values [108.17309783125398]
We show that effective offline estimation of stationary values can still be achieved in important applications.
Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions.
The resulting algorithm, GenDICE, is straightforward and effective.
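GenDICE itself estimates this correction ratio from logged samples via a minimax objective; the tabular sketch below assumes the dynamics are known and only illustrates the quantity being estimated and why reweighting by it corrects the distribution discrepancy.

```python
import numpy as np

rng = np.random.default_rng(4)
S = 6                                                   # number of states in a toy chain

# Row-stochastic transition matrix under the target policy.
P = rng.uniform(size=(S, S))
P /= P.sum(axis=1, keepdims=True)

# Stationary distribution of the target policy via power iteration.
d_pi = np.full(S, 1.0 / S)
for _ in range(1000):
    d_pi = d_pi @ P
d_pi /= d_pi.sum()

# Empirical distribution of the logged data (whatever behaviour policy produced it).
d_emp = rng.dirichlet(np.ones(S))

# The correction ratio: reweighting logged rewards by it recovers the target
# policy's long-run average reward, E_{d_emp}[ratio * r] = E_{d_pi}[r].
ratio = d_pi / d_emp
rewards = rng.uniform(size=S)
print(np.sum(d_emp * ratio * rewards), np.sum(d_pi * rewards))   # identical by construction
```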
arXiv Detail & Related papers (2020-02-21T00:27:52Z)