A PAC-Bayesian Perspective on the Interpolating Information Criterion
- URL: http://arxiv.org/abs/2311.07013v1
- Date: Mon, 13 Nov 2023 01:48:08 GMT
- Title: A PAC-Bayesian Perspective on the Interpolating Information Criterion
- Authors: Liam Hodgkinson, Chris van der Heide, Robert Salomone, Fred Roosta,
Michael W. Mahoney
- Abstract summary: We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model, optimizer, and parameter-initialization scheme.
- Score: 54.548058449535155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning is renowned for its theory-practice gap, whereby principled
theory typically fails to provide much beneficial guidance for implementation
in practice. This has been highlighted recently by the benign overfitting
phenomenon: when neural networks become sufficiently large to interpolate the
dataset perfectly, model performance appears to improve with increasing model
size, in apparent contradiction with the well-known bias-variance tradeoff.
While such phenomena have proven challenging to theoretically study for general
models, the recently proposed Interpolating Information Criterion (IIC)
provides a valuable theoretical framework to examine performance for
overparameterized models. Using the IIC, a PAC-Bayes bound is obtained for a
general class of models, characterizing factors which influence generalization
performance in the interpolating regime. From the provided bound, we quantify
how the test error for overparameterized models achieving effectively zero
training error depends on the quality of the implicit regularization imposed by
e.g. the combination of model, optimizer, and parameter-initialization scheme;
the spectrum of the empirical neural tangent kernel; curvature of the loss
landscape; and noise present in the data.
Related papers
- State-observation augmented diffusion model for nonlinear assimilation [6.682908186025083]
We propose a novel data-driven assimilation algorithm based on generative models.
Our State-Observation Augmented Diffusion (SOAD) model is designed to handle nonlinear physical and observational models more effectively.
arXiv Detail & Related papers (2024-07-31T03:47:20Z)
- Revisiting Spurious Correlation in Domain Generalization [12.745076668687748]
We build a structural causal model (SCM) to describe the causality within the data generation process.
We further conduct a thorough analysis of the mechanisms underlying spurious correlation.
In this regard, we propose to control confounding bias in OOD generalization by introducing a propensity score weighted estimator.
arXiv Detail & Related papers (2024-06-17T13:22:00Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery proposes to factorize the data generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that modular neural causal models outperform other models on both zero- and few-shot adaptation in low-data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z)
- Learning to Refit for Convex Learning Problems [11.464758257681197]
We propose a framework to learn to estimate optimized model parameters for different training sets using neural networks.
We rigorously characterize the power of neural networks to approximate convex problems.
arXiv Detail & Related papers (2021-11-24T15:28:50Z)
- Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox: "scale" metrics perform well overall but perform poorly on subpartitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z)
- CASTLE: Regularization via Auxiliary Causal Graph Discovery [89.74800176981842]
We introduce Causal Structure Learning (CASTLE) regularization and propose to regularize a neural network by jointly learning the causal relationships between variables.
CASTLE efficiently reconstructs only the features in the causal DAG that have a causal neighbor, whereas reconstruction-based regularizers suboptimally reconstruct all input features.
arXiv Detail & Related papers (2020-09-28T09:49:38Z)
- On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of the risk and its gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.