On the Benefits of Invariance in Neural Networks
- URL: http://arxiv.org/abs/2005.00178v1
- Date: Fri, 1 May 2020 02:08:58 GMT
- Title: On the Benefits of Invariance in Neural Networks
- Authors: Clare Lyle, Mark van der Wilk, Marta Kwiatkowska, Yarin Gal, Benjamin
Bloem-Reddy
- Abstract summary: We show that training with data augmentation leads to better estimates of risk and gradients thereof, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
- Score: 56.362579457990094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many real world data analysis problems exhibit invariant structure, and
models that take advantage of this structure have shown impressive empirical
performance, particularly in deep learning. While the literature contains a
variety of methods to incorporate invariance into models, theoretical
understanding is poor and there is no way to assess when one method should be
preferred over another. In this work, we analyze the benefits and limitations
of two widely used approaches in deep learning in the presence of invariance:
data augmentation and feature averaging. We prove that training with data
augmentation leads to better estimates of risk and gradients thereof, and we
provide a PAC-Bayes generalization bound for models trained with data
augmentation. We also show that compared to data augmentation, feature
averaging reduces generalization error when used with convex losses, and
tightens PAC-Bayes bounds. We provide empirical support of these theoretical
results, including a demonstration of why generalization may not improve by
training with data augmentation: the 'learned invariance' fails outside of the
training distribution.
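To make the two approaches concrete, here is a minimal sketch (illustrative code, not the paper's implementation; `model` and `loss_fn` are placeholder names) for a finite group G, here just horizontal flips: data augmentation averages the loss over transformed inputs, while feature averaging averages the model's output over the orbit and evaluates the loss once.

```python
# Illustrative sketch only; `model` and `loss_fn` are placeholders.
import torch

def augmented_loss(model, loss_fn, x, y):
    """Data augmentation: average the LOSS over the orbit of x under G."""
    orbit = [x, torch.flip(x, dims=[-1])]  # G = {identity, horizontal flip}
    return torch.stack([loss_fn(model(g_x), y) for g_x in orbit]).mean()

def feature_averaged_loss(model, loss_fn, x, y):
    """Feature averaging: average the MODEL OUTPUT over the orbit, then
    evaluate the loss once on the resulting G-invariant prediction."""
    orbit = [x, torch.flip(x, dims=[-1])]
    avg_out = torch.stack([model(g_x) for g_x in orbit]).mean(dim=0)
    return loss_fn(avg_out, y)
```

For a convex loss, Jensen's inequality gives feature_averaged_loss ≤ augmented_loss pointwise, which is the mechanism behind the claim that feature averaging reduces generalization error under convex losses.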
Related papers
- A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
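For orientation, the classical PAC-Bayes template (Maurer's refinement of McAllester's bound) is reproduced below; the paper above derives a bound tailored to the interpolating regime, so this generic form is shown only as the reference point it sharpens.

```latex
% With probability at least 1 - \delta over an i.i.d. sample S of size n,
% simultaneously for all posteriors Q over hypotheses (P is a fixed prior):
L(Q) \le \hat{L}_S(Q)
      + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
```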
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose ReScore, a model-agnostic framework that boosts causal discovery performance by dynamically learning adaptive sample weights for a reweighted score function.
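A minimal sketch of the general mechanism (adaptive sample reweighting wrapped around a differentiable score); the softmax weighting and the names here are illustrative assumptions, not ReScore's exact bilevel formulation.

```python
import torch

def reweighted_score(residuals, temperature=1.0):
    """Up-weight samples the current candidate graph fits poorly, so they
    drive the next score/gradient step (generic sketch, not ReScore itself).
    residuals: (n, d) reconstruction errors under the candidate graph."""
    per_sample = residuals.pow(2).mean(dim=1)             # fit error per sample, (n,)
    weights = torch.softmax(per_sample.detach() / temperature, dim=0)
    return (weights * per_sample).sum()                   # weighted score to minimize
```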
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Modeling Uncertain Feature Representation for Domain Generalization [49.129544670700525]
We show that our method consistently improves the network's generalization ability on multiple vision tasks.
The method is simple yet effective and can be readily integrated into networks without additional trainable parameters or loss constraints.
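A minimal sketch of one common recipe for this (treating per-instance feature statistics as Gaussian random variables whose spread is estimated from the batch); the paper's specific estimator may differ.

```python
import torch

def perturb_feature_statistics(x, eps=1e-6):
    """x: (N, C, H, W) feature map. Resample each instance's channel mean/std
    with noise scaled by the batch-level variance of those statistics
    (illustrative sketch of uncertainty in feature statistics)."""
    mu = x.mean(dim=(2, 3), keepdim=True)                 # per-instance channel means
    sig = (x.var(dim=(2, 3), keepdim=True) + eps).sqrt()  # per-instance channel stds
    mu_scale = mu.var(dim=0, keepdim=True).sqrt()         # uncertainty of the means
    sig_scale = sig.var(dim=0, keepdim=True).sqrt()       # uncertainty of the stds
    new_mu = mu + torch.randn_like(mu) * mu_scale
    new_sig = sig + torch.randn_like(sig) * sig_scale
    return new_sig * (x - mu) / sig + new_mu              # re-normalize with sampled stats
```

Because this amounts to noise injection at the level of feature statistics, it adds no trainable parameters, consistent with the summary above.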
arXiv Detail & Related papers (2023-01-16T14:25:02Z)
- FedGen: Generalizable Federated Learning for Sequential Data [8.784435748969806]
In many real-world distributed settings, spurious correlations exist due to biases and data sampling issues.
We present a generalizable federated learning framework called FedGen, which allows clients to identify and distinguish between spurious and invariant features.
We show that FedGen results in models that achieve significantly better generalization and can exceed the accuracy of current federated learning approaches by over 24%.
arXiv Detail & Related papers (2022-11-03T15:48:14Z)
- Deep Stable Learning for Out-Of-Distribution Generalization [27.437046504902938]
Approaches based on deep neural networks have achieved striking performance when the testing and training data share a similar distribution.
Eliminating the impact of distribution shifts between training and testing data is crucial for building deep models with reliable performance.
We propose to address this problem by removing the dependencies between features via learned weights for the training samples.
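A toy sketch of the core idea (learning sample weights that shrink the weighted covariance between feature dimensions); the actual method uses a more general dependence measure than plain covariance.

```python
import torch

def decorrelation_loss(features, w_logits):
    """features: (n, d); w_logits: (n,) learnable. Softmax weights are trained
    to minimize the off-diagonal weighted covariance between feature
    dimensions (toy sketch of sample reweighting for decorrelation)."""
    w = torch.softmax(w_logits, dim=0).unsqueeze(1)   # (n, 1) sample weights
    mean = (w * features).sum(dim=0, keepdim=True)    # weighted feature mean
    centered = features - mean
    cov = (w * centered).t() @ centered               # weighted covariance, (d, d)
    off_diag = cov - torch.diag(torch.diag(cov))
    return off_diag.pow(2).sum()                      # penalize cross-feature dependence
```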
arXiv Detail & Related papers (2021-04-16T03:54:21Z)
- Linear Regression with Distributed Learning: A Generalization Error Perspective [0.0]
We investigate the performance of distributed learning for large-scale linear regression.
We focus on the generalization error, i.e., the performance on unseen data.
Our results show that the generalization error of the distributed solution can be substantially higher than that of the centralized solution.
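A quick numerical illustration of that gap on synthetic data (the split-then-average scheme below is one simple distributed baseline, not necessarily the paper's exact setting):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 50, 10                      # samples, features, workers
X = rng.standard_normal((n, d))
beta = rng.standard_normal(d)
y = X @ beta + 0.5 * rng.standard_normal(n)

# Centralized least squares on the full data set.
b_central = np.linalg.lstsq(X, y, rcond=None)[0]

# Distributed: each worker solves on its shard; solutions are averaged.
shards = np.array_split(np.arange(n), k)
b_dist = np.mean([np.linalg.lstsq(X[s], y[s], rcond=None)[0] for s in shards], axis=0)

# Evaluate on fresh data: the averaged solution is typically much worse
# when the shards are small relative to the dimension d.
X_test = rng.standard_normal((5000, d))
y_test = X_test @ beta
print("centralized test MSE:", np.mean((X_test @ b_central - y_test) ** 2))
print("distributed test MSE:", np.mean((X_test @ b_dist - y_test) ** 2))
```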
arXiv Detail & Related papers (2021-01-22T08:43:28Z)
- Removing Undesirable Feature Contributions Using Out-of-Distribution Data [20.437871747430826]
We propose a data augmentation method to improve generalization in both adversarial and standard learning.
The proposed method can further improve the existing state-of-the-art adversarial training.
arXiv Detail & Related papers (2021-01-17T10:26:34Z)
- Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)
- Generative Data Augmentation for Commonsense Reasoning [75.26876609249197]
G-DAUGC is a novel generative data augmentation method that aims to achieve more accurate and robust learning in the low-resource setting.
G-DAUGC consistently outperforms existing data augmentation methods based on back-translation.
Our analysis demonstrates that G-DAUGC produces a diverse set of fluent training examples, and that its selection and training approaches are important for performance.
arXiv Detail & Related papers (2020-04-24T06:12:10Z)