Improving Generalization via Uncertainty Driven Perturbations
- URL: http://arxiv.org/abs/2202.05737v1
- Date: Fri, 11 Feb 2022 16:22:08 GMT
- Title: Improving Generalization via Uncertainty Driven Perturbations
- Authors: Matteo Pagliardini, Gilberto Manunza, Martin Jaggi, Michael I. Jordan,
Tatjana Chavdarova
- Abstract summary: We consider uncertainty-driven perturbations of the training data points.
Unlike loss-driven perturbations, uncertainty-guided perturbations do not cross the decision boundary.
We show that UDP is guaranteed to achieve the maximum-margin decision boundary on linear models.
- Score: 107.45752065285821
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, Shah et al. (2020) pointed out the pitfalls of the simplicity bias -
the tendency of gradient-based algorithms to learn simple models - which
include the model's high sensitivity to small input perturbations, as well as
sub-optimal margins. In particular, while Stochastic Gradient Descent yields a
max-margin boundary on linear models, this guarantee does not extend to
non-linear models. To mitigate the simplicity bias, we consider
uncertainty-driven perturbations (UDP) of the training data points, obtained
iteratively by following the direction that maximizes the model's estimated
uncertainty. Unlike loss-driven perturbations, uncertainty-guided perturbations
do not cross the decision boundary, allowing for using a larger range of values
for the hyperparameter that controls the magnitude of the perturbation.
Moreover, as real-world datasets have non-isotropic distances between data
points of different classes, the above property is particularly appealing for
increasing the margin of the decision boundary, which in turn improves the
model's generalization. We show that UDP is guaranteed to achieve the maximum
margin decision boundary on linear models and that it notably increases it on
challenging simulated datasets. Interestingly, it also achieves competitive
loss-based robustness and generalization trade-off on several datasets.
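The iterative procedure described in the abstract can be sketched for a binary linear classifier, using predictive entropy as the uncertainty measure. This is a minimal illustration under assumed choices (the function names, step sizes, entropy-based uncertainty proxy, and epsilon-ball projection are ours, not the authors' exact algorithm):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def entropy(p):
    """Binary predictive entropy, the uncertainty measure assumed here."""
    eps = 1e-12
    return -(p * math.log(p + eps) + (1 - p) * math.log(1 - p + eps))

def udp_perturb(x, w, b, step=0.1, n_steps=10, max_norm=0.5):
    """Iteratively move x in the direction that maximizes predictive
    entropy, i.e. toward the decision boundary of the linear model
    p(y=1|x) = sigmoid(w.x + b), without leaving an epsilon-ball."""
    x0 = list(x)
    for _ in range(n_steps):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        p = sigmoid(z)
        # Chain rule: dH/dx = dH/dp * dp/dz * dz/dx
        dH_dp = math.log((1 - p) / p) if 0.0 < p < 1.0 else 0.0
        grad = [dH_dp * p * (1 - p) * wi for wi in w]
        x = [xi + step * gi for xi, gi in zip(x, grad)]
        # Project back into the max_norm ball around the original point
        diff = [xi - x0i for xi, x0i in zip(x, x0)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > max_norm:
            x = [x0i + d * max_norm / norm for x0i, d in zip(x0, diff)]
    return x
```

Because the entropy of a linear model peaks exactly on the boundary, gradient ascent on uncertainty approaches the boundary from one side rather than crossing it, which is the property the abstract contrasts with loss-driven perturbations.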
Related papers
- Parameter uncertainties for imperfect surrogate models in the low-noise regime [0.3069335774032178]
We analyze the generalization error of misspecified, near-deterministic surrogate models.
We show posterior distributions must cover every training point to avoid a divergent generalization error.
This is demonstrated on model problems before application to thousand dimensional datasets in atomistic machine learning.
arXiv Detail & Related papers (2024-02-02T11:41:21Z) - Scalable Higher-Order Tensor Product Spline Models [0.0]
We propose a new approach using a factorization method to derive a highly scalable higher-order tensor product spline model.
Our method allows for the incorporation of all (higher-order) interactions of non-linear feature effects while having computational costs proportional to a model without interactions.
arXiv Detail & Related papers (2024-02-02T01:18:48Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models are Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs).
GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z) - Distributionally Robust Model-Based Offline Reinforcement Learning with
Near-Optimal Sample Complexity [39.886149789339335]
Offline reinforcement learning aims to learn decision making from historical data without active exploration.
Due to uncertainties and variabilities of the environment, it is critical to learn a robust policy that performs well even when the deployed environment deviates from the nominal one used to collect the historical dataset.
We consider a distributionally robust formulation of offline RL, focusing on robust Markov decision processes with an uncertainty set specified by the Kullback-Leibler divergence in both finite-horizon and infinite-horizon settings.
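For orientation, the KL-constrained worst-case expectation inside such a robust Bellman update admits a standard dual form from distributionally robust optimization (the notation below is ours, sketched for a value function $V$ and nominal transition kernel $P^0$; it is not necessarily the paper's exact formulation):

```latex
\inf_{P \,:\, D_{\mathrm{KL}}(P \,\|\, P^0) \le \sigma}
  \; \mathbb{E}_{s' \sim P}\big[V(s')\big]
\;=\;
\sup_{\lambda \ge 0}
  \Big\{ -\lambda \log \mathbb{E}_{s' \sim P^0}\big[e^{-V(s')/\lambda}\big]
         \;-\; \lambda \sigma \Big\}
```

The dual reduces the infimum over an uncertainty set of distributions to a one-dimensional optimization over the multiplier $\lambda$, which is what makes KL uncertainty sets computationally convenient.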
arXiv Detail & Related papers (2022-08-11T11:55:31Z) - A Priori Denoising Strategies for Sparse Identification of Nonlinear
Dynamical Systems: A Comparative Study [68.8204255655161]
We investigate and compare the performance of several local and global smoothing techniques to a priori denoise the state measurements.
We show that, in general, global methods, which use the entire measurement data set, outperform local methods, which employ a neighboring data subset around a local point.
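The local-versus-global distinction can be illustrated with a toy sketch (an assumed setup, not the paper's actual methods): a windowed moving average as the local smoother versus an ordinary least-squares line fit over the entire measurement set as the global one, applied to a noisy linear signal.

```python
import random

def moving_average(y, k=2):
    """Local smoothing: average over a window of up to 2k+1 neighbours."""
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - k), min(n, i + k + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out

def global_line_fit(x, y):
    """Global smoothing: ordinary least-squares straight-line fit
    using the entire measurement set."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [slope * xi + intercept for xi in x]

def mse(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

random.seed(0)
x = [i * 0.1 for i in range(100)]
truth = [2.0 * xi + 1.0 for xi in x]
noisy = [t + random.gauss(0, 0.3) for t in truth]

local_err = mse(moving_average(noisy), truth)
global_err = mse(global_line_fit(x, noisy), truth)
```

When the global model family matches the underlying signal, the full-dataset fit averages the noise over all samples at once, whereas a local window can only average over its few neighbours, consistent with the comparison reported above.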
arXiv Detail & Related papers (2022-01-29T23:31:25Z) - Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to non-linear settings via deep learning with bias constraints.
A second motivation for BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
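The averaging motivation can be illustrated with a toy simulation (an assumed setup, not from the paper): averaging many independent unbiased estimates converges to the true value, while averaging estimates that share a systematic bias converges to the wrong value, so constraining bias matters precisely when estimates are averaged.

```python
import random

random.seed(1)
THETA = 5.0  # the unknown quantity being estimated

def estimate(bias):
    """One noisy estimate of THETA with a given systematic bias."""
    return THETA + bias + random.gauss(0, 1.0)

def averaged(bias, n=1000):
    """Average n independent estimates; variance shrinks as 1/n,
    but any shared bias survives the averaging untouched."""
    return sum(estimate(bias) for _ in range(n)) / n

unbiased_avg = averaged(bias=0.0)
biased_avg = averaged(bias=0.5)
```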
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - The Sobolev Regularization Effect of Stochastic Gradient Descent [8.193914488276468]
We show that flat minima regularize the gradient of the model function, which explains the good performance of flat minima.
We also consider high-order moments of gradient noise, and show that Stochastic Gradient Descent (SGD) tends to impose constraints on these moments by a linear analysis of SGD around global minima.
arXiv Detail & Related papers (2021-05-27T21:49:21Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are empirically central to preventing overfitting.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - On the Stability Properties and the Optimization Landscape of Training
Problems with Squared Loss for Neural Networks and General Nonlinear Conic
Approximation Schemes [0.0]
We study the optimization landscape and the stability properties of training problems with squared loss for neural networks and general nonlinear conic approximation schemes.
We prove that the same effects that are responsible for these instability properties are also the reason for the emergence of saddle points and spurious local minima.
arXiv Detail & Related papers (2020-11-06T11:34:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.