Dropout Regularization in Extended Generalized Linear Models based on Double Exponential Families
- URL: http://arxiv.org/abs/2305.06625v2
- Date: Sun, 28 Jul 2024 13:20:03 GMT
- Title: Dropout Regularization in Extended Generalized Linear Models based on Double Exponential Families
- Authors: Benedikt Lütke Schwienhorst, Lucas Kock, Nadja Klein, David J. Nott
- Abstract summary: We study dropout regularization in extended generalized linear models based on double exponential families.
A theoretical analysis shows that dropout regularization prefers rare but important features in both the mean and dispersion.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Even though dropout is a popular regularization technique, its theoretical properties are not fully understood. In this paper we study dropout regularization in extended generalized linear models based on double exponential families, for which the dispersion parameter can vary with the features. A theoretical analysis shows that dropout regularization prefers rare but important features in both the mean and dispersion, generalizing an earlier result for conventional generalized linear models. To illustrate, we apply dropout to adaptive smoothing with B-splines, where both the mean and dispersion parameters are modeled flexibly. The important B-spline basis functions can be thought of as rare features, and we confirm in experiments that dropout is an effective form of regularization for mean and dispersion parameters that improves on a penalized maximum likelihood approach with an explicit smoothness penalty. An application to traffic detection data from Berlin further illustrates the benefits of our method.
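To make the setting concrete, the sketch below illustrates the idea for a Gaussian instance of an extended GLM in which both the mean and the log-dispersion are linear in the features, with inverted dropout applied to the feature vector at each stochastic gradient step. This is a minimal illustration under stated assumptions, not the authors' implementation; the function name `fit_dropout_glm`, the toy data, and all hyperparameters are illustrative.

```python
# Minimal sketch (illustrative, not the paper's code): dropout-regularized
# heteroscedastic Gaussian regression, i.e. both mean and log-dispersion
# are linear in the (possibly dropped-out) features.
import numpy as np

rng = np.random.default_rng(0)

def fit_dropout_glm(X, y, keep_prob=0.8, lr=1e-2, epochs=200):
    """SGD on the Gaussian negative log-likelihood with inverted dropout
    applied to the feature vector at every update."""
    n, p = X.shape
    beta = np.zeros(p)    # mean coefficients:       mu_i       = x_i @ beta
    gamma = np.zeros(p)   # dispersion coefficients: log sig2_i = x_i @ gamma
    for _ in range(epochs):
        for i in rng.permutation(n):
            mask = rng.random(p) < keep_prob     # drop features at random
            x = X[i] * mask / keep_prob          # inverted-dropout rescaling
            mu = x @ beta
            sig2 = np.exp(x @ gamma)
            resid = y[i] - mu
            # gradient steps on -log N(y_i | mu, sig2)
            beta += lr * (resid / sig2) * x
            gamma += lr * 0.5 * (resid**2 / sig2 - 1.0) * x
    return beta, gamma

# Toy heteroscedastic data: a rare binary feature shifts both mean and variance.
n = 500
X = np.column_stack([np.ones(n), rng.binomial(1, 0.05, n)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(scale=np.exp(0.5 * X[:, 1]), size=n)

beta_hat, gamma_hat = fit_dropout_glm(X, y)
print("mean coefficients:      ", beta_hat)
print("dispersion coefficients:", gamma_hat)
```

The rare binary column in the toy design matrix echoes the paper's point that dropout favours rare but important features in both the mean and the dispersion; in the B-spline application, the important basis functions play the role of such rare features.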
Related papers
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error of overparameterized models that achieve effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning [68.76846801719095]
We show exactly when and where double descent occurs, and that its location is not inherently tied to the interpolation threshold p=n.
This provides a resolution to tensions between double descent and statistical intuition.
arXiv Detail & Related papers (2023-10-29T12:05:39Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - Amortized backward variational inference in nonlinear state-space models [0.0]
We consider the problem of state estimation in general state-space models using variational inference.
We establish for the first time that, under mixing assumptions, the variational approximation of expectations of additive state functionals induces an error which grows at most linearly in the number of observations.
arXiv Detail & Related papers (2022-06-01T08:35:54Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds for RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting in practice.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Estimating Linear Mixed Effects Models with Truncated Normally Distributed Random Effects [5.4052819252055055]
Inference can be conducted using a maximum likelihood approach if Normal distributions are assumed for the random effects.
In this paper we extend the classical (unconstrained) LME models to allow for sign constraints on their overall coefficients.
arXiv Detail & Related papers (2020-11-09T16:17:35Z) - Flexible mean field variational inference using mixtures of non-overlapping exponential families [6.599344783327053]
I show that standard mean field variational inference can fail to produce sensible results for models with sparsity-inducing priors.
I show that any mixture of a diffuse exponential family and a point mass at zero, used to model sparsity, forms an exponential family.
arXiv Detail & Related papers (2020-10-14T01:46:56Z) - Multiple Descent: Design Your Own Generalization Curve [46.47831396167738]
We show that the generalization curve can have an arbitrary number of peaks, and moreover, locations of those peaks can be explicitly controlled.
Our results highlight the fact that both the classical U-shaped generalization curve and the recently observed double descent curve are not intrinsic properties of the model family.
arXiv Detail & Related papers (2020-08-03T17:22:21Z) - Robust model training and generalisation with Studentising flows [22.757298187704745]
We discuss how these methods can be further improved based on insights from robust (in particular, resistant) statistics.
We propose to endow flow-based models with fat-tailed latent distributions as a simple drop-in replacement for the Gaussian distribution.
Experiments on several different datasets confirm the efficacy of the proposed approach.
arXiv Detail & Related papers (2020-06-11T16:47:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.