A PAC-Bayes oracle inequality for sparse neural networks
- URL: http://arxiv.org/abs/2204.12392v2
- Date: Mon, 2 Oct 2023 10:51:23 GMT
- Title: A PAC-Bayes oracle inequality for sparse neural networks
- Authors: Maximilian F. Steffen, Mathias Trabs
- Abstract summary: We study the Gibbs posterior distribution for sparse deep neural nets in a nonparametric regression setting.
We prove an oracle inequality which shows that the method adapts to the unknown regularity and hierarchical structure of the regression function.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the Gibbs posterior distribution for sparse deep neural nets in a
nonparametric regression setting. The posterior can be accessed via
Metropolis-adjusted Langevin algorithms. Using a mixture over uniform priors on
sparse sets of network weights, we prove an oracle inequality which shows that
the method adapts to the unknown regularity and hierarchical structure of the
regression function. The estimator achieves the minimax-optimal rate of
convergence (up to a logarithmic factor).
Related papers
- Deep learning from strongly mixing observations: Sparse-penalized regularization and minimax optimality [0.0]
We consider sparse-penalized regularization for deep neural network predictor.
We deal with the squared and a broad class of loss functions.
arXiv Detail & Related papers (2024-06-12T15:21:51Z) - Posterior and variational inference for deep neural networks with heavy-tailed weights [0.0]
We consider deep neural networks in a Bayesian framework with a prior distribution sampling the network weights at random.
We show that the corresponding posterior distribution achieves near-optimal minimax contraction rates.
We also provide variational Bayes counterparts of the results, that show that mean-field variational approximations still benefit from near-optimal theoretical support.
arXiv Detail & Related papers (2024-06-05T15:24:20Z) - Generalization of Scaled Deep ResNets in the Mean-Field Regime [55.77054255101667]
We investigate emphscaled ResNet in the limit of infinitely deep and wide neural networks.
Our results offer new insights into the generalization ability of deep ResNet beyond the lazy training regime.
arXiv Detail & Related papers (2024-03-14T21:48:00Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linearahead as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Variational EP with Probabilistic Backpropagation for Bayesian Neural
Networks [0.0]
I propose a novel approach for nonlinear Logistic regression using a two-layer neural network (NN) model structure with hierarchical priors on the network weights.
I derive a computationally efficient algorithm, whose complexity scales similarly to an ensemble of independent sparse logistic models.
arXiv Detail & Related papers (2023-03-02T19:09:47Z) - Score-Guided Intermediate Layer Optimization: Fast Langevin Mixing for
Inverse Problem [97.64313409741614]
We prove fast mixing and characterize the stationary distribution of the Langevin Algorithm for inverting random weighted DNN generators.
We propose to do posterior sampling in the latent space of a pre-trained generative model.
arXiv Detail & Related papers (2022-06-18T03:47:37Z) - Wide Bayesian neural networks have a simple weight posterior: theory and
accelerated sampling [48.94555574632823]
Repriorisation transforms a Bayesian neural network (BNN) posterior to a distribution whose KL divergence to the BNN prior vanishes as layer widths grow.
We develop a Markov chain Monte Carlo (MCMC) posterior sampling algorithm which mixes faster the wider the BNN.
We observe up to 50x higher effective sample size relative to no reparametrisation for both fully-connected and residual networks.
arXiv Detail & Related papers (2022-06-15T17:11:08Z) - How do noise tails impact on deep ReLU networks? [2.5889847253961418]
We show how the optimal rate of convergence depends on p, the degree of smoothness and the intrinsic dimension in a class of nonparametric regression functions.
We also contribute some new results on the approximation theory of deep ReLU neural networks.
arXiv Detail & Related papers (2022-03-20T00:27:32Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Sampling-free Variational Inference for Neural Networks with
Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z) - The k-tied Normal Distribution: A Compact Parameterization of Gaussian
Mean Field Posteriors in Bayesian Neural Networks [46.677567663908185]
Variational Bayesian Inference is a popular methodology for approxing posteriorimating over Bayesian neural network weights.
Recent work has explored ever richer parameterizations of the approximate posterior in the hope of improving performance.
We find that by decomposing these variational parameters into a low-rank factorization, we can make our variational approximation more compact without decreasing the models' performance.
arXiv Detail & Related papers (2020-02-07T07:33:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.