Wide stochastic networks: Gaussian limit and PAC-Bayesian training
- URL: http://arxiv.org/abs/2106.09798v1
- Date: Thu, 17 Jun 2021 20:25:38 GMT
- Title: Wide stochastic networks: Gaussian limit and PAC-Bayesian training
- Authors: Eugenio Clerico, George Deligiannidis, Arnaud Doucet
- Abstract summary: We show that an extremely large network is approximated by a Gaussian process, both before and during training.
The explicit evaluation of the output distribution allows for a PAC-Bayesian training procedure that directly optimizes the bound.
For a large but finite-width network, we show empirically on MNIST that this training approach can outperform standard PAC-Bayesian methods.
- Score: 21.979820411421827
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The limit of infinite width allows for substantial simplifications in the
analytical study of overparameterized neural networks. With a suitable random
initialization, an extremely large network is well approximated by a Gaussian
process, both before and during training. In the present work, we establish a
similar result for a simple stochastic architecture whose parameters are random
variables. The explicit evaluation of the output distribution allows for a
PAC-Bayesian training procedure that directly optimizes the generalization
bound. For a large but finite-width network, we show empirically on MNIST that
this training approach can outperform standard PAC-Bayesian methods.
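As a rough illustration of what "directly optimizing a PAC-Bayes bound" can look like in code, here is a minimal sketch with a toy linear model, a diagonal-Gaussian weight posterior, and a McAllester-style bound; this is not the paper's Gaussian-limit bound or architecture, and all names and hyperparameters below are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact objective): train a stochastic linear
# classifier by minimizing a McAllester-style PAC-Bayes bound, with a Gaussian
# posterior N(mu, diag(sigma^2)) over the weights and a fixed Gaussian prior.
import math
import torch

def kl_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, diag(exp(logvar_q))) || N(mu_p, diag(exp(logvar_p))) )."""
    var_q, var_p = logvar_q.exp(), logvar_p.exp()
    return 0.5 * (logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0).sum()

def pac_bayes_bound(emp_risk, kl, n, delta=0.05):
    """McAllester bound: risk <= emp_risk + sqrt((KL + ln(2*sqrt(n)/delta)) / (2n))."""
    return emp_risk + torch.sqrt((kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n))

# Toy data and a Gaussian posterior over the weights of a linear model.
torch.manual_seed(0)
n, d = 512, 20
X = torch.randn(n, d)
y = (X[:, 0] > 0).float()
mu = torch.zeros(d, requires_grad=True)
log_sigma2 = torch.full((d,), -2.0, requires_grad=True)
prior_mu, prior_logvar = torch.zeros(d), torch.zeros(d)
opt = torch.optim.Adam([mu, log_sigma2], lr=1e-2)

for step in range(500):
    # Reparameterized weight sample; surrogate (cross-entropy) empirical risk.
    w = mu + torch.randn(d) * torch.exp(0.5 * log_sigma2)
    emp_risk = torch.nn.functional.binary_cross_entropy_with_logits(X @ w, y)
    kl = kl_gaussians(mu, log_sigma2, prior_mu, prior_logvar)
    loss = pac_bayes_bound(emp_risk, kl, n)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The gradient flows through both the posterior mean and variance via the reparameterization trick, so the bound itself serves as the training objective.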
Related papers
- A Bayesian Take on Gaussian Process Networks [1.7188280334580197]
This work implements Monte Carlo and Markov Chain Monte Carlo methods to sample from the posterior distribution of network structures.
We show that our method outperforms state-of-the-art algorithms in recovering the graphical structure of the network.
arXiv Detail & Related papers (2023-06-20T08:38:31Z) - Joint Bayesian Inference of Graphical Structure and Parameters with a
Single Generative Flow Network [59.79008107609297]
We propose in this paper to approximate the joint posterior over the structure and parameters of a Bayesian Network.
We use a single GFlowNet whose sampling policy follows a two-phase process.
Since the parameters are included in the posterior distribution, this leaves more flexibility for the local probability models.
arXiv Detail & Related papers (2023-05-30T19:16:44Z) - Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior:
From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks.
We present a generalization bound for meta-learning, which was first derived by Rothfuss et al.
We provide a theoretical analysis and an empirical case study of the conditions under which, and the extent to which, these meta-learning guarantees improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z) - A simple approach for quantizing neural networks [7.056222499095849]
We propose a new method for quantizing the weights of a fully trained neural network.
A simple deterministic pre-processing step allows us to quantize network layers via memoryless scalar quantization.
The developed method also readily allows the quantization of deep networks by consecutive application to single layers.
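A minimal sketch of memoryless scalar quantization (each weight rounded independently to the nearest level of a fixed grid); the deterministic pre-processing step described in the paper is not reproduced here, and the uniform grid below is an assumption.

```python
# Memoryless scalar quantization: every entry of a trained weight matrix is
# mapped independently to the nearest level of a fixed uniform grid.
import numpy as np

def msq(weights: np.ndarray, num_levels: int = 16) -> np.ndarray:
    """Round each entry to the nearest point of a uniform grid over the weight range."""
    lo, hi = weights.min(), weights.max()
    levels = np.linspace(lo, hi, num_levels)
    idx = np.abs(weights[..., None] - levels).argmin(axis=-1)
    return levels[idx]

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128))      # toy stand-in for a trained layer's weights
W_q = msq(W, num_levels=8)
print("relative quantization error:", np.linalg.norm(W - W_q) / np.linalg.norm(W))
```

Because each weight is quantized independently, the same routine can be applied layer by layer to quantize a deep network.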
arXiv Detail & Related papers (2022-09-07T22:36:56Z) - Demystify Optimization and Generalization of Over-parameterized
PAC-Bayesian Learning [20.295960197612743]
PAC-Bayes is an analysis framework in which the training error is expressed as a weighted average over the hypotheses in the posterior distribution.
We show that when PAC-Bayes learning is applied, the convergence result corresponds to solving a kernel ridge regression.
We further characterize a uniform PAC-Bayesian generalization bound that improves over the Rademacher-complexity-based bound for non-probabilistic neural networks.
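For reference, the kernel ridge regression problem that the cited analysis relates over-parameterized PAC-Bayes training to looks as follows; the RBF kernel below stands in for the network-induced (NTK-like) kernel and is an assumption of this sketch.

```python
# Kernel ridge regression: solve (K + lambda*I) alpha = y, then predict with k(x*, X) @ alpha.
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def kernel_ridge_fit_predict(X_train, y_train, X_test, reg=1e-2):
    K = rbf_kernel(X_train, X_train)
    alpha = np.linalg.solve(K + reg * np.eye(len(X_train)), y_train)
    return rbf_kernel(X_test, X_train) @ alpha

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)
X_new = np.linspace(-3, 3, 50)[:, None]
y_pred = kernel_ridge_fit_predict(X, y, X_new)
```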
arXiv Detail & Related papers (2022-02-04T03:49:11Z) - A General Framework for the Practical Disintegration of PAC-Bayesian
Bounds [2.516393111664279]
We introduce new PAC-Bayesian generalization bounds that have the originality to provide disintegrated bounds.
Our bounds are easily optimizable and can be used to design learning algorithms.
arXiv Detail & Related papers (2021-02-17T09:36:46Z) - Attentive Gaussian processes for probabilistic time-series generation [4.94950858749529]
We propose a computationally efficient attention-based network combined with Gaussian process regression to generate real-valued sequences.
We develop a block-wise training algorithm to allow mini-batch training of the network while the GP is trained using full-batch.
The algorithm is proven to converge and yields solutions of comparable, if not better, quality.
arXiv Detail & Related papers (2021-02-10T01:19:15Z) - Pathwise Conditioning of Gaussian Processes [72.61885354624604]
Conventional approaches for simulating Gaussian process posteriors view samples as draws from marginal distributions of process values at finite sets of input locations.
This distribution-centric characterization leads to generative strategies that scale cubically in the size of the desired random vector.
We show how this pathwise interpretation of conditioning gives rise to a general family of approximations that lend themselves to efficiently sampling Gaussian process posteriors.
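A minimal sketch of pathwise conditioning via Matheron's rule, where a posterior sample is a joint prior sample plus a data-dependent correction, f_post(x*) = f_prior(x*) + K(x*, X)(K(X, X) + sigma^2 I)^{-1}(y - f_prior(X) - eps); this uses exact kernels rather than the paper's scalable approximations, and all choices below are illustrative.

```python
# Pathwise conditioning (Matheron's rule) for GP regression with Gaussian noise.
import numpy as np

def rbf(A, B, ls=1.0):
    return np.exp(-0.5 * ((A[:, None] - B[None, :]) ** 2) / ls**2)

rng = np.random.default_rng(2)
X = np.sort(rng.uniform(-3, 3, 8))           # training inputs
y = np.sin(X) + 0.05 * rng.normal(size=8)    # noisy observations
Xs = np.linspace(-3, 3, 100)                 # test inputs
noise = 0.05**2

# Draw one joint prior sample over training and test locations.
Xall = np.concatenate([X, Xs])
Kall = rbf(Xall, Xall) + 1e-6 * np.eye(len(Xall))
f_all = np.linalg.cholesky(Kall) @ rng.normal(size=len(Xall))
f_train, f_test = f_all[: len(X)], f_all[len(X):]

# Pathwise update: correct the prior sample so it agrees with the observations.
eps = np.sqrt(noise) * rng.normal(size=len(X))
correction = rbf(Xs, X) @ np.linalg.solve(rbf(X, X) + noise * np.eye(len(X)),
                                          y - f_train - eps)
posterior_sample = f_test + correction
```

The correction step is what the distribution-centric view spends cubic cost on for every test set; the pathwise view lets the prior sample be approximated cheaply and only the correction be computed exactly.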
arXiv Detail & Related papers (2020-11-08T17:09:37Z) - Deep Shells: Unsupervised Shape Correspondence with Optimal Transport [52.646396621449]
We propose a novel unsupervised learning approach to 3D shape correspondence.
We show that the proposed method significantly improves over the state-of-the-art on multiple datasets.
arXiv Detail & Related papers (2020-10-28T22:24:07Z) - AIN: Fast and Accurate Sequence Labeling with Approximate Inference
Network [75.44925576268052]
The linear-chain Conditional Random Field (CRF) model is one of the most widely-used neural sequence labeling approaches.
Exact probabilistic inference algorithms are typically applied in training and prediction stages of the CRF model.
We propose to employ a parallelizable approximate variational inference algorithm for the CRF model.
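A minimal sketch of what parallelizable approximate (mean-field) variational inference for a linear-chain CRF can look like: label marginals at all positions are updated in parallel from neighbouring marginals and the emission/transition scores, instead of running exact forward-backward. This is illustrative only and not the cited AIN architecture.

```python
# Mean-field inference for a linear-chain CRF over T positions and L labels.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mean_field_crf(emissions, transitions, n_iters=5):
    """emissions: (T, L) unary scores; transitions: (L, L) pairwise scores."""
    T, L = emissions.shape
    q = softmax(emissions)                   # initialise marginals from unary scores
    for _ in range(n_iters):
        msg = np.zeros_like(emissions)
        msg[1:] += q[:-1] @ transitions      # expected message from the left neighbour
        msg[:-1] += q[1:] @ transitions.T    # expected message from the right neighbour
        q = softmax(emissions + msg)         # update all positions in parallel
    return q                                 # approximate per-position label marginals

rng = np.random.default_rng(3)
marginals = mean_field_crf(rng.normal(size=(10, 5)), rng.normal(size=(5, 5)))
```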
arXiv Detail & Related papers (2020-09-17T12:18:43Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization at large scale with a deep neural network as the predictive model.
Our method requires far fewer communication rounds in theory.
Experiments on several datasets confirm the theory and demonstrate the effectiveness of the method.
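For context, a sketch of the kind of pairwise surrogate objective that stochastic AUC maximization methods optimize; the paper's communication-efficient distributed algorithm is not reproduced here, and the squared-hinge surrogate below is an assumption of this sketch.

```python
# Pairwise squared-hinge surrogate for AUC: penalize every (positive, negative)
# pair whose score difference falls below the margin.
import numpy as np

def auc_surrogate_loss(scores, labels, margin=1.0):
    """Mean squared-hinge loss over all (positive, negative) score pairs."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diffs = pos[:, None] - neg[None, :]      # s_i - s_j for every pos/neg pair
    return np.mean(np.maximum(0.0, margin - diffs) ** 2)

rng = np.random.default_rng(4)
labels = (rng.uniform(size=200) < 0.3).astype(int)
scores = labels + 0.5 * rng.normal(size=200)   # toy model scores
print("pairwise AUC surrogate:", auc_surrogate_loss(scores, labels))
```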
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.