Guiding Neural Network Initialization via Marginal Likelihood
Maximization
- URL: http://arxiv.org/abs/2012.09943v1
- Date: Thu, 17 Dec 2020 21:46:09 GMT
- Title: Guiding Neural Network Initialization via Marginal Likelihood
Maximization
- Authors: Anthony S. Tai, Chunfeng Huang
- Abstract summary: We leverage the relationship between neural network and Gaussian process models having corresponding activation and covariance functions to infer the hyperparameter values.
Our experiment shows that marginal likelihood maximization provides recommendations that yield near-optimal prediction performance on the MNIST classification task.
- Score: 0.9137554315375919
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a simple, data-driven approach to help guide hyperparameter
selection for neural network initialization. We leverage the relationship
between neural network and Gaussian process models having corresponding
activation and covariance functions to infer the hyperparameter values
desirable for model initialization. Our experiment shows that marginal
likelihood maximization provides recommendations that yield near-optimal
prediction performance on the MNIST classification task under the experiment
constraints. Furthermore, our empirical results indicate consistency in the
proposed technique, suggesting that the computation cost of the procedure could
be significantly reduced with smaller training sets.
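As a rough illustration of the procedure described in the abstract, the hedged sketch below (an assumption-laden illustration, not the authors' code) evaluates the log marginal likelihood of a Gaussian process whose covariance corresponds to a one-hidden-layer ReLU network (the arc-cosine kernel) on a small labelled subset, grid-searching over the weight and bias variances; the maximizer is taken as the recommended initialization variance. The choice of kernel, grid, noise level, and the regression treatment of +/-1 labels are all illustrative assumptions.

```python
# Hedged sketch (not the authors' code): choose weight/bias variance
# hyperparameters for a ReLU network by maximizing the log marginal
# likelihood of the corresponding arc-cosine-kernel Gaussian process
# on a small training subset.
import numpy as np

def relu_gp_kernel(X1, X2, sigma_w2, sigma_b2):
    """One-hidden-layer ReLU NN-GP covariance (arc-cosine, order 1)."""
    d = X1.shape[1]
    # Input-layer covariance: k0(x, x') = sigma_b^2 + sigma_w^2 * <x, x'> / d
    k12 = sigma_b2 + sigma_w2 * (X1 @ X2.T) / d
    k11 = sigma_b2 + sigma_w2 * np.sum(X1 * X1, axis=1) / d
    k22 = sigma_b2 + sigma_w2 * np.sum(X2 * X2, axis=1) / d
    norm = np.sqrt(np.outer(k11, k22))
    theta = np.arccos(np.clip(k12 / norm, -1.0, 1.0))
    # ReLU arc-cosine recursion for the hidden layer.
    j = np.sin(theta) + (np.pi - theta) * np.cos(theta)
    return sigma_b2 + (sigma_w2 / (2.0 * np.pi)) * norm * j

def log_marginal_likelihood(X, y, sigma_w2, sigma_b2, noise=1e-2):
    K = relu_gp_kernel(X, X, sigma_w2, sigma_b2) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * len(X) * np.log(2.0 * np.pi))

# Toy data standing in for a small MNIST subset (flattened images, +/-1 labels).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 784))
y = np.sign(rng.standard_normal(200))

# Grid search over candidate initialization hyperparameters.
grid = [(sw2, sb2) for sw2 in (0.5, 1.0, 1.5, 2.0) for sb2 in (0.0, 0.05, 0.1)]
best = max(grid, key=lambda p: log_marginal_likelihood(X, y, *p))
print("recommended (sigma_w^2, sigma_b^2) for initialization:", best)
```

Rerunning the same search on progressively smaller subsets would be one way to probe the consistency the abstract refers to, since a stable maximizer across subset sizes is what would allow the computation cost to be reduced.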
Related papers
- Imitation Learning of MPC with Neural Networks: Error Guarantees and Sparsification [5.260346080244568]
We present a framework for bounding the approximation error in imitation model predictive controllers utilizing neural networks.
We discuss how this method can be used to design a stable neural network controller with performance guarantees.
arXiv Detail & Related papers (2025-01-07T10:18:37Z) - Function-Space Regularization in Neural Networks: A Probabilistic
Perspective [51.133793272222874]
We show that we can derive a well-motivated regularization technique that allows explicitly encoding information about desired predictive functions into neural network training.
We evaluate the utility of this regularization technique empirically and demonstrate that the proposed method leads to near-perfect semantic shift detection and highly-calibrated predictive uncertainty estimates.
arXiv Detail & Related papers (2023-12-28T17:50:56Z) - On the Impact of Overparameterization on the Training of a Shallow
Neural Network in High Dimensions [0.0]
We study the training dynamics of a shallow neural network with quadratic activation functions and quadratic cost.
In line with previous works on the same neural architecture, the optimization is performed following the gradient flow on the population risk.
arXiv Detail & Related papers (2023-11-07T08:20:31Z) - Self-Supervised Dataset Distillation for Transfer Learning [77.4714995131992]
We propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL).
We first prove that the gradient of synthetic samples with respect to an SSL objective in naive bilevel optimization is biased due to randomness originating from data augmentations or masking.
We empirically validate the effectiveness of our method on various applications involving transfer learning.
arXiv Detail & Related papers (2023-10-10T10:48:52Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - A deep learning based surrogate model for stochastic simulators [0.0]
We propose a deep learning-based surrogate model for stochastic simulators.
We utilize conditional maximum mean discrepancy (CMMD) as the loss function.
The results obtained indicate excellent performance of the proposed approach.
arXiv Detail & Related papers (2021-10-24T11:38:47Z) - Last Layer Marginal Likelihood for Invariance Learning [12.00078928875924]
We introduce a new lower bound to the marginal likelihood, which allows us to perform inference for a larger class of likelihood functions.
We work towards bringing this approach to neural networks by using an architecture with a Gaussian process in the last layer.
arXiv Detail & Related papers (2021-06-14T15:40:51Z) - Offline Model-Based Optimization via Normalized Maximum Likelihood
Estimation [101.22379613810881]
We consider data-driven optimization problems where one must maximize a function given only queries at a fixed set of points.
This problem setting emerges in many domains where function evaluation is a complex and expensive process.
We propose a tractable approximation that allows us to scale our method to high-capacity neural network models.
arXiv Detail & Related papers (2021-02-16T06:04:27Z) - Iterative Surrogate Model Optimization (ISMO): An active learning
algorithm for PDE constrained optimization with deep neural networks [14.380314061763508]
We present a novel active learning algorithm, termed iterative surrogate model optimization (ISMO).
This algorithm is based on deep neural networks and its key feature is the iterative selection of training data through a feedback loop between deep neural networks and any underlying standard optimization algorithm.
arXiv Detail & Related papers (2020-08-13T07:31:07Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization for large-scale data with a deep neural network as the predictive model.
Our algorithm requires far fewer communication rounds in theory.
Our experiments on several datasets demonstrate its effectiveness and confirm the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based training combined with nonconvexity renders learning susceptible to the choice of initialization.
We propose fusing neighboring layers of deeper networks that are trained with random initialization (see the sketch after this list).
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
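The layer-fusion entry above admits a small illustration. The sketch below is a hedged approximation of the general idea, not the paper's actual procedure: two adjacent, randomly initialized dense layers (with a ReLU between them) are replaced by a single dense layer whose weights are fit by least squares, i.e. MSE-optimally, on random probe inputs. The layer sizes, probe distribution, and fitting setup are assumptions for illustration.

```python
# Hedged sketch (assumptions, not the paper's code): initialize one dense
# layer so it approximates two stacked randomly initialized dense layers
# (with a ReLU in between) in the least-squares (MSE) sense.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out, n = 64, 128, 32, 4096

# Two randomly initialized layers of the deeper network.
W1 = rng.standard_normal((d_hidden, d_in)) / np.sqrt(d_in)
b1 = np.zeros(d_hidden)
W2 = rng.standard_normal((d_out, d_hidden)) / np.sqrt(d_hidden)
b2 = np.zeros(d_out)

# Random probe inputs; the two-layer map a single layer should imitate.
X = rng.standard_normal((n, d_in))
H = np.maximum(X @ W1.T + b1, 0.0)      # ReLU hidden activations
Y = H @ W2.T + b2                        # two-layer output

# MSE-optimal single fused layer: least-squares fit of [W_fused, b_fused].
X_aug = np.hstack([X, np.ones((n, 1))])  # append a bias column
theta, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
W_fused, b_fused = theta[:-1].T, theta[-1]

err = np.mean((X @ W_fused.T + b_fused - Y) ** 2)
print("fused layer MSE vs. two-layer map:", err)
```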
This list is automatically generated from the titles and abstracts of the papers on this site.