Guiding Neural Network Initialization via Marginal Likelihood
Maximization
- URL: http://arxiv.org/abs/2012.09943v1
- Date: Thu, 17 Dec 2020 21:46:09 GMT
- Title: Guiding Neural Network Initialization via Marginal Likelihood
Maximization
- Authors: Anthony S. Tai, Chunfeng Huang
- Abstract summary: We leverage the relationship between neural network and Gaussian process models having corresponding activation and covariance functions to infer the hyperparameter values.
Our experiment shows that marginal likelihood maximization provides recommendations that yield near-optimal prediction performance on the MNIST classification task.
- Score: 0.9137554315375919
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a simple, data-driven approach to help guide hyperparameter
selection for neural network initialization. We leverage the relationship
between neural network and Gaussian process models having corresponding
activation and covariance functions to infer the hyperparameter values
desirable for model initialization. Our experiment shows that marginal
likelihood maximization provides recommendations that yield near-optimal
prediction performance on the MNIST classification task under the experiment
constraints. Furthermore, our empirical results indicate consistency in the
proposed technique, suggesting that the computation cost of the procedure could
be significantly reduced with smaller training sets.
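As a rough illustration of the procedure described in the abstract, the hedged sketch below (an assumption-laden illustration, not the authors' code) evaluates the log marginal likelihood of a Gaussian process whose covariance corresponds to a one-hidden-layer ReLU network (the arc-cosine kernel) on a small labelled subset, grid-searching over the weight and bias variances; the maximizer is taken as the recommended initialization variance. The choice of kernel, grid, noise level, and the regression treatment of +/-1 labels are all illustrative assumptions.

```python
# Hedged sketch (not the authors' code): choose weight/bias variance
# hyperparameters for a ReLU network by maximizing the log marginal
# likelihood of the corresponding arc-cosine-kernel Gaussian process
# on a small training subset.
import numpy as np

def relu_gp_kernel(X1, X2, sigma_w2, sigma_b2):
    """One-hidden-layer ReLU NN-GP covariance (arc-cosine, order 1)."""
    d = X1.shape[1]
    # Input-layer covariance: k0(x, x') = sigma_b^2 + sigma_w^2 * <x, x'> / d
    k12 = sigma_b2 + sigma_w2 * (X1 @ X2.T) / d
    k11 = sigma_b2 + sigma_w2 * np.sum(X1 * X1, axis=1) / d
    k22 = sigma_b2 + sigma_w2 * np.sum(X2 * X2, axis=1) / d
    norm = np.sqrt(np.outer(k11, k22))
    theta = np.arccos(np.clip(k12 / norm, -1.0, 1.0))
    # ReLU arc-cosine recursion for the hidden layer.
    j = np.sin(theta) + (np.pi - theta) * np.cos(theta)
    return sigma_b2 + (sigma_w2 / (2.0 * np.pi)) * norm * j

def log_marginal_likelihood(X, y, sigma_w2, sigma_b2, noise=1e-2):
    K = relu_gp_kernel(X, X, sigma_w2, sigma_b2) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * len(X) * np.log(2.0 * np.pi))

# Toy data standing in for a small MNIST subset (flattened images, +/-1 labels).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 784))
y = np.sign(rng.standard_normal(200))

# Grid search over candidate initialization hyperparameters.
grid = [(sw2, sb2) for sw2 in (0.5, 1.0, 1.5, 2.0) for sb2 in (0.0, 0.05, 0.1)]
best = max(grid, key=lambda p: log_marginal_likelihood(X, y, *p))
print("recommended (sigma_w^2, sigma_b^2) for initialization:", best)
```

Rerunning the same search on progressively smaller subsets would be one way to probe the consistency the abstract refers to, since a stable maximizer across subset sizes is what would allow the computation cost to be reduced.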
Related papers
- Imitation Learning of MPC with Neural Networks: Error Guarantees and Sparsification [5.260346080244568]
We present a framework for bounding the approximation error in imitation model predictive controllers utilizing neural networks.
We discuss how this method can be used to design a stable neural network controller with performance guarantees.
arXiv Detail & Related papers (2025-01-07T10:18:37Z) - Function-Space Regularization in Neural Networks: A Probabilistic
Perspective [51.133793272222874]
We show that we can derive a well-motivated regularization technique that allows explicitly encoding information about desired predictive functions into neural network training.
We evaluate the utility of this regularization technique empirically and demonstrate that the proposed method leads to near-perfect semantic shift detection and highly-calibrated predictive uncertainty estimates.
arXiv Detail & Related papers (2023-12-28T17:50:56Z) - On the Impact of Overparameterization on the Training of a Shallow
Neural Network in High Dimensions [0.0]
We study the training dynamics of a shallow neural network with quadratic activation functions and quadratic cost.
In line with previous works on the same neural architecture, the optimization is performed following the gradient flow on the population risk.
arXiv Detail & Related papers (2023-11-07T08:20:31Z) - Self-Supervised Dataset Distillation for Transfer Learning [77.4714995131992]
We propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL).
We first prove that the gradient of synthetic samples with respect to an SSL objective in naive bilevel optimization is biased due to randomness originating from data augmentations or masking.
We empirically validate the effectiveness of our method on various applications involving transfer learning.
arXiv Detail & Related papers (2023-10-10T10:48:52Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - A deep learning based surrogate model for stochastic simulators [0.0]
We propose a deep learning-based surrogate model for stochastic simulators.
We utilize conditional maximum mean discrepancy (CMMD) as the loss function.
The results obtained indicate excellent performance of the proposed approach.
arXiv Detail & Related papers (2021-10-24T11:38:47Z) - Last Layer Marginal Likelihood for Invariance Learning [12.00078928875924]
We introduce a new lower bound to the marginal likelihood, which allows us to perform inference for a larger class of likelihood functions.
We work towards bringing this approach to neural networks by using an architecture with a Gaussian process in the last layer.
arXiv Detail & Related papers (2021-06-14T15:40:51Z) - Offline Model-Based Optimization via Normalized Maximum Likelihood
Estimation [101.22379613810881]
We consider data-driven optimization problems where one must maximize a function given only queries at a fixed set of points.
This problem setting emerges in many domains where function evaluation is a complex and expensive process.
We propose a tractable approximation that allows us to scale our method to high-capacity neural network models.
arXiv Detail & Related papers (2021-02-16T06:04:27Z) - Iterative Surrogate Model Optimization (ISMO): An active learning
algorithm for PDE constrained optimization with deep neural networks [14.380314061763508]
We present a novel active learning algorithm, termed iterative surrogate model optimization (ISMO).
This algorithm is based on deep neural networks and its key feature is the iterative selection of training data through a feedback loop between deep neural networks and any underlying standard optimization algorithm.
arXiv Detail & Related papers (2020-08-13T07:31:07Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization for large-scale data with a deep neural network as the predictive model.
Our algorithm requires far fewer communication rounds in theory.
Our experiments on several datasets demonstrate its effectiveness and confirm the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based training combined with nonconvexity renders learning susceptible to the choice of initialization.
We propose fusing neighboring layers of deeper networks that are trained with random initialization (see the sketch after this list).
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
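The layer-fusion entry above admits a small illustration. The sketch below is a hedged approximation of the general idea, not the paper's actual procedure: two adjacent, randomly initialized dense layers (with a ReLU between them) are replaced by a single dense layer whose weights are fit by least squares, i.e. MSE-optimally, on random probe inputs. The layer sizes, probe distribution, and fitting setup are assumptions for illustration.

```python
# Hedged sketch (assumptions, not the paper's code): initialize one dense
# layer so it approximates two stacked randomly initialized dense layers
# (with a ReLU in between) in the least-squares (MSE) sense.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out, n = 64, 128, 32, 4096

# Two randomly initialized layers of the deeper network.
W1 = rng.standard_normal((d_hidden, d_in)) / np.sqrt(d_in)
b1 = np.zeros(d_hidden)
W2 = rng.standard_normal((d_out, d_hidden)) / np.sqrt(d_hidden)
b2 = np.zeros(d_out)

# Random probe inputs; the two-layer map a single layer should imitate.
X = rng.standard_normal((n, d_in))
H = np.maximum(X @ W1.T + b1, 0.0)      # ReLU hidden activations
Y = H @ W2.T + b2                        # two-layer output

# MSE-optimal single fused layer: least-squares fit of [W_fused, b_fused].
X_aug = np.hstack([X, np.ones((n, 1))])  # append a bias column
theta, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
W_fused, b_fused = theta[:-1].T, theta[-1]

err = np.mean((X @ W_fused.T + b_fused - Y) ** 2)
print("fused layer MSE vs. two-layer map:", err)
```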
This list is automatically generated from the titles and abstracts of the papers on this site.