Should We Learn Most Likely Functions or Parameters?
- URL: http://arxiv.org/abs/2311.15990v1
- Date: Mon, 27 Nov 2023 16:39:55 GMT
- Title: Should We Learn Most Likely Functions or Parameters?
- Authors: Shikai Qiu, Tim G. J. Rudner, Sanyam Kapoor, Andrew Gordon Wilson
- Abstract summary: We investigate the benefits and drawbacks of directly estimating the most likely function implied by the model and the data.
We find that function-space MAP estimation can lead to flatter minima, better generalization, and improved robustness to overfitting.
- Score: 51.133793272222874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Standard regularized training procedures correspond to maximizing a posterior
distribution over parameters, known as maximum a posteriori (MAP) estimation.
However, model parameters are of interest only insomuch as they combine with
the functional form of a model to provide a function that can make good
predictions. Moreover, the most likely parameters under the parameter posterior
do not generally correspond to the most likely function induced by the
parameter posterior. In fact, we can re-parametrize a model such that any
setting of parameters can maximize the parameter posterior. As an alternative,
we investigate the benefits and drawbacks of directly estimating the most
likely function implied by the model and the data. We show that this procedure
leads to pathological solutions when using neural networks and prove conditions
under which the procedure is well-behaved, as well as a scalable approximation.
Under these conditions, we find that function-space MAP estimation can lead to
flatter minima, better generalization, and improved robustness to overfitting.
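To make the distinction concrete, here is a standard change-of-variables sketch (generic notation, not taken from the paper) of why the two MAP estimates differ:

```latex
% Parameter-space MAP estimation:
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} \; p(\theta \mid \mathcal{D})

% Under an invertible reparameterization f = h(\theta), the induced density
% over functions acquires a Jacobian factor:
p_f(f \mid \mathcal{D})
  = p_\theta\!\left(h^{-1}(f) \,\middle|\, \mathcal{D}\right)
    \left|\det J_{h^{-1}}(f)\right|

% Because the Jacobian depends on f, in general
\arg\max_f \; p_f(f \mid \mathcal{D}) \;\neq\; h\!\left(\hat{\theta}_{\mathrm{MAP}}\right)
```

The Jacobian factor is exactly what a suitable reparameterization can manipulate, which is why any setting of parameters can be made to maximize the parameter posterior without changing the model's predictions.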
Related papers
- Scaling Exponents Across Parameterizations and Optimizers [94.54718325264218]
We propose a new perspective on parameterization by investigating a key assumption in prior work.
Our empirical investigation includes tens of thousands of models trained with all combinations of three optimizers, four parameterizations, and a wide range of learning rates and model sizes.
We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work.
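As a rough illustration of what a learning-rate scaling prescription looks like (the power-law form is standard; the exponents below are placeholders, not the paper's measured values):

```python
# Hypothetical sketch: transfer a learning rate tuned at a base width to larger
# widths via a power law. Different parameterizations correspond to different
# exponents; the values here are illustrative only.
def scaled_lr(base_lr: float, width: int, base_width: int, exponent: float) -> float:
    """lr(width) = base_lr * (width / base_width) ** (-exponent)."""
    return base_lr * (width / base_width) ** (-exponent)

for name, exponent in [("prescription A", 0.5), ("prescription B", 1.0)]:
    lrs = {w: scaled_lr(1e-2, w, base_width=256, exponent=exponent) for w in (256, 1024, 4096)}
    print(name, lrs)
```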
arXiv Detail & Related papers (2024-07-08T12:32:51Z)
- PriorCVAE: scalable MCMC parameter inference with Bayesian deep generative modelling [12.820453440015553]
Recent work has shown that GP priors can be encoded using deep generative models such as variational autoencoders (VAEs).
We show how VAEs can serve as drop-in replacements for the original priors during MCMC inference.
We propose PriorCVAE to encode solutions of ODEs.
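A minimal sketch of the drop-in idea, assuming a decoder already trained offline to map standard-normal latents to GP (or ODE-solution) prior draws; `decode` and `log_likelihood` are hypothetical stand-ins, and plain Metropolis is used here purely for illustration:

```python
import numpy as np

def log_posterior(z, y, decode, log_likelihood):
    # A standard-normal prior on the latent replaces the expensive GP prior;
    # the decoder maps the latent to function values on the evaluation grid.
    log_prior = -0.5 * np.sum(z ** 2)
    f = decode(z)
    return log_prior + log_likelihood(y, f)

def metropolis(z0, y, decode, log_likelihood, steps=1000, step_size=0.1, seed=0):
    rng = np.random.default_rng(seed)
    z, lp = z0, log_posterior(z0, y, decode, log_likelihood)
    samples = []
    for _ in range(steps):
        z_prop = z + step_size * rng.standard_normal(z.shape)
        lp_prop = log_posterior(z_prop, y, decode, log_likelihood)
        if np.log(rng.uniform()) < lp_prop - lp:  # Metropolis accept/reject
            z, lp = z_prop, lp_prop
        samples.append(z)
    return np.array(samples)
```

Decoded draws `decode(z)` from the chain then play the role of posterior function samples.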
arXiv Detail & Related papers (2023-04-09T20:23:26Z)
- On the Effectiveness of Parameter-Efficient Fine-Tuning [79.6302606855302]
Currently, many research works propose to fine-tune only a small portion of the parameters while keeping most of the parameters shared across different tasks.
We show that all of these methods are in fact sparse fine-tuned models and conduct a novel theoretical analysis of them.
Despite the effectiveness of sparsity grounded in our theory, how to choose the tunable parameters remains an open problem.
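A minimal sketch of the sparse fine-tuning view: a fixed binary mask selects which parameters receive updates while the rest stay frozen. The magnitude-based mask below is purely illustrative; the methods analysed induce their masks in different ways:

```python
import numpy as np

def sparse_finetune_step(theta, grad, mask, lr=1e-3):
    # Only masked coordinates move; everything else is shared/frozen.
    return theta - lr * mask * grad

rng = np.random.default_rng(0)
theta = rng.standard_normal(10)
mask = (np.abs(theta) >= np.sort(np.abs(theta))[-3]).astype(float)  # tune top 3 by magnitude
theta = sparse_finetune_step(theta, rng.standard_normal(10), mask)
```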
arXiv Detail & Related papers (2022-11-28T17:41:48Z)
- Sparse Horseshoe Estimation via Expectation-Maximisation [2.1485350418225244]
We propose a novel expectation-maximisation (EM) procedure for computing the MAP estimates of the parameters.
A particular strength of our approach is that the M-step depends only on the form of the prior and it is independent of the form of the likelihood.
In experiments on simulated and real data, our approach performs comparably to, or better than, state-of-the-art sparse estimation methods.
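A generic version of this EM pattern for MAP estimation under Gaussian scale-mixture priors (of which the horseshoe is one instance), written with illustrative notation rather than the paper's exact conditional expectations:

```latex
% E-step: with \beta^{(t)} fixed, take expectations of the latent inverse scales
\eta_j^{(t)} = \mathbb{E}\!\left[\lambda_j^{-2} \,\middle|\, \beta_j^{(t)}\right]

% M-step: for a Gaussian likelihood, maximizing the expected complete-data
% log posterior reduces to a weighted ridge solve
\beta^{(t+1)} = \left(X^{\top} X + \sigma^{2}\,\mathrm{diag}\!\left(\eta^{(t)}\right)\right)^{-1} X^{\top} y
```

(The paper arranges the decomposition so that the M-step depends only on the prior; the generic textbook arrangement shown here puts the prior-dependent expectations in the E-step.)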
arXiv Detail & Related papers (2022-11-07T00:43:26Z)
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are required through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- Gaussian Process Uniform Error Bounds with Unknown Hyperparameters for Safety-Critical Applications [71.23286211775084]
We introduce robust Gaussian process uniform error bounds in settings with unknown hyperparameters.
Our approach computes a confidence region in the space of hyperparameters, which enables us to obtain a probabilistic upper bound for the model error.
Experiments show that the bound performs significantly better than vanilla and fully Bayesian Gaussian processes.
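A schematic sketch of this construction under simplifying assumptions: a single lengthscale hyperparameter, a likelihood-ratio-style confidence region over a grid, and the worst-case posterior standard deviation over that region standing in for the robust bound. The kernel, threshold, and data are illustrative, not the paper's:

```python
import numpy as np

def rbf(a, b, ls):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior_sd(x_tr, x_te, ls, noise=1e-2):
    K = rbf(x_tr, x_tr, ls) + noise * np.eye(len(x_tr))
    k_star = rbf(x_tr, x_te, ls)
    v = np.linalg.solve(K, k_star)
    return np.sqrt(np.maximum(1.0 - np.sum(k_star * v, axis=0), 0.0))

def neg_log_marginal(x_tr, y_tr, ls, noise=1e-2):
    K = rbf(x_tr, x_tr, ls) + noise * np.eye(len(x_tr))
    _, logdet = np.linalg.slogdet(K)
    return 0.5 * (y_tr @ np.linalg.solve(K, y_tr) + logdet)

rng = np.random.default_rng(0)
x_tr = np.linspace(0, 1, 15)
y_tr = np.sin(6 * x_tr) + 0.1 * rng.standard_normal(15)
x_te = np.linspace(0, 1, 50)

grid = np.linspace(0.05, 1.0, 40)
nlml = np.array([neg_log_marginal(x_tr, y_tr, ls) for ls in grid])
region = grid[nlml <= nlml.min() + 2.0]  # confidence region over the lengthscale

# Robust bound: worst case of the per-hyperparameter error bound over the region.
robust_sd = np.max([gp_posterior_sd(x_tr, x_te, ls) for ls in region], axis=0)
```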
arXiv Detail & Related papers (2021-09-06T17:10:01Z)
- A new method for parameter estimation in probabilistic models: Minimum probability flow [26.25482738732648]
We propose a new parameter fitting method, Minimum Probability Flow (MPF), which is applicable to any parametric model.
We demonstrate parameter estimation using MPF in two cases: a continuous state space model, and an Ising spin glass.
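A minimal sketch of the MPF objective for an Ising model with single-spin-flip neighbourhoods; minimising it in (J, h) yields the MPF estimate without ever touching the partition function. The data and parameter values here are illustrative:

```python
import numpy as np

def mpf_objective(X, J, h):
    """MPF objective K = mean over data x and single-spin flips x' of
    exp((E(x) - E(x')) / 2), for Ising energy E(x) = -sum_{i<j} J_ij x_i x_j - h.x.
    X: (n_samples, n_spins) array of +/-1 spins; J symmetric with zero diagonal."""
    fields = X @ J + h                 # local field at each spin
    # Flipping spin k changes the energy by dE = 2 * x_k * field_k,
    # so each flip contributes exp(-dE / 2) = exp(-x_k * field_k).
    return np.mean(np.exp(-X * fields))

rng = np.random.default_rng(0)
n = 8
J = 0.1 * rng.standard_normal((n, n))
J = (J + J.T) / 2
np.fill_diagonal(J, 0.0)
X = rng.choice([-1.0, 1.0], size=(100, n))
print(mpf_objective(X, J, np.zeros(n)))  # feed this to any gradient-based optimiser
```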
arXiv Detail & Related papers (2020-07-17T21:19:44Z)
- Misspecification-robust likelihood-free inference in high dimensions [13.934999364767918]
We introduce an extension of the popular Bayesian optimisation-based approach to approximate discrepancy functions in a probabilistic manner.
Our approach achieves computational scalability for higher dimensional parameter spaces by using separate acquisition functions and discrepancies for each parameter.
The method successfully performs computationally efficient inference in a 100-dimensional space on canonical examples and compares favourably to existing modularised ABC methods.
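A schematic sketch of the modularisation, assuming one low-dimensional surrogate and acquisition function per parameter instead of a joint surrogate over the full vector; the GP interpolant and lower-confidence-bound rule below are generic illustrations, not the paper's exact components:

```python
import numpy as np

def lcb_propose(thetas, discs, candidates, kappa=2.0, ls=0.2):
    """Propose the next value of ONE parameter from its own 1-D surrogate.
    thetas: evaluated values of this parameter; discs: matching discrepancies."""
    K = np.exp(-0.5 * ((thetas[:, None] - thetas[None, :]) / ls) ** 2)
    K += 1e-6 * np.eye(len(thetas))                # jitter for stability
    k_star = np.exp(-0.5 * ((thetas[:, None] - candidates[None, :]) / ls) ** 2)
    mu = k_star.T @ np.linalg.solve(K, discs)      # surrogate mean
    v = np.linalg.solve(K, k_star)
    sd = np.sqrt(np.maximum(1.0 - np.sum(k_star * v, axis=0), 0.0))
    return candidates[np.argmin(mu - kappa * sd)]  # optimistic minimiser
```

One such proposal loop would run per parameter dimension, each fed by its own discrepancy observations, which is what keeps the cost from growing with the joint dimensionality.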
arXiv Detail & Related papers (2020-02-21T16:06:11Z)
- Implicit differentiation of Lasso-type models for hyperparameter optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
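A minimal sketch of the implicit differentiation step, restricted to the Lasso active set; the restriction is what keeps the linear system small, though unlike the paper's algorithm this illustration does use an explicit dense solve:

```python
import numpy as np

def lasso_hypergradient(X, y, X_val, y_val, beta):
    """Gradient of validation MSE w.r.t. lambda, where beta solves
    min_b (1/2n) * ||y - X b||^2 + lambda * ||b||_1."""
    n = X.shape[0]
    S = np.flatnonzero(beta)       # active set: nonzero coefficients
    s = np.sign(beta[S])
    XS = X[:, S]
    # Stationarity on the support: (1/n) XS^T (XS b_S - y) + lambda * s = 0,
    # so the implicit function theorem gives d b_S / d lambda = -n (XS^T XS)^{-1} s.
    dbeta_S = -n * np.linalg.solve(XS.T @ XS, s)
    # Chain rule through the validation loss (1/2m) * ||y_val - X_val b||^2;
    # beta is zero off the support, so only columns in S contribute.
    m = X_val.shape[0]
    grad_val = X_val[:, S].T @ (X_val[:, S] @ beta[S] - y_val) / m
    return float(grad_val @ dbeta_S)
```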
arXiv Detail & Related papers (2020-02-20T18:43:42Z)