Should We Learn Most Likely Functions or Parameters?
- URL: http://arxiv.org/abs/2311.15990v1
- Date: Mon, 27 Nov 2023 16:39:55 GMT
- Title: Should We Learn Most Likely Functions or Parameters?
- Authors: Shikai Qiu, Tim G. J. Rudner, Sanyam Kapoor, Andrew Gordon Wilson
- Abstract summary: We investigate the benefits and drawbacks of directly estimating the most likely function implied by the model and the data.
We find that function-space MAP estimation can lead to flatter minima, better generalization, and improved robustness to overfitting.
- Score: 51.133793272222874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Standard regularized training procedures correspond to maximizing a posterior
distribution over parameters, known as maximum a posteriori (MAP) estimation.
However, model parameters are of interest only insomuch as they combine with
the functional form of a model to provide a function that can make good
predictions. Moreover, the most likely parameters under the parameter posterior
do not generally correspond to the most likely function induced by the
parameter posterior. In fact, we can re-parametrize a model such that any
setting of parameters can maximize the parameter posterior. As an alternative,
we investigate the benefits and drawbacks of directly estimating the most
likely function implied by the model and the data. We show that this procedure
leads to pathological solutions when using neural networks and prove conditions
under which the procedure is well-behaved, as well as a scalable approximation.
Under these conditions, we find that function-space MAP estimation can lead to
flatter minima, better generalization, and improved robustness to overfitting.
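To make the distinction concrete, here is a standard change-of-variables sketch (generic notation, not taken from the paper) of why the two MAP estimates differ:

```latex
% Parameter-space MAP estimation:
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} \; p(\theta \mid \mathcal{D})

% Under an invertible reparameterization f = h(\theta), the induced density
% over functions acquires a Jacobian factor:
p_f(f \mid \mathcal{D})
  = p_\theta\!\left(h^{-1}(f) \,\middle|\, \mathcal{D}\right)
    \left|\det J_{h^{-1}}(f)\right|

% Because the Jacobian depends on f, in general
\arg\max_f \; p_f(f \mid \mathcal{D}) \;\neq\; h\!\left(\hat{\theta}_{\mathrm{MAP}}\right)
```

The Jacobian factor is exactly what a suitable reparameterization can manipulate, which is why any setting of parameters can be made to maximize the parameter posterior without changing the model's predictions.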
Related papers
- Scaling Exponents Across Parameterizations and Optimizers [94.54718325264218]
We propose a new perspective on parameterization by investigating a key assumption in prior work.
Our empirical investigation includes tens of thousands of models trained with all combinations of three optimizers, four parameterizations, and a wide range of learning rates and model sizes.
We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work.
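As a rough illustration of what a learning-rate scaling prescription looks like (the power-law form is standard; the exponents below are placeholders, not the paper's measured values):

```python
# Hypothetical sketch: transfer a learning rate tuned at a base width to larger
# widths via a power law. Different parameterizations correspond to different
# exponents; the values here are illustrative only.
def scaled_lr(base_lr: float, width: int, base_width: int, exponent: float) -> float:
    """lr(width) = base_lr * (width / base_width) ** (-exponent)."""
    return base_lr * (width / base_width) ** (-exponent)

for name, exponent in [("prescription A", 0.5), ("prescription B", 1.0)]:
    lrs = {w: scaled_lr(1e-2, w, base_width=256, exponent=exponent) for w in (256, 1024, 4096)}
    print(name, lrs)
```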
arXiv Detail & Related papers (2024-07-08T12:32:51Z)
- PriorCVAE: scalable MCMC parameter inference with Bayesian deep generative modelling [12.820453440015553]
Recent work has shown that GP priors can be encoded using deep generative models such as variational autoencoders (VAEs).
We show how VAEs can serve as drop-in replacements for the original priors during MCMC inference.
We propose PriorCVAE to encode solutions of ODEs.
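A minimal sketch of the drop-in idea, assuming a decoder already trained offline to map standard-normal latents to GP (or ODE-solution) prior draws; `decode` and `log_likelihood` are hypothetical stand-ins, and plain Metropolis is used here purely for illustration:

```python
import numpy as np

def log_posterior(z, y, decode, log_likelihood):
    # A standard-normal prior on the latent replaces the expensive GP prior;
    # the decoder maps the latent to function values on the evaluation grid.
    log_prior = -0.5 * np.sum(z ** 2)
    f = decode(z)
    return log_prior + log_likelihood(y, f)

def metropolis(z0, y, decode, log_likelihood, steps=1000, step_size=0.1, seed=0):
    rng = np.random.default_rng(seed)
    z, lp = z0, log_posterior(z0, y, decode, log_likelihood)
    samples = []
    for _ in range(steps):
        z_prop = z + step_size * rng.standard_normal(z.shape)
        lp_prop = log_posterior(z_prop, y, decode, log_likelihood)
        if np.log(rng.uniform()) < lp_prop - lp:  # Metropolis accept/reject
            z, lp = z_prop, lp_prop
        samples.append(z)
    return np.array(samples)
```

Decoded draws `decode(z)` from the chain then play the role of posterior function samples.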
arXiv Detail & Related papers (2023-04-09T20:23:26Z)
- On the Effectiveness of Parameter-Efficient Fine-Tuning [79.6302606855302]
Currently, many research works propose to fine-tune only a small portion of the parameters while keeping most of the parameters shared across different tasks.
We show that all of these methods are in fact sparse fine-tuned models and conduct a novel theoretical analysis of them.
Despite the effectiveness of sparsity grounded in our theory, how to choose the tunable parameters remains an open problem.
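A minimal sketch of the sparse fine-tuning view: a fixed binary mask selects which parameters receive updates while the rest stay frozen. The magnitude-based mask below is purely illustrative; the methods analysed induce their masks in different ways:

```python
import numpy as np

def sparse_finetune_step(theta, grad, mask, lr=1e-3):
    # Only masked coordinates move; everything else is shared/frozen.
    return theta - lr * mask * grad

rng = np.random.default_rng(0)
theta = rng.standard_normal(10)
mask = (np.abs(theta) >= np.sort(np.abs(theta))[-3]).astype(float)  # tune top 3 by magnitude
theta = sparse_finetune_step(theta, rng.standard_normal(10), mask)
```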
arXiv Detail & Related papers (2022-11-28T17:41:48Z)
- Sparse Horseshoe Estimation via Expectation-Maximisation [2.1485350418225244]
We propose a novel expectation-maximisation (EM) procedure for computing the MAP estimates of the parameters.
A particular strength of our approach is that the M-step depends only on the form of the prior and it is independent of the form of the likelihood.
In experiments on simulated and real data, our approach performs comparably to, or better than, state-of-the-art sparse estimation methods.
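A generic version of this EM pattern for MAP estimation under Gaussian scale-mixture priors (of which the horseshoe is one instance), written with illustrative notation rather than the paper's exact conditional expectations:

```latex
% E-step: with \beta^{(t)} fixed, take expectations of the latent inverse scales
\eta_j^{(t)} = \mathbb{E}\!\left[\lambda_j^{-2} \,\middle|\, \beta_j^{(t)}\right]

% M-step: for a Gaussian likelihood, maximizing the expected complete-data
% log posterior reduces to a weighted ridge solve
\beta^{(t+1)} = \left(X^{\top} X + \sigma^{2}\,\mathrm{diag}\!\left(\eta^{(t)}\right)\right)^{-1} X^{\top} y
```

(The paper arranges the decomposition so that the M-step depends only on the prior; the generic textbook arrangement shown here puts the prior-dependent expectations in the E-step.)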
arXiv Detail & Related papers (2022-11-07T00:43:26Z)
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are required through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- Gaussian Process Uniform Error Bounds with Unknown Hyperparameters for Safety-Critical Applications [71.23286211775084]
We introduce robust Gaussian process uniform error bounds in settings with unknown hyperparameters.
Our approach computes a confidence region in the space of hyperparameters, which enables us to obtain a probabilistic upper bound for the model error.
Experiments show that the bound performs significantly better than vanilla and fully Bayesian Gaussian processes.
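A schematic sketch of this construction under simplifying assumptions: a single lengthscale hyperparameter, a likelihood-ratio-style confidence region over a grid, and the worst-case posterior standard deviation over that region standing in for the robust bound. The kernel, threshold, and data are illustrative, not the paper's:

```python
import numpy as np

def rbf(a, b, ls):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior_sd(x_tr, x_te, ls, noise=1e-2):
    K = rbf(x_tr, x_tr, ls) + noise * np.eye(len(x_tr))
    k_star = rbf(x_tr, x_te, ls)
    v = np.linalg.solve(K, k_star)
    return np.sqrt(np.maximum(1.0 - np.sum(k_star * v, axis=0), 0.0))

def neg_log_marginal(x_tr, y_tr, ls, noise=1e-2):
    K = rbf(x_tr, x_tr, ls) + noise * np.eye(len(x_tr))
    _, logdet = np.linalg.slogdet(K)
    return 0.5 * (y_tr @ np.linalg.solve(K, y_tr) + logdet)

rng = np.random.default_rng(0)
x_tr = np.linspace(0, 1, 15)
y_tr = np.sin(6 * x_tr) + 0.1 * rng.standard_normal(15)
x_te = np.linspace(0, 1, 50)

grid = np.linspace(0.05, 1.0, 40)
nlml = np.array([neg_log_marginal(x_tr, y_tr, ls) for ls in grid])
region = grid[nlml <= nlml.min() + 2.0]  # confidence region over the lengthscale

# Robust bound: worst case of the per-hyperparameter error bound over the region.
robust_sd = np.max([gp_posterior_sd(x_tr, x_te, ls) for ls in region], axis=0)
```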
arXiv Detail & Related papers (2021-09-06T17:10:01Z)
- A new method for parameter estimation in probabilistic models: Minimum probability flow [26.25482738732648]
We propose a new parameter fitting method, Minimum Probability Flow (MPF), which is applicable to any parametric model.
We demonstrate parameter estimation using MPF in two cases: a continuous state space model, and an Ising spin glass.
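A minimal sketch of the MPF objective for an Ising model with single-spin-flip neighbourhoods; minimising it in (J, h) yields the MPF estimate without ever touching the partition function. The data and parameter values here are illustrative:

```python
import numpy as np

def mpf_objective(X, J, h):
    """MPF objective K = mean over data x and single-spin flips x' of
    exp((E(x) - E(x')) / 2), for Ising energy E(x) = -sum_{i<j} J_ij x_i x_j - h.x.
    X: (n_samples, n_spins) array of +/-1 spins; J symmetric with zero diagonal."""
    fields = X @ J + h                 # local field at each spin
    # Flipping spin k changes the energy by dE = 2 * x_k * field_k,
    # so each flip contributes exp(-dE / 2) = exp(-x_k * field_k).
    return np.mean(np.exp(-X * fields))

rng = np.random.default_rng(0)
n = 8
J = 0.1 * rng.standard_normal((n, n))
J = (J + J.T) / 2
np.fill_diagonal(J, 0.0)
X = rng.choice([-1.0, 1.0], size=(100, n))
print(mpf_objective(X, J, np.zeros(n)))  # feed this to any gradient-based optimiser
```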
arXiv Detail & Related papers (2020-07-17T21:19:44Z)
- Misspecification-robust likelihood-free inference in high dimensions [13.934999364767918]
We introduce an extension of the popular Bayesian optimisation-based approach to approximate discrepancy functions in a probabilistic manner.
Our approach achieves computational scalability for higher dimensional parameter spaces by using separate acquisition functions and discrepancies for each parameter.
The method successfully performs computationally efficient inference in a 100-dimensional space on canonical examples and compares favourably to existing modularised ABC methods.
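A schematic sketch of the modularisation, assuming one low-dimensional surrogate and acquisition function per parameter instead of a joint surrogate over the full vector; the GP interpolant and lower-confidence-bound rule below are generic illustrations, not the paper's exact components:

```python
import numpy as np

def lcb_propose(thetas, discs, candidates, kappa=2.0, ls=0.2):
    """Propose the next value of ONE parameter from its own 1-D surrogate.
    thetas: evaluated values of this parameter; discs: matching discrepancies."""
    K = np.exp(-0.5 * ((thetas[:, None] - thetas[None, :]) / ls) ** 2)
    K += 1e-6 * np.eye(len(thetas))                # jitter for stability
    k_star = np.exp(-0.5 * ((thetas[:, None] - candidates[None, :]) / ls) ** 2)
    mu = k_star.T @ np.linalg.solve(K, discs)      # surrogate mean
    v = np.linalg.solve(K, k_star)
    sd = np.sqrt(np.maximum(1.0 - np.sum(k_star * v, axis=0), 0.0))
    return candidates[np.argmin(mu - kappa * sd)]  # optimistic minimiser
```

One such proposal loop would run per parameter dimension, each fed by its own discrepancy observations, which is what keeps the cost from growing with the joint dimensionality.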
arXiv Detail & Related papers (2020-02-21T16:06:11Z)
- Implicit differentiation of Lasso-type models for hyperparameter optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
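A minimal sketch of the implicit differentiation step, restricted to the Lasso active set; the restriction is what keeps the linear system small, though unlike the paper's algorithm this illustration does use an explicit dense solve:

```python
import numpy as np

def lasso_hypergradient(X, y, X_val, y_val, beta):
    """Gradient of validation MSE w.r.t. lambda, where beta solves
    min_b (1/2n) * ||y - X b||^2 + lambda * ||b||_1."""
    n = X.shape[0]
    S = np.flatnonzero(beta)       # active set: nonzero coefficients
    s = np.sign(beta[S])
    XS = X[:, S]
    # Stationarity on the support: (1/n) XS^T (XS b_S - y) + lambda * s = 0,
    # so the implicit function theorem gives d b_S / d lambda = -n (XS^T XS)^{-1} s.
    dbeta_S = -n * np.linalg.solve(XS.T @ XS, s)
    # Chain rule through the validation loss (1/2m) * ||y_val - X_val b||^2;
    # beta is zero off the support, so only columns in S contribute.
    m = X_val.shape[0]
    grad_val = X_val[:, S].T @ (X_val[:, S] @ beta[S] - y_val) / m
    return float(grad_val @ dbeta_S)
```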
arXiv Detail & Related papers (2020-02-20T18:43:42Z)