Predictive Complexity Priors
- URL: http://arxiv.org/abs/2006.10801v3
- Date: Wed, 21 Oct 2020 15:56:55 GMT
- Title: Predictive Complexity Priors
- Authors: Eric Nalisnick, Jonathan Gordon, Jos\'e Miguel Hern\'andez-Lobato
- Abstract summary: We propose a functional prior that is defined by comparing the model's predictions to those of a reference model.
Although originally defined on the model outputs, we transfer the prior to the model parameters via a change of variables.
We apply our predictive complexity prior to high-dimensional regression, reasoning over neural network depth, and sharing of statistical strength for few-shot learning.
- Score: 3.5547661483076998
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Specifying a Bayesian prior is notoriously difficult for complex models such
as neural networks. Reasoning about parameters is made challenging by the
high-dimensionality and over-parameterization of the space. Priors that seem
benign and uninformative can have unintuitive and detrimental effects on a
model's predictions. For this reason, we propose predictive complexity priors:
a functional prior that is defined by comparing the model's predictions to
those of a reference model. Although originally defined on the model outputs,
we transfer the prior to the model parameters via a change of variables. The
traditional Bayesian workflow can then proceed as usual. We apply our
predictive complexity prior to high-dimensional regression, reasoning over
neural network depth, and sharing of statistical strength for few-shot
learning.
Related papers
- Revisiting Optimism and Model Complexity in the Wake of Overparameterized Machine Learning [6.278498348219108]
We revisit model complexity from first principles, by first reinterpreting and then extending the classical statistical concept of (effective) degrees of freedom.
We demonstrate the utility of our proposed complexity measures through a mix of conceptual arguments, theory, and experiments.
arXiv Detail & Related papers (2024-10-02T06:09:57Z) - Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference ( SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z) - Structured Radial Basis Function Network: Modelling Diversity for
Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important in forecasting nonstationary processes or with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z) - Function-Space Regularization for Deep Bayesian Classification [33.63495888167032]
We apply a Dirichlet prior in predictive space and perform approximate function-space variational inference.
By adapting the inference, the same function-space prior can be combined with different models without affecting model architecture or size.
arXiv Detail & Related papers (2023-07-12T10:17:54Z) - Bayesian Neural Network Inference via Implicit Models and the Posterior
Predictive Distribution [0.8122270502556371]
We propose a novel approach to perform approximate Bayesian inference in complex models such as Bayesian neural networks.
The approach is more scalable to large data than Markov Chain Monte Carlo.
We see this being useful in applications such as surrogate and physics-based models.
arXiv Detail & Related papers (2022-09-06T02:43:19Z) - On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery has proposed to factorize the data generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z) - Post-mortem on a deep learning contest: a Simpson's paradox and the
complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly-available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox: where "scale" metrics perform well overall but perform poorly on sub partitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z) - Provable Benefits of Overparameterization in Model Compression: From
Double Descent to Pruning Neural Networks [38.153825455980645]
Recent empirical evidence indicates that the practice of overization not only benefits training large models, but also assists - perhaps counterintuitively - building lightweight models.
This paper sheds light on these empirical findings by theoretically characterizing the high-dimensional toolsets of model pruning.
We analytically identify regimes in which, even if the location of the most informative features is known, we are better off fitting a large model and then pruning.
arXiv Detail & Related papers (2020-12-16T05:13:30Z) - Deep transformation models: Tackling complex regression problems with
neural network based transformation models [0.0]
We present a deep transformation model for probabilistic regression.
It estimates the whole conditional probability distribution, which is the most thorough way to capture uncertainty about the outcome.
Our method works for complex input data, which we demonstrate by employing a CNN architecture on image data.
arXiv Detail & Related papers (2020-04-01T14:23:12Z) - Ambiguity in Sequential Data: Predicting Uncertain Futures with
Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.