Bayesian Model Selection, the Marginal Likelihood, and Generalization
- URL: http://arxiv.org/abs/2202.11678v3
- Date: Tue, 2 May 2023 01:27:39 GMT
- Title: Bayesian Model Selection, the Marginal Likelihood, and Generalization
- Authors: Sanae Lotfi, Pavel Izmailov, Gregory Benton, Micah Goldblum, Andrew
Gordon Wilson
- Abstract summary: We first revisit the appealing properties of the marginal likelihood for learning constraints and hypothesis testing.
We show how marginal likelihood can be negatively correlated with generalization, with implications for neural architecture search.
We also re-examine the connection between the marginal likelihood and PAC-Bayes bounds and use this connection to further elucidate the shortcomings of the marginal likelihood for model selection.
- Score: 49.19092837058752
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How do we compare between hypotheses that are entirely consistent with
observations? The marginal likelihood (aka Bayesian evidence), which represents
the probability of generating our observations from a prior, provides a
distinctive approach to this foundational question, automatically encoding
Occam's razor. Although it has been observed that the marginal likelihood can
overfit and is sensitive to prior assumptions, its limitations for
hyperparameter learning and discrete model comparison have not been thoroughly
investigated. We first revisit the appealing properties of the marginal
likelihood for learning constraints and hypothesis testing. We then highlight
the conceptual and practical issues in using the marginal likelihood as a proxy
for generalization. Namely, we show how marginal likelihood can be negatively
correlated with generalization, with implications for neural architecture
search, and can lead to both underfitting and overfitting in hyperparameter
learning. We also re-examine the connection between the marginal likelihood and
PAC-Bayes bounds and use this connection to further elucidate the shortcomings
of the marginal likelihood for model selection. We provide a partial remedy
through a conditional marginal likelihood, which we show is more aligned with
generalization, and practically valuable for large-scale hyperparameter
learning, such as in deep kernel learning.
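The quantities in the abstract can be made concrete in a toy model. The sketch below is illustrative only and is not the paper's setup: it assumes a conjugate Normal-Normal model (observations y_i ~ N(theta, noise_var) with prior theta ~ N(0, prior_var)), where the marginal likelihood p(y) is available in closed form, and implements a conditional marginal likelihood in the spirit the abstract describes: condition on part of the data, then score the rest. The split point m and all hyperparameter values are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

def log_marginal_likelihood(y, prior_var=100.0, noise_var=1.0):
    """Log p(y) for y_i ~ N(theta, noise_var) with prior theta ~ N(0, prior_var).
    Marginalizing theta gives y ~ N(0, noise_var * I + prior_var * 11^T),
    so log p(y) is a single multivariate-normal log-density."""
    n = len(y)
    cov = noise_var * np.eye(n) + prior_var * np.ones((n, n))
    return stats.multivariate_normal.logpdf(y, mean=np.zeros(n), cov=cov)

def conditional_log_marginal_likelihood(y, m, **kw):
    """log p(y[m:] | y[:m]) = log p(y) - log p(y[:m]):
    condition on the first m points, score only the remaining ones."""
    return log_marginal_likelihood(y, **kw) - log_marginal_likelihood(y[:m], **kw)

rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.0, size=50)                    # data drawn with theta = 2
lml = log_marginal_likelihood(y)                     # full marginal likelihood
clml = conditional_log_marginal_likelihood(y, m=10)  # conditional variant
```

Under a diffuse prior, the full marginal likelihood pays a large penalty on the first few observations, which the prior predicts poorly; the conditional variant skips that cost, which is one intuition for why it tracks generalization more closely.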
Related papers
- Hypothesis Testing for Class-Conditional Noise Using Local Maximum
Likelihood [1.8798171797988192]
In supervised learning, automatically assessing the quality of the labels before any learning takes place remains an open research question.
In this paper we show how similar procedures can be followed when the underlying model is a product of Local Maximum Likelihood Estimation.
This different view allows for wider applicability of the tests by offering users access to a richer model class.
arXiv Detail & Related papers (2023-12-15T22:14:58Z)
- Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability [50.44439018155837]
We propose to include a calibration term directly into the training objective of the neural model.
By introducing a relaxation of the classical formulation of calibration error we enable end-to-end backpropagation.
It is directly applicable to existing computational pipelines allowing reliable black-box posterior inference.
arXiv Detail & Related papers (2023-10-20T10:20:45Z)
- Leveraging Self-Consistency for Data-Efficient Amortized Bayesian Inference [9.940560505044122]
We propose a method to improve the efficiency and accuracy of amortized Bayesian inference.
We estimate the marginal likelihood based on approximate representations of the joint model.
arXiv Detail & Related papers (2023-10-06T17:41:41Z)
- Monotonicity and Double Descent in Uncertainty Estimation with Gaussian Processes [52.92110730286403]
It is commonly believed that the marginal likelihood should be reminiscent of cross-validation metrics and that both should deteriorate with larger input dimensions.
We prove that by tuning hyperparameters, the performance, as measured by the marginal likelihood, improves monotonically with the input dimension.
We also prove that cross-validation metrics exhibit qualitatively different behavior that is characteristic of double descent.
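The metric studied above can be written down directly. The following is a minimal numpy sketch of the exact Gaussian process log marginal likelihood with a squared-exponential kernel; the kernel choice and hyperparameter defaults are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0, signal_var=1.0):
    # Squared-exponential kernel: k(x, x') = s^2 * exp(-||x - x'||^2 / (2 l^2)).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq / lengthscale**2)

def gp_log_marginal_likelihood(X, y, lengthscale=1.0, signal_var=1.0, noise_var=0.1):
    """Exact GP log marginal likelihood for y ~ N(0, K), K = K_f + noise_var * I:
    log p(y|X) = -1/2 y^T K^-1 y - 1/2 log|K| - n/2 log(2 pi)."""
    n = len(y)
    K = rbf_kernel(X, lengthscale, signal_var) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)                         # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^-1 y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))              # -1/2 log|K| via Cholesky
            - 0.5 * n * np.log(2 * np.pi))
```

Maximizing this quantity over lengthscale, signal_var, and noise_var is the hyperparameter tuning whose effect on high-dimensional behavior the summary describes.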
arXiv Detail & Related papers (2022-10-14T08:09:33Z)
- The Causal Marginal Polytope for Bounding Treatment Effects [9.196779204457059]
We propose a novel way to identify causal effects by enforcing compatibility between the marginals of a causal model and the data, without constructing a global causal model.
We call this collection of locally consistent marginals the causal marginal polytope.
arXiv Detail & Related papers (2022-02-28T15:08:22Z)
- Masked prediction tasks: a parameter identifiability view [49.533046139235466]
We focus on the widely used self-supervised learning method of predicting masked tokens.
We show that there is a rich landscape of possibilities, out of which some prediction tasks yield identifiability, while others do not.
arXiv Detail & Related papers (2022-02-18T17:09:32Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood-based model selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
- Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.