Related papers: An Asymptotic Equation Linking WAIC and WBIC in Singular Models

URL: http://arxiv.org/abs/2505.13902v2
Date: Wed, 21 May 2025 04:10:20 GMT
Title: An Asymptotic Equation Linking WAIC and WBIC in Singular Models
Authors: Naoki Hayashi, Takuro Kutsuna, Sawa Takamuku
Abstract summary: In statistical learning, models are classified as regular or singular depending on whether the mapping from parameters to probability distributions is injective. Most models with hierarchical structures or latent variables are singular, for which conventional criteria are inapplicable due to the breakdown of normal approximations for the likelihood and posterior. This theoretical contribution provides a foundation for future developments in the computational efficiency of model selection in singular models.
Score: 2.385046494466299
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In statistical learning, models are classified as regular or singular depending on whether the mapping from parameters to probability distributions is injective. Most models with hierarchical structures or latent variables are singular, for which conventional criteria such as the Akaike Information Criterion and the Bayesian Information Criterion are inapplicable due to the breakdown of normal approximations for the likelihood and posterior. To address this, the Widely Applicable Information Criterion (WAIC) and the Widely Applicable Bayesian Information Criterion (WBIC) have been proposed. Since WAIC and WBIC are computed using posterior distributions at different temperature settings, separate posterior sampling is generally required. In this paper, we theoretically derive an asymptotic equation that links WAIC and WBIC, despite their dependence on different posteriors. This equation yields an asymptotically unbiased expression of WAIC in terms of the posterior distribution used for WBIC. The result clarifies the structural relationship between these criteria within the framework of singular learning theory, and deepens understanding of their asymptotic behavior. This theoretical contribution provides a foundation for future developments in the computational efficiency of model selection in singular models.
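To make the "different temperature settings" concrete, the following is a minimal sketch, not the paper's linking equation. It assumes a toy one-parameter normal model (unknown mean, unit variance) with a conjugate normal prior, so the tempered posterior can be sampled exactly. WAIC is computed from draws at inverse temperature 1 and WBIC from draws at inverse temperature 1/log n, following the standard definitions. All function names, the prior variance, and the data are illustrative assumptions.

```python
import numpy as np

def log_lik(x, mu):
    """Matrix (n_draws, n_data) of log N(x_i | mu_s, 1)."""
    return -0.5 * np.log(2 * np.pi) - 0.5 * (x[None, :] - mu[:, None]) ** 2

def tempered_posterior_draws(x, beta, prior_var, n_draws, rng):
    """Exact draws from likelihood**beta * N(mu | 0, prior_var).

    Closed form holds only for this toy conjugate model (an assumption).
    """
    precision = 1.0 / prior_var + beta * x.size
    mean = beta * x.sum() / precision
    return rng.normal(mean, np.sqrt(1.0 / precision), size=n_draws)

def waic(x, draws):
    """Per-sample WAIC: training loss plus functional variance / n.

    `draws` must come from the ordinary posterior (beta = 1).
    """
    ll = log_lik(x, draws)
    m = ll.max(axis=0)                       # stabilise the log-mean-exp
    lppd = m + np.log(np.exp(ll - m).mean(axis=0))
    train_loss = -lppd.mean()
    func_var = ll.var(axis=0).sum()
    return train_loss + func_var / x.size

def wbic(x, draws):
    """WBIC: mean total negative log-likelihood over tempered draws.

    `draws` must come from the posterior at beta = 1 / log(n).
    """
    return -log_lik(x, draws).sum(axis=1).mean()

rng = np.random.default_rng(0)
x = rng.normal(0.5, 1.0, size=200)           # synthetic data (assumption)

# Two separate sampling runs, one per temperature.
draws_beta1 = tempered_posterior_draws(x, 1.0, 10.0, 4000, rng)
draws_wbic = tempered_posterior_draws(x, 1.0 / np.log(x.size), 10.0, 4000, rng)

print("WAIC (per sample):", waic(x, draws_beta1))
print("WBIC (total):     ", wbic(x, draws_wbic))
```

The two calls to `tempered_posterior_draws` are the separate posterior sampling that the abstract refers to. The paper's result concerns expressing WAIC through the second set of draws alone, which this sketch does not attempt.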

Related papers

Generalized Criterion for Identifiability of Additive Noise Models Using Majorization [7.448620208767376]
We introduce a novel identifiability criterion for directed acyclic graph (DAG) models. We demonstrate that this criterion extends and generalizes existing identifiability criteria. We present a new algorithm for learning a topological ordering of variables.
arXiv Detail & Related papers (2024-04-08T02:18:57Z)
Statistical inference for pairwise comparison models [5.487882744996216]
This paper establishes a near-optimal normality for the maximum likelihood in a broad class of pairwise comparison models. The key idea lies in identifying the Fisher information matrix as a weighted graph Laplacian, which can be studied via a meticulous spectral analysis.
arXiv Detail & Related papers (2024-01-16T16:14:09Z)
Bayesian Model Selection via Mean-Field Variational Approximation [10.433170683584994]
We study the non-asymptotic properties of mean-field (MF) inference under the Bayesian framework. We show a Bernstein-von Mises (BvM) theorem for the variational distribution from MF under possible model misspecification.
arXiv Detail & Related papers (2023-12-17T04:48:25Z)
A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime. We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
Bayesian Renormalization [68.8204255655161]
We present a fully information theoretic approach to renormalization inspired by Bayesian statistical inference. The main insight of Bayesian Renormalization is that the Fisher metric defines a correlation length that plays the role of an emergent RG scale. We provide insight into how the Bayesian Renormalization scheme relates to existing methods for data compression and data generation.
arXiv Detail & Related papers (2023-05-17T18:00:28Z)
On the Foundations of Cycles in Bayesian Networks [4.312746668772342]
We present a foundational study regarding semantics for cyclic BNs that are generic and conservatively extend the cycle-free setting. First, we propose constraint-based semantics that specify requirements for full joint distributions over a BN to be consistent with the local conditional probabilities and independencies. Second, two kinds of limit semantics that formalize infinite unfolding approaches are introduced and shown to be computable by a Markov chain construction.
arXiv Detail & Related papers (2023-01-20T14:40:17Z)
Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region. Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
arXiv Detail & Related papers (2022-02-23T06:11:49Z)
Optimal regularizations for data generation with probabilistic graphical models [0.0]
Empirically, well-chosen regularization schemes dramatically improve the quality of the inferred models. We consider the particular case of L2 and L1 regularizations in the Maximum A Posteriori (MAP) inference of generative pairwise graphical models.
arXiv Detail & Related papers (2021-12-02T14:45:16Z)
Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM) where we parameterize the joint distribution in terms of the derivatives of univariable log-conditionals (scores). For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training. We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z)
Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets. We develop a consistent estimator of the misclassification probability as an alternative to the computationally costly cross-validation estimator. We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
GANs with Conditional Independence Graphs: On Subadditivity of Probability Divergences [70.30467057209405]
Generative Adversarial Networks (GANs) are modern methods to learn the underlying distribution of a data set. GANs are designed in a model-free fashion where no additional information about the underlying distribution is available. We propose a principled design of a model-based GAN that uses a set of simple discriminators on the neighborhoods of the Bayes-net/MRF.
arXiv Detail & Related papers (2020-03-02T04:31:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.