Related papers: Theoretical Error Analysis of Entropy Approximation for Gaussian Mixtures

Theoretical Error Analysis of Entropy Approximation for Gaussian Mixtures

URL: http://arxiv.org/abs/2202.13059v6
Date: Wed, 22 Jan 2025 13:42:12 GMT
Title: Theoretical Error Analysis of Entropy Approximation for Gaussian Mixtures
Authors: Takashi Furuya, Hiroyuki Kusumoto, Koichi Taniguchi, Naoya Kanno, Kazuma Suetake,
Abstract summary: In this paper, we study the approximate entropy represented as the sum of the entropies of unimodal Gaussian distributions with mixing coefficients.<n>We theoretically analyze the approximation error between the true and the approximate entropy to reveal when this approximation works effectively.<n>Our results provide a guarantee that this approximation works well for high-dimensional problems, such as neural networks.
Score: 0.6990493129893112
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Gaussian mixture distributions are commonly employed to represent general probability distributions. Despite the importance of using Gaussian mixtures for uncertainty estimation, the entropy of a Gaussian mixture cannot be calculated analytically. In this paper, we study the approximate entropy represented as the sum of the entropies of unimodal Gaussian distributions with mixing coefficients. This approximation is easy to calculate analytically regardless of dimension, but there is a lack of theoretical guarantees. We theoretically analyze the approximation error between the true and the approximate entropy to reveal when this approximation works effectively. This error is essentially controlled by how far apart each Gaussian component of the Gaussian mixture is. To measure such separation, we introduce the ratios of the distances between the means to the sum of the variances of each Gaussian component of the Gaussian mixture, and we reveal that the error converges to zero as the ratios tend to infinity. In addition, the probabilistic estimate indicates that this convergence situation is more likely to occur in higher-dimensional spaces. Therefore, our results provide a guarantee that this approximation works well for high-dimensional problems, such as neural networks that involve a large number of parameters.

Related papers

Theoretical Guarantees for Variational Inference with Fixed-Variance Mixture of Gaussians [27.20127082606962]
Variational inference (VI) is a popular approach in Bayesian inference. This work aims to contribute to the theoretical study of VI in the non-Gaussian case.
arXiv Detail & Related papers (2024-06-06T12:38:59Z)
On the best approximation by finite Gaussian mixtures [7.084611118322622]
We consider the problem of approximating a general Gaussian location mixture by finite mixtures. The minimum order of finite mixtures that achieve a prescribed accuracy is determined within constant factors.
arXiv Detail & Related papers (2024-04-13T06:57:44Z)
Non-asymptotic approximations for Pearson's chi-square statistic and its application to confidence intervals for strictly convex functions of the probability weights of discrete distributions [0.0]
We develop a non-asymptotic local normal approximation for multinomial probabilities. We apply our results to find confidence intervals for the negative entropy of discrete distributions.
arXiv Detail & Related papers (2023-09-05T01:18:48Z)
The Parametric Stability of Well-separated Spherical Gaussian Mixtures [7.238973585403367]
We quantify the parameter stability of a spherical Gaussian Mixture Model (sGMM) under small perturbations in distribution space. We derive the first explicit bound to show that for a mixture of spherical Gaussian $P$ (sGMM) in a pre-defined model class, all other sGMM close to $P in this model class in total variation distance has a small parameter distance to $P.
arXiv Detail & Related papers (2023-02-01T04:52:13Z)
Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated. We formalize these results both in the sample regime and in the finite regime.
arXiv Detail & Related papers (2022-10-03T06:09:01Z)
Posterior and Computational Uncertainty in Gaussian Processes [52.26904059556759]
Gaussian processes scale prohibitively with the size of the dataset. Many approximation methods have been developed, which inevitably introduce approximation error. This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior. We develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended.
arXiv Detail & Related papers (2022-05-30T22:16:25Z)
Joint Probability Estimation Using Tensor Decomposition and Dictionaries [3.4720326275851994]
We study non-parametric estimation of joint probabilities of a given set of discrete and continuous random variables from their (empirically estimated) 2D marginals. We create a dictionary of various families of distributions by inspecting the data, and use it to approximate each decomposed factor of the product in the mixture.
arXiv Detail & Related papers (2022-03-03T11:55:51Z)
Robust Estimation for Nonparametric Families via Generative Adversarial Networks [92.64483100338724]
We provide a framework for designing Generative Adversarial Networks (GANs) to solve high dimensional robust statistics problems. Our work extend these to robust mean estimation, second moment estimation, and robust linear regression. In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2022-02-02T20:11:33Z)
A Robust and Flexible EM Algorithm for Mixtures of Elliptical Distributions with Missing Data [71.9573352891936]
This paper tackles the problem of missing data imputation for noisy and non-Gaussian data. A new EM algorithm is investigated for mixtures of elliptical distributions with the property of handling potential missing data. Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data.
arXiv Detail & Related papers (2022-01-28T10:01:37Z)
Clustering a Mixture of Gaussians with Unknown Covariance [4.821312633849745]
We derive a Max-Cut integer program based on maximum likelihood estimation. We develop an efficient spectral algorithm that attains the optimal rate but requires a quadratic sample size. We generalize the Max-Cut program to a $k$-means program that handles multi-component mixtures with possibly unequal weights.
arXiv Detail & Related papers (2021-10-04T17:59:20Z)
Mean-Square Analysis with An Application to Optimal Dimension Dependence of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymotic analysis of sampling error in 2-Wasserstein distance. Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z)
Spectral clustering under degree heterogeneity: a case for the random walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree. In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
arXiv Detail & Related papers (2021-05-03T16:36:27Z)
Maximum Multiscale Entropy and Neural Network Regularization [28.00914218615924]
A well-known result shows that the maximum entropy distribution under a mean constraint has an exponential form called the Gibbs-Boltzmann distribution. This paper investigates a generalization of these results to a multiscale setting.
arXiv Detail & Related papers (2020-06-25T17:56:11Z)
Uniform Convergence Rates for Maximum Likelihood Estimation under Two-Component Gaussian Mixture Models [13.769786711365104]
We derive uniform convergence rates for the maximum likelihood estimator and minimax lower bounds for parameter estimation. We assume the mixing proportions of the mixture are known and fixed, but make no separation assumption on the underlying mixture components.
arXiv Detail & Related papers (2020-06-01T04:13:48Z)
Minimax Optimal Estimation of KL Divergence for Continuous Distributions [56.29748742084386]
Esting Kullback-Leibler divergence from identical and independently distributed samples is an important problem in various domains. One simple and effective estimator is based on the k nearest neighbor between these samples.
arXiv Detail & Related papers (2020-02-26T16:37:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.