Riemannian optimization for non-centered mixture of scaled Gaussian
distributions
- URL: http://arxiv.org/abs/2209.03315v2
- Date: Sun, 25 Jun 2023 15:21:03 GMT
- Title: Riemannian optimization for non-centered mixture of scaled Gaussian
distributions
- Authors: Antoine Collas, Arnaud Breloy, Chengfang Ren, Guillaume Ginolhac,
Jean-Philippe Ovarlez
- Abstract summary: This paper studies the statistical model of the non-centered mixture of scaled Gaussian distributions (NC-MSG).
Using the Fisher-Rao information geometry associated with this distribution, we derive a Riemannian gradient descent algorithm.
A nearest centroid classifier is implemented leveraging the KL divergence and its associated center of mass.
- Score: 17.855338784378
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper studies the statistical model of the non-centered mixture
of scaled Gaussian distributions (NC-MSG). Using the Fisher-Rao information
geometry associated with this distribution, we derive a Riemannian gradient
descent algorithm. This algorithm is leveraged for two minimization problems.
The first is the minimization of a regularized negative log-likelihood (NLL),
where the regularization trades off between a white Gaussian distribution and
the NC-MSG. Conditions on the regularization are given so that the existence of
a minimum of this problem is guaranteed without assumptions on the samples.
Then, the Kullback-Leibler (KL) divergence between two NC-MSGs is derived. This
divergence enables us to define a minimization problem to compute centers of
mass of several NC-MSGs. The proposed Riemannian gradient descent algorithm is
leveraged to solve this second minimization problem. Numerical experiments show
the good performance and speed of the Riemannian gradient descent on both
problems. Finally, a nearest centroid classifier is implemented leveraging the
KL divergence and its associated center of mass. Applied to the large-scale
dataset Breizhcrops, this classifier shows good accuracy as well as robustness
to rigid transformations of the test set.
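
The paper's geometry is specific to the NC-MSG model, but the general recipe of Riemannian gradient descent (compute a Riemannian gradient, then move along the manifold with a retraction or exponential map) can be illustrated on a simpler manifold. The sketch below, a rough illustration only, runs the procedure on symmetric positive definite matrices with the affine-invariant (Fisher-Rao) metric of a centered Gaussian; the function names, step size, and iteration count are assumptions for the example, not the paper's implementation.

```python
# A minimal sketch of Riemannian gradient descent, here on the manifold of
# symmetric positive definite (SPD) matrices with the affine-invariant
# (Fisher-Rao) metric of a centered Gaussian. This is NOT the paper's NC-MSG
# geometry; names, step size and iteration count are illustrative assumptions.
import numpy as np

def sym_fun(mat, fun):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(mat)
    return (V * fun(w)) @ V.T

def nll(sigma, X):
    """Centered-Gaussian negative log-likelihood per sample (up to constants)."""
    C = X.T @ X / X.shape[0]
    _, logdet = np.linalg.slogdet(sigma)
    return 0.5 * (logdet + np.trace(np.linalg.solve(sigma, C)))

def riemannian_grad(sigma, X):
    """Affine-invariant Riemannian gradient: sigma @ euclidean_grad @ sigma."""
    C = X.T @ X / X.shape[0]
    return 0.5 * (sigma - C)

def retract(sigma, xi):
    """Exponential map at sigma applied to a symmetric tangent vector xi."""
    s_half = sym_fun(sigma, np.sqrt)
    s_ihalf = sym_fun(sigma, lambda w: 1.0 / np.sqrt(w))
    return s_half @ sym_fun(s_ihalf @ xi @ s_ihalf, np.exp) @ s_half

def riemannian_gd(X, step=0.5, n_iter=100):
    sigma = np.eye(X.shape[1])                     # initialize at the identity
    for _ in range(n_iter):
        sigma = retract(sigma, -step * riemannian_grad(sigma, X))
    return sigma

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(3), np.diag([1.0, 2.0, 3.0]), size=2000)
sigma_hat = riemannian_gd(X)
print(np.round(sigma_hat, 2))                      # close to the sample covariance
print(f"NLL at the estimate: {nll(sigma_hat, X):.3f}")
```

For the Gaussian NLL above, the fixed point of this iteration is the sample covariance, so the printed estimate should match it up to numerical precision.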
Related papers
- Theoretical Guarantees for Variational Inference with Fixed-Variance Mixture of Gaussians [27.20127082606962]
Variational inference (VI) is a popular approach in Bayesian inference.
This work aims to contribute to the theoretical study of VI in the non-Gaussian case.
arXiv Detail & Related papers (2024-06-06T12:38:59Z)
- Riemannian stochastic optimization methods avoid strict saddle points [68.80251170757647]
We show that policies under study avoid strict saddle points / submanifolds with probability 1.
This result provides an important sanity check as it shows that, almost always, the limit state of an algorithm can only be a local minimizer.
arXiv Detail & Related papers (2023-11-04T11:12:24Z)
- Accelerated Bayesian imaging by relaxed proximal-point Langevin sampling [4.848683707039751]
This paper presents a new accelerated proximal Markov chain Monte Carlo methodology to perform Bayesian inference in imaging inverse problems.
For models that are smooth or regularised by Moreau-Yosida smoothing, the midpoint is equivalent to an implicit discretisation of an overdamped Langevin diffusion.
For targets that are $\kappa$-strongly log-concave, the provided non-asymptotic convergence analysis also identifies the optimal time step.
arXiv Detail & Related papers (2023-08-18T10:55:49Z)
- Curvature-Independent Last-Iterate Convergence for Games on Riemannian
Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z)
- Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- Generalizing Gaussian Smoothing for Random Search [23.381986209234164]
Gaussian smoothing (GS) is a derivative-free optimization algorithm that estimates the gradient of an objective using random perturbations of the current parameters.
The paper proposes choosing the perturbation distribution so as to minimize the mean squared error (MSE) of this gradient estimate, yielding estimators with provably smaller MSE (a generic sketch of the baseline estimator appears after this list).
arXiv Detail & Related papers (2022-11-27T04:42:05Z)
- Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
- Distributed Sparse Regression via Penalization [5.990069843501885]
We study linear regression over a network of agents, modeled as an undirected graph (with no centralized node).
The estimation problem is formulated as the minimization of the sum of the local LASSO loss functions plus a quadratic penalty of the consensus constraint.
We show that the proximal-gradient algorithm applied to the penalized problem converges linearly up to a tolerance of the order of the centralized statistical error (a centralized special case of this iteration is sketched after this list).
arXiv Detail & Related papers (2021-11-12T01:51:50Z)
- Spectral clustering under degree heterogeneity: a case for the random
walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
arXiv Detail & Related papers (2021-05-03T16:36:27Z)
- On the Convergence of Gradient Descent in GANs: MMD GAN As a Gradient
Flow [26.725412498545385]
We show that a parametric kernelized gradient flow mimics the min-max game in gradient regularized $\mathrm{MMD}$ GAN.
We then derive an explicit condition which ensures that gradient descent on the space of the generator in regularized $\mathrm{MMD}$ GAN is globally convergent to the target distribution.
arXiv Detail & Related papers (2020-11-04T16:55:00Z)
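
Regarding the "Generalizing Gaussian Smoothing for Random Search" entry above: the baseline estimator that the paper generalizes is easy to state. The sketch below uses standard Gaussian perturbations with an antithetic two-point difference; the smoothing radius, sample count, and toy objective are assumptions for illustration, not the paper's proposed distribution.

```python
# Baseline Gaussian-smoothing (GS) gradient estimator for derivative-free
# optimization. The cited paper generalizes the perturbation distribution;
# the standard-Gaussian, antithetic version below and all constants are
# illustrative assumptions, not the paper's method.
import numpy as np

def gs_gradient(f, x, sigma=0.1, n_samples=64, rng=None):
    """Antithetic two-point GS estimate of grad f(x) with Gaussian perturbations."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal((n_samples, x.size))
    diffs = np.array([f(x + sigma * ui) - f(x - sigma * ui) for ui in u])
    return (diffs[:, None] * u).mean(axis=0) / (2.0 * sigma)

# Usage: random-search descent on a toy quadratic objective.
f = lambda z: float(np.sum((z - 1.0) ** 2))
x = np.zeros(5)
rng = np.random.default_rng(1)
for _ in range(200):
    x = x - 0.1 * gs_gradient(f, x, rng=rng)
print(np.round(x, 2))  # approaches the minimizer (a vector of ones)
```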
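
Likewise, for the "Distributed Sparse Regression via Penalization" entry, the proximal-gradient iteration it analyzes reduces, in the single-agent (centralized) case, to the textbook ISTA update for the LASSO. The sketch below shows that special case only; the penalty weight, step-size rule, and synthetic data are illustrative assumptions, and the distributed consensus penalty is not reproduced.

```python
# Centralized special case of the proximal-gradient iteration analyzed in the
# cited paper: the classical ISTA update for the LASSO,
#   min_w  0.5/n * ||X w - y||^2 + lam * ||w||_1 .
# lam, the step-size rule and the synthetic data are illustrative assumptions.
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(X, y, lam=0.1, n_iter=500):
    n, p = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant of the smooth part
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n         # gradient of the least-squares term
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + 0.1 * rng.standard_normal(200)
print(np.round(ista_lasso(X, y), 2))         # sparse estimate close to w_true
```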
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.