Mixture weights optimisation for Alpha-Divergence Variational Inference
- URL: http://arxiv.org/abs/2106.05114v1
- Date: Wed, 9 Jun 2021 14:47:05 GMT
- Title: Mixture weights optimisation for Alpha-Divergence Variational Inference
- Authors: Kamélia Daudel and Randal Douc
- Abstract summary: This paper focuses on $\alpha$-divergence minimisation methods for Variational Inference.
The Power Descent, defined for all $\alpha \neq 1$, is one such algorithm.
First-order approximations allow us to introduce the Renyi Descent, a novel algorithm.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper focuses on $\alpha$-divergence minimisation methods for
Variational Inference. More precisely, we are interested in algorithms
optimising the mixture weights of any given mixture model, without any
information on the underlying distribution of its mixture components
parameters. The Power Descent, defined for all $\alpha \neq 1$, is one such
algorithm and we establish in our work the full proof of its convergence
towards the optimal mixture weights when $\alpha <1$. Since the
$\alpha$-divergence recovers the widely-used forward Kullback-Leibler when
$\alpha \to 1$, we then extend the Power Descent to the case $\alpha = 1$ and
show that we obtain an Entropic Mirror Descent. This leads us to investigate
the link between Power Descent and Entropic Mirror Descent: first-order
approximations allow us to introduce the Renyi Descent, a novel algorithm for
which we prove an $O(1/N)$ convergence rate. Lastly, we compare numerically the
behavior of the unbiased Power Descent and of the biased Renyi Descent and we
discuss the potential advantages of one algorithm over the other.
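Since the abstract states the updates only in words, the sketch below illustrates, on a toy one-dimensional target, what the two weight updates it discusses can look like: the Power Descent multiplies each mixture weight by a power of an $\alpha$-dependent expectation, and the Entropic Mirror Descent is its exponential-update limit at $\alpha = 1$. This is a minimal sketch under our own simplifications (the paper's $\kappa$ regularisation term is omitted, and the toy target, component family, and step size are illustrative choices, not the authors' setup); consult the paper for the exact updates and convergence conditions.

```python
# Minimal sketch of mixture-weight updates for alpha-divergence VI.
# The kappa regulariser from the paper is omitted; all names and the
# toy target below are our own illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

# Toy target: unnormalised density of a two-mode Gaussian mixture on R.
def log_p(y):
    return np.logaddexp(-0.5 * (y + 2.0) ** 2, -0.5 * (y - 2.0) ** 2)

# Fixed mixture components: J unit-variance Gaussians with frozen means.
means = np.linspace(-4.0, 4.0, 8)
J = len(means)

def log_q_components(y):
    # (J, n) matrix of component log-densities at the points y.
    return -0.5 * (y[None, :] - means[:, None]) ** 2 - 0.5 * np.log(2 * np.pi)

def update_weights(lam, alpha, eta, n_samples=2000):
    """One mixture-weight update.
    alpha < 1 : Power Descent,
        lam_j <- lam_j * E_j[(q_lam/p)^(alpha-1)]^(eta/(1-alpha))
    alpha = 1 : Entropic Mirror Descent,
        lam_j <- lam_j * exp(-eta * E_j[log(q_lam/p)])
    where E_j is an expectation under component j, estimated by Monte Carlo.
    """
    new_lam = np.empty(J)
    for j in range(J):
        y = means[j] + rng.standard_normal(n_samples)   # samples from q_j
        log_ratio = (np.logaddexp.reduce(
            np.log(lam)[:, None] + log_q_components(y), axis=0) - log_p(y))
        if alpha == 1.0:
            new_lam[j] = lam[j] * np.exp(-eta * log_ratio.mean())
        else:
            # log-mean-exp of (alpha-1)*log_ratio, for numerical stability
            b = np.logaddexp.reduce((alpha - 1.0) * log_ratio) - np.log(n_samples)
            new_lam[j] = lam[j] * np.exp(eta / (1.0 - alpha) * b)
    return new_lam / new_lam.sum()

lam = np.full(J, 1.0 / J)
for _ in range(200):
    lam = update_weights(lam, alpha=0.5, eta=0.3)
print(np.round(lam, 3))   # mass should concentrate near the modes at +/-2
```

In this toy run the weights concentrate on the components closest to the two target modes, which is the qualitative behaviour the convergence result above formalises.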
Related papers
- Entangled Mean Estimation in High-Dimensions [36.97113089188035]
We study the task of high-dimensional entangled mean estimation in the subset-of-signals model.
We show that the optimal error (up to polylogarithmic factors) is $f(\alpha,N) + \sqrt{D/(\alpha N)}$, where the term $f(\alpha,N)$ is the error of the one-dimensional problem and the second term is the sub-Gaussian error rate.
arXiv Detail & Related papers (2025-01-09T18:31:35Z)
- Learning general Gaussian mixtures with efficient score matching [16.06356123715737]
We study the problem of learning mixtures of $k$ Gaussians in $d$ dimensions.
We make no separation assumptions on the underlying mixture components.
We give an algorithm that draws $d^{\mathrm{poly}(k/\varepsilon)}$ samples from the target mixture, runs in sample-polynomial time, and constructs a sampler.
arXiv Detail & Related papers (2024-04-29T17:30:36Z)
- Fast UCB-type algorithms for stochastic bandits with heavy and super heavy symmetric noise [45.60098988395789]
We propose a new approach to constructing UCB-type algorithms for multi-armed bandits.
We show that in the case of symmetric noise in the reward, we can achieve an $O(\log T\sqrt{KT\log T})$ regret bound instead of $O\left(\dots\right)$.
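This summary does not spell out the construction; as a reference point, the classical UCB1 template that "UCB-type" algorithms refine is sketched below. This is not the authors' method: the bandit instance, Gaussian noise, and all names are illustrative, and a heavy-tail-robust variant would typically swap the empirical mean and confidence radius for robust counterparts.

```python
# Classical UCB1 for a K-armed stochastic bandit -- shown only as the
# baseline template that "UCB-type" algorithms refine.
import numpy as np

rng = np.random.default_rng(1)
true_means = np.array([0.2, 0.5, 0.8])      # illustrative 3-armed instance
K, T = len(true_means), 10_000

counts = np.zeros(K)
sums = np.zeros(K)

for t in range(1, T + 1):
    if t <= K:
        arm = t - 1                          # play each arm once to initialise
    else:
        ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
        arm = int(np.argmax(ucb))            # optimism in face of uncertainty
    reward = true_means[arm] + rng.standard_normal()
    counts[arm] += 1
    sums[arm] += reward

print("pulls per arm:", counts.astype(int))  # the best arm should dominate
```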
arXiv Detail & Related papers (2024-02-10T22:38:21Z)
- Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback [75.29048190099523]
Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions.
In this paper, we design a fully adaptive OGD algorithm, AdaOGD, that does not require a priori knowledge of these parameters.
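For orientation, the classical projected OGD loop that AdaOGD makes adaptive is sketched below; the quadratic loss, unit-ball domain, and $1/t$ step size are illustrative stand-ins, since AdaOGD's point (per the summary) is to achieve the same rates without knowing the strong-convexity parameter.

```python
# Projected online gradient descent on the unit ball -- the non-adaptive
# template; AdaOGD's contribution is choosing step sizes without knowing
# strong-convexity/monotonicity parameters.
import numpy as np

d, T = 5, 1000
x = np.zeros(d)
target = np.ones(d) / np.sqrt(d)            # illustrative fixed optimum

for t in range(1, T + 1):
    grad = x - target                       # gradient of 0.5*||x - target||^2
    x = x - grad / t                        # eta_t = 1/t suits 1-strong convexity
    norm = np.linalg.norm(x)
    if norm > 1.0:                          # project back onto the unit ball
        x = x / norm

print("distance to optimum:", np.linalg.norm(x - target))
```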
arXiv Detail & Related papers (2023-10-21T18:38:13Z)
- Neural Inference of Gaussian Processes for Time Series Data of Quasars [72.79083473275742]
We introduce a new model, the Convolved Damped Random Walk (CDRW), that can describe quasar spectra completely.
We also introduce a new method of inference of Gaussian process parameters, which we call Neural Inference.
The combination of both the CDRW model and Neural Inference significantly outperforms the baseline DRW and MLE.
arXiv Detail & Related papers (2022-11-17T13:01:26Z)
- Extra-Newton: A First Approach to Noise-Adaptive Accelerated Second-Order Methods [57.050204432302195]
This work proposes a universal and adaptive second-order method for minimizing second-order smooth, convex functions.
Our algorithm achieves $O(\sigma / \sqrt{T})$ convergence when the oracle feedback is stochastic with variance $\sigma^2$, and improves its convergence to $O(1 / T^3)$ with deterministic oracles.
arXiv Detail & Related papers (2022-11-03T14:12:51Z)
- High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad Stepsize [55.0090961425708]
We propose a new, simplified high probability analysis of AdaGrad for smooth, nonconvex problems.
We present our analysis in a modular way and obtain a complementary $\mathcal{O}(1/T)$ convergence rate in the deterministic setting.
To the best of our knowledge, this is the first high probability bound for AdaGrad with a truly adaptive scheme, i.e., completely oblivious to the knowledge of smoothness.
arXiv Detail & Related papers (2022-04-06T13:50:33Z)
- Forward Looking Best-Response Multiplicative Weights Update Methods [14.519198115835966]
A novel variant of the multiplicative weights update method guarantees last-iterate convergence for zero-sum games with a unique Nash equilibrium.
Our findings reveal that our algorithm offers substantial gains both in terms of the convergence rate and the region of contraction relative to the previous approach.
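For context, a plain simultaneous MWU loop on a zero-sum matrix game is sketched below (our own toy instance, not the paper's forward-looking variant); with plain MWU only the average strategies approach the equilibrium, which is exactly the gap last-iterate results address.

```python
# Standard simultaneous MWU in a zero-sum matrix game -- the baseline the
# paper's forward-looking/best-response variant builds on. Plain MWU's
# iterates cycle around the equilibrium; only the averages converge.
import numpy as np

A = np.array([[0.0, 1.0, -1.0],             # illustrative payoff matrix
              [-1.0, 0.0, 1.0],             # (rock-paper-scissors)
              [1.0, -1.0, 0.0]])
eta = 0.1
x = np.ones(3) / 3                          # row player's mixed strategy
y = np.ones(3) / 3                          # column player's mixed strategy
avg_x = np.zeros(3)

for t in range(5000):
    x_new = x * np.exp(eta * (A @ y))       # row player maximises x^T A y
    y_new = y * np.exp(-eta * (A.T @ x))    # column player minimises it
    x, y = x_new / x_new.sum(), y_new / y_new.sum()
    avg_x += x

print("average strategy:", np.round(avg_x / 5000, 3))  # ~ uniform (1/3 each)
```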
arXiv Detail & Related papers (2021-06-07T13:03:33Z)
- Privacy amplification and decoupling without smoothing [0.0]
We prove an achievability result for privacy amplification and decoupling in terms of the sandwiched Rényi entropy of order $\alpha \in (1,2]$.
The fact that this proof works for $\alpha$ close to 1 means that we can bypass the smooth min-entropy in the many applications where the bound comes from the fully quantum AEP or entropy accumulation.
arXiv Detail & Related papers (2021-05-11T21:01:46Z)
- Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate.
Our algorithm is based on a new concentration technique, and we analyse its sample complexity.
It is very practical: it improves upon the performance of all prior methods when no Byzantine machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z)
- Convergence of Langevin Monte Carlo in Chi-Squared and Renyi Divergence [8.873449722727026]
We show that, for convex and first-order smooth potentials, the LMC algorithm achieves the rate estimate $\widetilde{\mathcal{O}}(d\epsilon^{-1})$, which improves the previously known rates in both of these metrics.
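The LMC iteration underlying these rates is a single line, $x_{k+1} = x_k - \eta \nabla U(x_k) + \sqrt{2\eta}\,\xi_k$; the sketch below runs it on a toy Gaussian potential of our own choosing (step size and horizon are illustrative only).

```python
# Unadjusted Langevin Monte Carlo: x' = x - eta * grad U(x) + sqrt(2*eta) * xi.
# The rates quoted above concern how fast the law of x_k approaches exp(-U)
# in chi-squared / Renyi divergence; this sketch just runs the chain on the
# toy potential U(x) = 0.5 * ||x||^2 (a standard Gaussian target).
import numpy as np

rng = np.random.default_rng(2)
eta, n_steps, d = 0.01, 20_000, 2

def grad_U(x):
    return x                                 # gradient of 0.5 * ||x||^2

x = np.full(d, 5.0)                          # start far from the mode
samples = []
for k in range(n_steps):
    x = x - eta * grad_U(x) + np.sqrt(2 * eta) * rng.standard_normal(d)
    if k > n_steps // 2:                     # discard burn-in
        samples.append(x.copy())

samples = np.array(samples)
print("sample mean ~ 0:", np.round(samples.mean(axis=0), 2))
print("sample cov ~ I:", np.round(np.cov(samples.T), 2))
```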
arXiv Detail & Related papers (2020-07-22T18:18:28Z)
- Learning Halfspaces with Tsybakov Noise [50.659479930171585]
We study the learnability of halfspaces in the presence of Tsybakov noise.
We give an algorithm that achieves misclassification error $\epsilon$ with respect to the true halfspace.
arXiv Detail & Related papers (2020-06-11T14:25:02Z)
- Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z)
- Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.