Variational Inference in Location-Scale Families: Exact Recovery of the Mean and Correlation Matrix
- URL: http://arxiv.org/abs/2410.11067v1
- Date: Mon, 14 Oct 2024 20:25:49 GMT
- Title: Variational Inference in Location-Scale Families: Exact Recovery of the Mean and Correlation Matrix
- Authors: Charles C. Margossian, Lawrence K. Saul
- Abstract summary: Given an intractable target density $p$, variational inference (VI) attempts to find the best approximation $q$ from a tractable family $Q$.
In practice, $Q$ is not rich enough to contain $p$, and the approximation is misspecified even when it is a unique global minimizer of $\text{KL}(q||p)$.
We prove strong guarantees for VI not only under mild regularity conditions but also in the face of severe misspecifications.
- Score: 5.009155983220515
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given an intractable target density $p$, variational inference (VI) attempts to find the best approximation $q$ from a tractable family $Q$. This is typically done by minimizing the exclusive Kullback-Leibler divergence, $\text{KL}(q||p)$. In practice, $Q$ is not rich enough to contain $p$, and the approximation is misspecified even when it is a unique global minimizer of $\text{KL}(q||p)$. In this paper, we analyze the robustness of VI to these misspecifications when $p$ exhibits certain symmetries and $Q$ is a location-scale family that shares these symmetries. We prove strong guarantees for VI not only under mild regularity conditions but also in the face of severe misspecifications. Namely, we show that (i) VI recovers the mean of $p$ when $p$ exhibits an \textit{even} symmetry, and (ii) it recovers the correlation matrix of $p$ when in addition $p$ exhibits an \textit{elliptical} symmetry. These guarantees hold for the mean even when $q$ is factorized and $p$ is not, and for the correlation matrix even when $q$ and $p$ behave differently in their tails. We analyze various regimes of Bayesian inference where these symmetries are useful idealizations, and we also investigate experimentally how VI behaves in their absence.
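Both guarantees are easy to probe numerically. Below is a minimal sketch (our illustration, not the authors' code: the heavy-tailed elliptical target, step size, and sample sizes are all assumptions) that fits a full-covariance Gaussian by stochastic gradient ascent on the reparameterized ELBO; despite the mismatched tails, the fitted $q$ lands on the target's mean and correlation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Elliptically symmetric, heavier-tailed-than-Gaussian target (illustrative):
# log p(x) = -sqrt((x - mu)^T P (x - mu)) + const, with correlation 0.8.
mu_true = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
P = np.linalg.inv(Sigma)

def grad_log_p(x):
    z = x - mu_true
    return -(P @ z) / (np.sqrt(z @ P @ z) + 1e-12)

# Variational family: full-covariance Gaussian q = N(mu, L L^T), L lower triangular.
mu, L, lr, B = np.zeros(2), np.eye(2), 0.02, 64
for t in range(5000):
    eps = rng.standard_normal((B, 2))
    g = np.array([grad_log_p(x) for x in mu + eps @ L.T])  # reparameterized samples
    mu += lr * g.mean(axis=0)
    # ELBO gradient in L: reparameterized E[log p] term plus entropy term sum(log L_ii).
    L += lr * (np.tril(g.T @ eps) / B + np.diag(1.0 / np.diag(L)))
    np.fill_diagonal(L, np.maximum(np.diag(L), 1e-3))      # keep scales positive

S = L @ L.T
print("mean of q:", mu.round(2))                           # close to [1, -2]
print("corr of q:", round(S[0, 1] / np.sqrt(S[0, 0] * S[1, 1]), 2))  # close to 0.8
```

Swapping in a factorized (diagonal) $q$ still recovers the mean, matching guarantee (i) above.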
Related papers
- Optimal level set estimation for non-parametric tournament and crowdsourcing problems [49.75262185577198]
Motivated by crowdsourcing, we consider a problem where we partially observe the correctness of the answers of $n$ experts on $d$ questions.
In this paper, we assume that the matrix $M$ containing the probability that expert $i$ correctly answers question $j$ is bi-isotonic up to a permutation of its rows and columns.
We construct an efficient algorithm that turns out to be minimax optimal for this classification problem.
arXiv Detail & Related papers (2024-08-27T18:28:31Z)
- Asymptotic Normality of Generalized Low-Rank Matrix Sensing via Riemannian Geometry [37.53442095760427]
We prove an asymptotic normality guarantee for generalized low-rank matrix sensing.
We parameterize the manifold of low-rank matrices by $\bar\theta\bar\theta^\top$.
We prove $\sqrt{n}(\phi_0-\phi^*)\xrightarrow{D}N(0,(H^*)^{-1})$ as $n\to\infty$, where $\phi_0$ and $\phi^*$ are representations of $\bar\theta^*$ and $\bar\theta$...
arXiv Detail & Related papers (2024-07-14T15:11:13Z)
- Joint Learning of Linear Dynamical Systems under Smoothness Constraints [5.2395896768723045]
We consider the problem of joint learning of multiple linear dynamical systems.
In particular, we show conditions under which bounds on the mean-squared error (MSE) converge to zero as $m$ increases.
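The flavor of this result can be simulated in the idealized case where all $m$ systems share one transition matrix; the sketch below (our simplification with illustrative dimensions, not the paper's smoothness-constrained estimator) pools trajectories and watches the least-squares MSE shrink as $m$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
d, T = 5, 50

# Ground-truth stable transition matrix (illustrative).
A = rng.standard_normal((d, d))
A *= 0.9 / max(abs(np.linalg.eigvals(A)))

def trajectory():
    X = np.zeros((T + 1, d))
    for t in range(T):
        X[t + 1] = A @ X[t] + 0.5 * rng.standard_normal(d)
    return X

def pooled_lstsq(trajs):
    # Stack (x_t, x_{t+1}) pairs from all systems and solve
    # min_A sum_t ||x_{t+1} - A x_t||^2 by ordinary least squares.
    X = np.vstack([tr[:-1] for tr in trajs])
    Y = np.vstack([tr[1:] for tr in trajs])
    return np.linalg.lstsq(X, Y, rcond=None)[0].T

for m in [1, 4, 16, 64]:
    A_hat = pooled_lstsq([trajectory() for _ in range(m)])
    print(m, np.linalg.norm(A_hat - A, "fro") ** 2)  # MSE decreases with m
```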
arXiv Detail & Related papers (2024-06-03T08:29:42Z)
- How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization [46.55524654398093]
We show how over-parameterization changes the convergence behavior of gradient descent (GD).
The goal is to recover an unknown low-rank ground-truth matrix from near-isotropic linear measurements.
We propose a novel method that only modifies one step of GD and obtains a convergence rate independent of $\alpha$.
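For context, here is the exactly-parameterized baseline that such analyses start from: plain factorized gradient descent on Gaussian matrix sensing (a generic sketch with illustrative sizes and step size, not the paper's modified method, and without the over-parameterization $\alpha$ that it studies).

```python
import numpy as np

rng = np.random.default_rng(2)
d, r, n = 20, 2, 600                        # illustrative sizes

# Low-rank ground truth M* = U* U*^T and near-isotropic Gaussian measurements.
U_star = rng.standard_normal((d, r))
M_star = U_star @ U_star.T
A = rng.standard_normal((n, d, d))
A = (A + A.transpose(0, 2, 1)) / 2          # symmetrized sensing matrices
b = np.einsum("nij,ij->n", A, M_star)

# Gradient descent on f(U) = (1/4n) * sum_i (<A_i, U U^T> - b_i)^2.
U = 0.1 * rng.standard_normal((d, r))       # small random initialization
eta = 0.005
for t in range(3000):
    resid = np.einsum("nij,ij->n", A, U @ U.T) - b
    U -= eta * np.einsum("n,nij->ij", resid, A) @ U / n

print(np.linalg.norm(U @ U.T - M_star) / np.linalg.norm(M_star))  # ~ 0
```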
arXiv Detail & Related papers (2023-10-03T03:34:22Z)
- $L^1$ Estimation: On the Optimality of Linear Estimators [64.76492306585168]
This work shows that the only prior distribution on $X$ that induces linearity in the conditional median is Gaussian.
In particular, it is demonstrated that if the conditional distribution $P_X|Y=y$ is symmetric for all $y$, then $X$ must follow a Gaussian distribution.
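A quick Monte Carlo illustration of the dichotomy (ours, with arbitrary window widths): for $Y = X + Z$ with Gaussian noise $Z$, the empirical conditional median of $X$ given $Y \approx y$ is linear in $y$ under a Gaussian prior but not under a Laplace prior.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000_000

def cond_median(x, y, y0, width=0.05):
    # Empirical median of X over samples with Y in a small window around y0.
    return np.median(x[np.abs(y - y0) < width])

noise = rng.standard_normal(n)
for name, x in [("Gaussian prior", rng.standard_normal(n)),
                ("Laplace prior ", rng.laplace(size=n))]:
    y = x + noise
    meds = [cond_median(x, y, y0) for y0 in (-2.0, -1.0, 0.0, 1.0, 2.0)]
    print(name, np.round(meds, 3))  # evenly spaced only in the Gaussian case
```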
arXiv Detail & Related papers (2023-09-17T01:45:13Z)
- Learning linear dynamical systems under convex constraints [4.4351901934764975]
We consider the problem of identification of linear dynamical systems from $T$ samples of a single trajectory.
$A^*$ can be reliably estimated for values of $T$ smaller than what is needed in the unconstrained setting.
arXiv Detail & Related papers (2023-03-27T11:49:40Z)
- Sparse Signal Detection in Heteroscedastic Gaussian Sequence Models: Sharp Minimax Rates [1.0309387309011746]
We study the signal detection problem against sparse alternatives, for known sparsity $s$.
We find upper and lower bounds on the minimax separation radius $\epsilon^*$ and prove that they always match.
Our results reveal new phase transitions regarding the behavior of $\epsilon^*$ with respect to the level of sparsity, to the $L_t$ metric, and to the heteroscedasticity profile of $\Sigma$.
arXiv Detail & Related papers (2022-11-15T23:53:39Z)
- Symmetry-resolved entanglement entropy in critical free-fermion chains [0.0]
Symmetry-resolved Rényi entanglement entropy is known to have rich theoretical connections to conformal field theory.
We consider a class of critical quantum chains with a microscopic U(1) symmetry.
For the density matrix, $\rho_A$, of subsystems of $L$ neighbouring sites we calculate the leading terms in the large $L$ expansion of the symmetry-resolved Rényi entanglement entropies.
arXiv Detail & Related papers (2022-02-23T19:00:03Z)
- Low-Rank Approximation with $1/\epsilon^{1/3}$ Matrix-Vector Products [58.05771390012827]
We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm.
Our main result is an algorithm that uses only $\tilde{O}(k/\sqrt{\epsilon})$ matrix-vector products.
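For orientation, a generic randomized block Krylov method in the matrix-vector-product model looks like the sketch below (a standard baseline with illustrative parameters, restricted to a symmetric PSD input; it is not the paper's improved algorithm).

```python
import numpy as np

rng = np.random.default_rng(4)

def block_krylov_lowrank(matvec, n, k, q):
    # Build the Krylov block [G, AG, ..., A^q G] from a random start block G,
    # orthonormalize it, then extract top-k Ritz vectors (Rayleigh-Ritz).
    # Only matrix-vector products with A are used (A symmetric PSD here).
    blocks = [rng.standard_normal((n, k))]
    for _ in range(q):
        blocks.append(np.column_stack([matvec(v) for v in blocks[-1].T]))
    Q, _ = np.linalg.qr(np.hstack(blocks))
    T = Q.T @ np.column_stack([matvec(v) for v in Q.T])
    _, V = np.linalg.eigh(T)
    return Q @ V[:, -k:]                    # top-k Ritz vectors

n, k = 300, 5
B = rng.standard_normal((n, n))
A = B @ B.T / n                             # synthetic symmetric PSD matrix
U = block_krylov_lowrank(lambda v: A @ v, n, k, q=8)
A_k = U @ (U.T @ A)                         # rank-k approximation
print(np.linalg.norm(A - A_k) / np.linalg.norm(A))
```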
arXiv Detail & Related papers (2022-02-10T16:10:41Z)
- Random matrices in service of ML footprint: ternary random features with no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random-features technique, called Ternary Random Features (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
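A small sketch of the ternary idea (our illustration; the cosine feature map and the sparsity level are assumptions): with variance-matched scaling, ternary projections produce essentially the same kernel matrix as dense Gaussian random features.

```python
import numpy as np

rng = np.random.default_rng(5)

def ternary_random_features(X, D, sparsity=0.5):
    # Projection entries drawn i.i.d. from {-1, 0, +1}, zero with prob `sparsity`:
    # X @ W needs only additions/subtractions, and W stores ~2 bits per entry.
    d = X.shape[1]
    W = rng.choice([-1, 0, 1], size=(d, D),
                   p=[(1 - sparsity) / 2, sparsity, (1 - sparsity) / 2])
    return np.cos(X @ W / np.sqrt((1 - sparsity) * d))  # variance-matched scale

X = rng.standard_normal((200, 30))
Phi_t = ternary_random_features(X, D=4000)
Phi_g = np.cos(X @ rng.standard_normal((30, 4000)) / np.sqrt(30))
K_t, K_g = Phi_t @ Phi_t.T / 4000, Phi_g @ Phi_g.T / 4000
print(np.linalg.norm(K_t - K_g) / np.linalg.norm(K_g))  # small relative gap
```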
arXiv Detail & Related papers (2021-10-05T09:33:49Z)
- Spectral properties of sample covariance matrices arising from random matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\text{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^\top - zI_p)^{-1}$ and $A \in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_*/\sqrt{n})$.
Here, we show that $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F$...
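A crude Monte Carlo check of the advertised $O(\|A\|_*/\sqrt{n})$ fluctuation scale, taking the proportional regime $p = n/2$ and $A = I_p/p$ so that $\|A\|_* = 1$ (all choices here are ours and purely illustrative).

```python
import numpy as np

rng = np.random.default_rng(6)
z = -1.0   # evaluate the resolvent at a point left of the spectrum

def std_tr_A_R(p, n, trials=64):
    vals = []
    for _ in range(trials):
        X = rng.standard_normal((p, n))
        R = np.linalg.inv(X @ X.T / n - z * np.eye(p))
        vals.append(np.trace(R) / p)        # tr(A R) with A = I_p / p
    return np.std(vals)

for n in [100, 400, 1600]:
    p = n // 2
    print(n, std_tr_A_R(p, n))              # shrinks roughly like 1/sqrt(n)
```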
arXiv Detail & Related papers (2021-09-06T14:21:43Z)
- Simplest non-additive measures of quantum resources [77.34726150561087]
We study measures that can be described by $\mathcal{E}(\rho^{\otimes N}) = E(e;N) \ne Ne$.
arXiv Detail & Related papers (2021-06-23T20:27:04Z)
- Sparse sketches with small inversion bias [79.77110958547695]
Inversion bias arises when averaging estimates of quantities that depend on the inverse covariance.
We develop a framework for analyzing inversion bias, based on our proposed concept of an $(\epsilon,\delta)$-unbiased estimator for random matrices.
We show that when the sketching matrix $S$ is dense and has i.i.d. sub-gaussian entries, the estimator is $(\epsilon,\delta)$-unbiased for $(A^\top A)^{-1}$ with a sketch of size $m = O(d + \sqrt{d}/$...
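The bias itself is easy to exhibit with a plain dense Gaussian sketch (a toy demo of ours; the paper's contribution is designing sparse sketches that avoid it): averaging the naive plug-in estimates $((SA)^\top SA)^{-1}$ over many independent sketches does not converge to $(A^\top A)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, d, m, trials = 1000, 10, 40, 2000

A = rng.standard_normal((n, d))
target = np.linalg.inv(A.T @ A)

est = np.zeros((d, d))
for _ in range(trials):
    S = rng.standard_normal((m, n)) / np.sqrt(m)   # dense Gaussian sketch
    B = S @ A
    est += np.linalg.inv(B.T @ B)
est /= trials

# For a Gaussian sketch the average is inflated by exactly m / (m - d - 1),
# by the mean of the inverse-Wishart distribution.
print(np.trace(est) / np.trace(target), m / (m - d - 1))
```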
arXiv Detail & Related papers (2020-11-21T01:33:15Z)
- Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
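To make the setting concrete, here is vanilla gradient descent on the empirical square loss of a single ReLU neuron, with label noise so that $\mathsf{OPT} > 0$ (an illustrative setup of ours; the paper's contribution is the agnostic risk guarantee, not this training loop).

```python
import numpy as np

rng = np.random.default_rng(8)
d, n = 10, 5000

# Agnostic setting: labels are a noisy ReLU, so no predictor reaches zero loss.
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)
X = rng.standard_normal((n, d))
y = np.maximum(X @ w_star, 0.0) + 0.3 * rng.standard_normal(n)

# Gradient descent on L(w) = (1/2n) * sum_i (relu(x_i . w) - y_i)^2.
w = 0.01 * rng.standard_normal(d)
eta = 0.5
for t in range(2000):
    pred = np.maximum(X @ w, 0.0)
    w -= eta * X.T @ ((pred - y) * (X @ w > 0)) / n  # (sub)gradient of L

opt = np.mean((np.maximum(X @ w_star, 0.0) - y) ** 2)   # loss of best neuron
print(np.mean((np.maximum(X @ w, 0.0) - y) ** 2), opt)  # GD loss vs. OPT
```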
This list is automatically generated from the titles and abstracts of the papers on this site.