Provably Scalable Black-Box Variational Inference with Structured Variational Families
- URL: http://arxiv.org/abs/2401.10989v3
- Date: Sun, 01 Dec 2024 00:44:02 GMT
- Title: Provably Scalable Black-Box Variational Inference with Structured Variational Families
- Authors: Joohwan Ko, Kyurae Kim, Woo Chang Kim, Jacob R. Gardner
- Abstract summary: We show that certain scale matrix structures can achieve a better iteration complexity of $\mathcal{O}(N)$, implying better scaling with respect to $N$. We empirically verify our theoretical results on large-scale hierarchical models.
- Score: 8.344690980938307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Variational families with full-rank covariance approximations are known not to work well in black-box variational inference (BBVI), both empirically and theoretically. In fact, recent computational complexity results for BBVI have established that full-rank variational families scale poorly with the dimensionality of the problem compared to e.g. mean-field families. This is particularly critical to hierarchical Bayesian models with local variables; their dimensionality increases with the size of the datasets. Consequently, one gets an iteration complexity with an explicit $\mathcal{O}(N^2)$ dependence on the dataset size $N$. In this paper, we explore a theoretical middle ground between mean-field variational families and full-rank families: structured variational families. We rigorously prove that certain scale matrix structures can achieve a better iteration complexity of $\mathcal{O}\left(N\right)$, implying better scaling with respect to $N$. We empirically verify our theoretical results on large-scale hierarchical models.
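To make the scaling contrast concrete, below is a minimal NumPy sketch, under illustrative assumptions (the block sizes, names, and the bordered block structure are ours, not necessarily the paper's exact parameterization), of why a structured scale factor has $\mathcal{O}(N)$ parameters while a dense one has $\mathcal{O}(N^2)$, and how to sample from the structured family without materializing the dense matrix:

```python
import numpy as np

# Illustrative sizes (assumptions): d_g global parameters and N local blocks
# of dimension d_l each, so the joint posterior has dimension D.
d_g, d_l, N = 5, 2, 1000
D = d_g + N * d_l

# Full-rank scale: dense D x D lower-triangular factor -> O(D^2) = O(N^2) parameters.
full_rank_params = D * (D + 1) // 2

# Structured scale (one bordered/block layout of the kind the paper studies):
# a global block, plus one local block and one local-global coupling per
# local variable -> O(N) parameters.
structured_params = d_g * (d_g + 1) // 2 + N * (d_l * (d_l + 1) // 2 + d_l * d_g)

def sample_structured(mean, L_g, L_loc, C_loc, rng):
    """Draw z = mean + C @ eps without materializing the D x D scale matrix.

    L_g:   (d_g, d_g) lower-triangular global block
    L_loc: (N, d_l, d_l) lower-triangular local blocks
    C_loc: (N, d_l, d_g) local-global coupling blocks
    """
    eps_g = rng.standard_normal(d_g)
    eps_l = rng.standard_normal((N, d_l))
    z_g = L_g @ eps_g
    # Each local variable mixes its own noise with the shared global noise.
    z_l = np.einsum("nij,nj->ni", L_loc, eps_l) + C_loc @ eps_g
    return mean + np.concatenate([z_g, z_l.ravel()])

rng = np.random.default_rng(0)
z = sample_structured(
    np.zeros(D),
    np.eye(d_g),
    np.broadcast_to(np.eye(d_l), (N, d_l, d_l)).copy(),
    np.zeros((N, d_l, d_g)),
    rng,
)
print(full_rank_params, structured_params)  # quadratic vs. linear growth in N
```

The parameter counts are the point: the dense factor grows quadratically in $N$ while the structured one grows linearly, mirroring the $\mathcal{O}(N^2)$ versus $\mathcal{O}(N)$ iteration complexities quoted above.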
Related papers
- Complexity of Markov Chain Monte Carlo for Generalized Linear Models [1.4466802614938334]
We show that for $n \gtrsim d$, MCMC attains the same complexity scaling in $n$, $d$ as first-order optimization algorithms, up to sub-polynomial factors.
Our complexities apply to appropriately scaled priors that are not necessarily Gaussian-tailed, including Student-$t$ and flat priors, with log-posteriors that are not necessarily globally concave or gradient-Lipschitz.
arXiv Detail & Related papers (2025-12-14T16:04:27Z) - Nearly Dimension-Independent Convergence of Mean-Field Black-Box Variational Inference [22.339038971795873]
Black-box variational inference converges at an almost dimension-independent rate.
We prove that our bound on the gradient variance cannot be improved using only spectral bounds on the Hessian of the target log-density.
arXiv Detail & Related papers (2025-05-27T20:08:28Z) - Identifying General Mechanism Shifts in Linear Causal Representations [58.6238439611389]
We consider the linear causal representation learning setting where we observe a linear mixing of $d$ unknown latent factors.
Recent work has shown that it is possible to recover the latent factors as well as the underlying structural causal model over them.
We provide a surprising identifiability result that it is indeed possible, under some very mild standard assumptions, to identify the set of shifted nodes.
arXiv Detail & Related papers (2024-10-31T15:56:50Z) - Complexity-Theoretic Implications of Multicalibration [8.315801422499863]
Multicalibrated predictors satisfy a stronger condition than multiaccurate ones: they are calibrated on each set in the collection.
This complexity-theoretic Regularity Lemma is known to have implications in different areas.
We show that every function (regardless of its hardness) has a small collection of disjoint hardcore sets.
arXiv Detail & Related papers (2023-12-28T18:53:21Z) - Understanding Deep Learning via Decision Boundary [81.49114762506287]
We show that neural networks with lower decision boundary (DB) variability have better generalizability.
Two new notions, algorithm DB variability and $(\epsilon, \eta)$-data DB variability, are proposed to measure the decision boundary variability.
arXiv Detail & Related papers (2022-06-03T11:34:12Z) - Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z) - Linear Discriminant Analysis with High-dimensional Mixed Variables [10.774094462083843]
This paper develops a novel approach for classifying high-dimensional observations with mixed variables.
We overcome the challenge of having to split data into exponentially many cells.
Results on the estimation accuracy and the misclassification rates are established.
arXiv Detail & Related papers (2021-12-14T03:57:56Z) - Amortized Variational Inference for Simple Hierarchical Models [37.56550107432323]
It is difficult to use subsampling with variational inference in hierarchical models since the number of local latent variables scales with the size of the dataset.
This paper suggests an amortized approach where shared parameters simultaneously represent all local distributions.
It is also dramatically faster than using a structured variational distribution.
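A minimal sketch of the amortization idea (illustrative NumPy; the network shape and names are our assumptions, not the paper's architecture): one shared map produces each local distribution from its data point, so the variational parameter count is constant in the number of local variables and minibatch subsampling becomes straightforward.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared weights (constant in N): replace N per-datapoint (mean_i, log_std_i)
# pairs with one small network applied to each observation x_i.
W1, b1 = 0.1 * rng.standard_normal((16, 1)), np.zeros((16, 1))
W2, b2 = 0.1 * rng.standard_normal((2, 16)), np.zeros((2, 1))

def local_q(x_i):
    """Amortized map x_i -> (mean_i, log_std_i) of q(z_i | x_i)."""
    h = np.tanh(W1 * x_i + b1)   # hidden layer, shape (16, 1)
    out = W2 @ h + b2            # output, shape (2, 1)
    return out[0, 0], out[1, 0]

# A subsampled ELBO step only evaluates q on the minibatch, never on all N.
minibatch = rng.standard_normal(32)
local_params = [local_q(x) for x in minibatch]
```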
arXiv Detail & Related papers (2021-11-04T20:29:12Z) - Partial Counterfactual Identification from Observational and Experimental Data [83.798237968683]
We develop effective Monte Carlo algorithms to approximate the optimal bounds from an arbitrary combination of observational and experimental data.
Our algorithms are validated extensively on synthetic and real-world datasets.
arXiv Detail & Related papers (2021-10-12T02:21:30Z) - Variational Causal Networks: Approximate Bayesian Inference over Causal Structures [132.74509389517203]
We introduce a parametric variational family modelled by an autoregressive distribution over the space of discrete DAGs.
In experiments, we demonstrate that the proposed variational posterior is able to provide a good approximation of the true posterior.
arXiv Detail & Related papers (2021-06-14T17:52:49Z) - Spatially relaxed inference on high-dimensional linear models [48.989769153211995]
We study the properties of ensembled clustered inference algorithms which combine spatially constrained clustering, statistical inference, and ensembling to aggregate several clustered inference solutions.
We show that ensembled clustered inference algorithms control the $\delta$-FWER under standard assumptions for $\delta$ equal to the largest cluster diameter.
arXiv Detail & Related papers (2021-06-04T16:37:19Z) - Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox, where "scale" metrics perform well overall but poorly on subpartitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z) - Relative stability toward diffeomorphisms in deep nets indicates performance [66.51503682738931]
We show that stability toward diffeomorphisms does not strongly correlate with performance on benchmark data sets of images.
We find that the stability toward diffeomorphisms relative to that of generic transformations, $R_f$, correlates remarkably with the test error $\epsilon_t$.
For CIFAR10 and 15 known architectures, we find $\epsilon_t \approx 0.2\sqrt{R_f}$, suggesting that obtaining a small $R_f$ is important to achieve good performance.
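As an illustrative reading of this fit (not a number from the paper): a network with relative stability $R_f = 0.25$ would be predicted to reach $\epsilon_t \approx 0.2\sqrt{0.25} = 0.1$, i.e. roughly 10% test error.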
arXiv Detail & Related papers (2021-05-06T07:03:30Z) - Generative Learning of Heterogeneous Tail Dependence [13.60514494665717]
Our model features heterogeneous and asymmetric tail dependence between all pairs of individual dimensions.
We devise a novel moment learning algorithm to learn the parameters.
Results show that this framework gives better finite-sample performance compared to the copula-based benchmarks.
arXiv Detail & Related papers (2020-11-26T05:34:31Z) - Generalized Matrix Factorization: efficient algorithms for fitting generalized linear latent variable models to large data arrays [62.997667081978825]
Generalized Linear Latent Variable models (GLLVMs) generalize such factor models to non-Gaussian responses.
Current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets.
We propose a new approach for fitting GLLVMs to high-dimensional datasets, based on approximating the model using penalized quasi-likelihood.
arXiv Detail & Related papers (2020-10-06T04:28:19Z) - Isotonic regression with unknown permutations: Statistics, computation, and adaptation [13.96377843988598]
We study the minimax risk of estimation (in empirical $L_2$ loss) and the fundamental limits of adaptation (quantified by the adaptivity index).
We provide a Mirsky partition estimator that is minimax optimal while also achieving the smallest adaptivity index possible for polynomial-time procedures.
In a complementary direction, we show that natural modifications of existing estimators fail to satisfy at least one of the desiderata: optimal worst-case statistical performance, computational efficiency, and fast adaptation.
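For background on the isotonic component these estimators build on, here is a minimal pool-adjacent-violators (PAVA) sketch for the vanilla case where the order is known; it is the standard $L_2$ isotonic projection, not the Mirsky partition estimator that extends it.

```python
import numpy as np

def pava(y):
    """L2 projection of y onto the nondecreasing cone (pool adjacent violators)."""
    blocks = []  # each block holds [mean, count]
    for v in np.asarray(y, dtype=float):
        blocks.append([v, 1])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            n = n1 + n2
            blocks.append([(m1 * n1 + m2 * n2) / n, n])
    return np.concatenate([np.full(int(n), m) for m, n in blocks])

print(pava([1.0, 3.0, 2.0, 4.0]))  # [1.0, 2.5, 2.5, 4.0]
```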
arXiv Detail & Related papers (2020-09-05T22:17:51Z) - FANOK: Knockoffs in Linear Time [73.5154025911318]
We describe a series of algorithms that efficiently implement Gaussian model-X knockoffs to control the false discovery rate on large scale feature selection problems.
We test our methods on problems with $p$ as large as $500,000$.
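For reference, a dense $\mathcal{O}(p^3)$ construction of Gaussian model-X knockoffs looks like the sketch below (illustrative names and a textbook construction; FANOK's contribution is precisely avoiding this cubic cost, which the sketch does not attempt).

```python
import numpy as np

def gaussian_knockoffs(X, Sigma, s, rng):
    """Sample knockoffs X_tilde for rows X_i ~ N(0, Sigma).

    The pair (X_i, X_tilde_i) has joint covariance [[Sigma, Sigma - S],
    [Sigma - S, Sigma]] with S = diag(s), so swapping any coordinate with
    its knockoff leaves the joint distribution unchanged. The vector s must
    keep 2 * Sigma - S positive semidefinite.
    """
    n, p = X.shape
    S = np.diag(s)
    SigInv_S = np.linalg.solve(Sigma, S)             # Sigma^{-1} S
    mean = X - X @ SigInv_S                          # E[X_tilde | X], row form
    cov = 2 * S - S @ SigInv_S                       # Cov[X_tilde | X]
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(p))  # jitter for stability
    return mean + rng.standard_normal((n, p)) @ L.T
```

For a correlation matrix $\Sigma$, the equicorrelated choice $s_j = \min(2\lambda_{\min}(\Sigma), 1)$ keeps $2\Sigma - \operatorname{diag}(s)$ positive semidefinite.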
arXiv Detail & Related papers (2020-06-15T21:55:34Z) - Liberty or Depth: Deep Bayesian Neural Nets Do Not Need Complex Weight Posterior Approximations [40.384018112884874]
We prove that deep mean-field variational weight posteriors can induce similar distributions in function-space to those induced by shallower networks.
Our results suggest that using mean-field variational inference in a deeper model is both a practical and theoretically justified alternative to structured approximations.
arXiv Detail & Related papers (2020-02-10T13:11:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences arising from its use.