Sum-of-Squares Relaxations for Information Theory and Variational
Inference
- URL: http://arxiv.org/abs/2206.13285v3
- Date: Mon, 18 Sep 2023 08:42:56 GMT
- Title: Sum-of-Squares Relaxations for Information Theory and Variational
Inference
- Authors: Francis Bach (SIERRA)
- Abstract summary: We consider extensions of the Shannon relative entropy, referred to as $f$-divergences.
We derive a sequence of convex relaxations for computing these divergences.
We provide more efficient relaxations based on spectral information divergences from quantum information theory.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider extensions of the Shannon relative entropy, referred to as
$f$-divergences. Three classical, related computational problems are typically
associated with these divergences: (a) estimation from moments, (b) computing
normalizing integrals, and (c) variational inference in probabilistic models.
These problems are related to one another through convex duality, and for all of
them there are many applications throughout data science. We aim for
computationally tractable approximation algorithms that preserve properties of
the original problem such as potential convexity or monotonicity. In order to
achieve this, we derive a sequence of convex relaxations for computing these
divergences from non-centered covariance matrices associated with a given
feature vector: starting from the typically intractable optimal lower bound,
we consider an additional relaxation based on "sums-of-squares", which is
now computable in polynomial time as a semidefinite program. We also provide
computationally more efficient relaxations based on spectral information
divergences from quantum information theory. For all of the tasks above, beyond
proposing new relaxations, we derive tractable convex optimization algorithms,
and we present illustrations on multivariate trigonometric polynomials and
functions on the Boolean hypercube.
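As context for the spectral route, recall that an $f$-divergence has the form $D_f(p\|q) = \int f(\mathrm{d}p/\mathrm{d}q)\,\mathrm{d}q$. The sketch below is a minimal illustration, not the paper's exact construction: it estimates non-centered covariance matrices $\Sigma = \mathbb{E}[\varphi(x)\varphi(x)^\top]$ of a trigonometric feature vector from samples and evaluates the von Neumann relative entropy $\mathrm{tr}(A(\log A - \log B))$ between them; the feature map, distributions, and regularization are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import logm

# Minimal sketch, in the spirit of the paper's spectral relaxations:
# estimate non-centered covariance matrices Sigma = E[phi(x) phi(x)^T]
# from samples and evaluate a quantum (von Neumann) relative entropy
# between them. Feature map, distributions, and regularization are
# illustrative assumptions, not the paper's exact construction.

def feature_covariance(samples, phi):
    """Non-centered covariance Sigma = E[phi(x) phi(x)^T] from samples."""
    feats = np.array([phi(x) for x in samples])
    return feats.T @ feats / len(samples)

def von_neumann_relative_entropy(A, B, eps=1e-10):
    """tr(A (log A - log B)), a spectral analogue of the KL divergence."""
    A = A + eps * np.eye(len(A))  # regularize for the matrix logarithm
    B = B + eps * np.eye(len(B))
    return np.trace(A @ (logm(A) - logm(B))).real

# Toy example with trigonometric features on [0, 2*pi).
rng = np.random.default_rng(0)
phi = lambda x: np.array([1.0, np.cos(x), np.sin(x),
                          np.cos(2 * x), np.sin(2 * x)]) / np.sqrt(5)
xp = rng.normal(np.pi, 0.5, size=5000) % (2 * np.pi)  # samples from p
xq = rng.uniform(0.0, 2 * np.pi, size=5000)           # samples from q

Sp = feature_covariance(xp, phi)
Sq = feature_covariance(xq, phi)
print("spectral divergence D(Sigma_p || Sigma_q):",
      von_neumann_relative_entropy(Sp, Sq))
```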
Related papers
- Polynomial-Time Solutions for ReLU Network Training: A Complexity
Classification via Max-Cut and Zonotopes [70.52097560486683]
We prove that the hardness of approximation of ReLU networks not only mirrors the complexity of the Max-Cut problem but also, in certain special cases, exactly corresponds to it.
In particular, when $\epsilon \leq \sqrt{84/83} - 1 \approx 0.006$, we show that it is NP-hard to find an approximate global optimizer of the ReLU network objective with relative error $\epsilon$ with respect to the objective value.
arXiv Detail & Related papers (2023-11-18T04:41:07Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Posterior and Computational Uncertainty in Gaussian Processes [52.26904059556759]
Gaussian processes scale prohibitively with the size of the dataset.
Many approximation methods have been developed, which inevitably introduce approximation error.
This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior.
We develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended.
arXiv Detail & Related papers (2022-05-30T22:16:25Z) - Partial Counterfactual Identification from Observational and
Experimental Data [83.798237968683]
We develop effective Monte Carlo algorithms to approximate the optimal bounds from an arbitrary combination of observational and experimental data.
Our algorithms are validated extensively on synthetic and real-world datasets.
arXiv Detail & Related papers (2021-10-12T02:21:30Z) - Optimization and Sampling Under Continuous Symmetry: Examples and Lie
Theory [26.555110725656963]
We present examples involving Kostant's theorem, Lie groups, Lie algebras, and the Harish-Chandra--Itzykson--Zuber integral formulas.
We then present an introduction to Lie theory -- an indispensable mathematical toolkit for capturing continuous symmetries.
arXiv Detail & Related papers (2021-09-02T16:44:44Z) - Convergence bounds for nonlinear least squares and applications to
tensor recovery [0.0]
We consider the problem of approximating a function in general nonlinear subsets of $L^2$ when only a weighted Monte Carlo estimate of the $L^2$-norm can be computed (a minimal weighted Monte Carlo sketch appears after this list).
A critical analysis of our results allows us to derive a sample-efficient algorithm for the model set of low-rank tensors.
arXiv Detail & Related papers (2021-08-11T14:14:02Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded by the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Connecting Hamilton--Jacobi partial differential equations with maximum
a posteriori and posterior mean estimators for some non-convex priors [0.0]
In this chapter, we consider a certain class of non-log-concave regularizations and show that similar representation formulas for the minimizer can also be obtained.
We also present similar results for certain Bayesian posterior mean estimators with Gaussian data fidelity and certain non-log-concave priors using an analogue of min-plus algebra techniques (the classical Hopf--Lax connection is recalled after this list).
arXiv Detail & Related papers (2021-04-22T19:00:37Z) - Optimal oracle inequalities for solving projected fixed-point equations [53.31620399640334]
We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space.
We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation (a minimal LSTD sketch appears after this list).
arXiv Detail & Related papers (2020-12-09T20:19:32Z) - Inexact and Stochastic Generalized Conditional Gradient with Augmented
Lagrangian and Proximal Step [2.0196229393131726]
We analyze inexact and stochastic versions of the CGALP algorithm developed in the authors' previous paper.
This allows one to compute some gradients, proximal terms, and/or linear minimization oracles in an inexact fashion.
We show convergence of the Lagrangian to an optimum and feasibility of the affine constraint (a generic conditional-gradient sketch appears after this list).
arXiv Detail & Related papers (2020-05-11T14:52:16Z)
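For the nonlinear least-squares entry above, here is a minimal sketch of a weighted Monte Carlo estimate of an $L^2(\mu)$ error, the kind of quantity that paper assumes access to. The target, model, sampling density, and weight are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Minimal sketch of a weighted Monte Carlo estimate of an L2(mu) error.
# The target u, model v, sampling density rho, and importance weight
# w = d(mu)/d(rho) are illustrative assumptions, not the paper's setup.

rng = np.random.default_rng(1)

u = lambda x: np.sin(np.pi * x)        # target function on [0, 1]
v = lambda x: 4.0 * x * (1.0 - x)      # candidate approximation
# Reference measure mu = Uniform[0, 1]; sample instead from rho = Beta(2, 2),
# whose density is 6 x (1 - x), so the importance weight is 1 / (6 x (1 - x)).
x = rng.beta(2.0, 2.0, size=100_000)   # samples fall in (0, 1) a.s.
w = 1.0 / (6.0 * x * (1.0 - x))

err2 = np.mean(w * (u(x) - v(x)) ** 2)  # estimates ||u - v||_{L2(mu)}^2
print("weighted MC estimate of ||u - v||^2:", err2)
```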
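For the Hamilton--Jacobi entry above, the classical log-concave connection it generalizes is the Hopf--Lax formula: with prior $\propto e^{-J}$ and Gaussian data fidelity of variance $t$, the MAP objective solves a Hamilton--Jacobi equation. This is a standard fact, stated here under the usual convexity assumptions rather than the paper's non-log-concave setting.

```latex
% Hopf--Lax formula: the MAP objective with prior exp(-J) and Gaussian
% data fidelity of variance t, and the Hamilton--Jacobi equation it solves.
u(x, t) \;=\; \min_{y \in \mathbb{R}^n} \left\{ J(y) + \frac{\|x - y\|^2}{2t} \right\},
\qquad
\partial_t u + \tfrac{1}{2} \, \|\nabla_x u\|^2 = 0, \quad u(x, 0) = J(x).
```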
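For the projected fixed-point entry above, a minimal LSTD sketch: a classical temporal difference method that solves the projected Bellman equation $\Phi\theta = \Pi T(\Phi\theta)$ in closed form from sampled transitions. The toy chain MDP is an illustrative assumption, not the paper's setting.

```python
import numpy as np

# Minimal LSTD sketch: solve the projected Bellman (fixed-point) equation
# Phi theta = Pi T(Phi theta) in closed form from sampled transitions,
# via A theta = b with A = sum phi(s)(phi(s) - gamma phi(s'))^T and
# b = sum phi(s) r. The toy chain MDP is an illustrative assumption.

rng = np.random.default_rng(2)
gamma = 0.9
n_states, d = 5, 3
Phi = rng.normal(size=(n_states, d))  # feature matrix, row s is phi(s)

A = np.zeros((d, d))
b = np.zeros(d)
s = 0
for _ in range(10_000):
    s_next = (s + rng.integers(0, 2)) % n_states  # stay or advance
    r = 1.0 if s_next == 0 else 0.0               # reward for reaching state 0
    A += np.outer(Phi[s], Phi[s] - gamma * Phi[s_next])
    b += Phi[s] * r
    s = s_next

theta = np.linalg.solve(A, b)  # LSTD solution of the projected equation
print("estimated state values Phi @ theta:", Phi @ theta)
```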
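For the conditional-gradient entry above, a plain Frank-Wolfe iteration on the probability simplex: the basic template that CGALP augments with augmented-Lagrangian and proximal terms. This generic sketch is not the authors' CGALP algorithm, and the toy objective is an illustrative assumption.

```python
import numpy as np

# Minimal sketch of a plain conditional-gradient (Frank-Wolfe) method on
# the probability simplex: the template that CGALP augments with
# augmented-Lagrangian and proximal terms. This generic illustration is
# not the authors' CGALP algorithm.

def frank_wolfe_simplex(grad, x0, n_iters=200):
    x = x0.copy()
    for k in range(n_iters):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0    # linear minimization oracle on the simplex
        step = 2.0 / (k + 2.0)   # classical Frank-Wolfe step size
        x = (1.0 - step) * x + step * s
    return x

# Toy objective: minimize ||x - c||^2 over the simplex; since c already
# lies in the simplex, the iterates should approach c.
c = np.array([0.2, 0.5, 0.3])
x = frank_wolfe_simplex(lambda x: 2.0 * (x - c), np.array([1.0, 0.0, 0.0]))
print("Frank-Wolfe solution:", x)
```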