Exact Fractional Inference via Re-Parametrization & Interpolation
between Tree-Re-Weighted- and Belief Propagation- Algorithms
- URL: http://arxiv.org/abs/2301.10369v2
- Date: Wed, 6 Mar 2024 15:25:57 GMT
- Title: Exact Fractional Inference via Re-Parametrization & Interpolation
between Tree-Re-Weighted- and Belief Propagation- Algorithms
- Authors: Hamidreza Behjoo, Michael Chertkov
- Abstract summary: Inference efforts -- required to compute the partition function, $Z$, of an Ising model over a graph of $N$ ``spins" -- are most likely exponential in $N$.
We show how to express $Z$ as a product, $\forall \lambda:\ Z=Z^{(\lambda)}{\cal Z}^{(\lambda)}$, where the multiplicative correction, ${\cal Z}^{(\lambda)}$, is an expectation over a node-independent probability distribution built from node-wise fractional marginals.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inference efforts -- required to compute partition function, $Z$, of an Ising
model over a graph of $N$ ``spins" -- are most likely exponential in $N$.
Efficient variational methods, such as the Belief Propagation (BP) and Tree
Re-Weighted (TRW) algorithms, compute $Z$ approximately by minimizing the
respective (BP- or TRW-) free energy. We generalize the variational scheme by building a
$\lambda$-fractional-homotopy, $Z^{(\lambda)}$, where $\lambda=0$ and
$\lambda=1$ correspond to TRW- and BP-approximations, respectively, and
$Z^{(\lambda)}$ decreases with $\lambda$ monotonically. Moreover, this
fractional scheme guarantees that in the attractive (ferromagnetic) case
$Z^{(TRW)}\geq Z^{(\lambda)}\geq Z^{(BP)}$, and there exists a unique
(``exact") $\lambda_*$ such that, $Z=Z^{(\lambda_*)}$. Generalizing the
re-parametrization approach of \citep{wainwright_tree-based_2002} and the loop
series approach of \citep{chertkov_loop_2006}, we show how to express $Z$ as a
product, $\forall \lambda:\ Z=Z^{(\lambda)}{\cal Z}^{(\lambda)}$, where the
multiplicative correction, ${\cal Z}^{(\lambda)}$, is an expectation over a
node-independent probability distribution built from node-wise fractional
marginals. Our theoretical analysis is complemented by extensive experiments
with models from Ising ensembles over planar and random graphs of medium and
large sizes. The empirical study yields a number of interesting observations,
such as (a) the ability to estimate ${\cal Z}^{(\lambda)}$ with $O(N^4)$ fractional
samples; (b) the suppression of $\lambda_*$ fluctuations as $N$ increases for
instances from a particular random Ising ensemble.
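The quantity at the center of the abstract, the partition function $Z$, is exact but exponential-in-$N$ to compute by enumeration; BP- and TRW-style variational methods exist precisely to avoid this cost. A minimal brute-force sketch of the object being approximated (the function name, couplings, and fields below are illustrative, not from the paper):

```python
import itertools
import math

def exact_partition_function(N, J, h, beta=1.0):
    """Exact Z = sum over all 2^N spin configurations s of exp(-beta * E(s)),
    with energy E(s) = -sum_{(i,j)} J_ij s_i s_j - sum_i h_i s_i.
    The loop over 2^N configurations is the exponential cost that
    BP/TRW-style variational approximations are designed to avoid."""
    Z = 0.0
    for s in itertools.product([-1, 1], repeat=N):
        energy = -sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())
        energy -= sum(h[i] * s[i] for i in range(N))
        Z += math.exp(-beta * energy)
    return Z

# Small ferromagnetic chain (J_ij > 0: the "attractive" case, where the
# abstract guarantees Z^(TRW) >= Z^(lambda) >= Z^(BP)).
J = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}
h = [0.1, 0.1, 0.1, 0.1]
Z = exact_partition_function(4, J, h)
```

For a single spin with field $h$ this reduces to the textbook value $Z = 2\cosh(\beta h)$, which makes the sketch easy to sanity-check.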
Related papers
- Manifold learning in Wasserstein space [3.2315169215372443]
We show that the metric space $(\Lambda, W_\Lambda)$ can be recovered in the sense of Gromov--Wasserstein from a graph with nodes $\{\lambda_i\}_{i=1}^N$ and edge weights $W(\lambda_i, \lambda_j)$.
arXiv Detail & Related papers (2023-11-14T21:21:35Z)
- A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z)
- Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study functions of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations $\sigma$.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
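The single-neuron objective in this entry can be minimized with a few lines of plain gradient descent on the square loss. A minimal sketch (the toy data, ReLU activation, learning rate, and step count are illustrative assumptions, not the paper's algorithm or its adversarial noise model):

```python
import numpy as np

def learn_single_neuron(X, y, sigma, sigma_prime, lr=0.1, steps=500, seed=0):
    """Gradient descent on the square loss F(w) = mean((sigma(X w) - y)^2)
    for a monotone activation sigma with (sub)derivative sigma_prime."""
    rng = np.random.default_rng(seed)
    w = 0.01 * rng.normal(size=X.shape[1])  # small random init
    for _ in range(steps):
        z = X @ w
        residual = sigma(z) - y
        # Chain rule: dF/dw = (2/m) X^T [(sigma(z) - y) * sigma'(z)]
        w -= lr * (2.0 / len(y)) * X.T @ (residual * sigma_prime(z))
    return w

# Noiseless toy data from a planted ReLU neuron (monotone activation).
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
w_star = np.array([1.0, -2.0, 0.5])
relu = lambda z: np.maximum(z, 0.0)
relu_prime = lambda z: (z > 0).astype(float)
y = relu(X @ w_star)
w_hat = learn_single_neuron(X, y, relu, relu_prime)
loss = np.mean((relu(X @ w_hat) - y) ** 2)
```

In the noiseless realizable case above the loss drops well below that of the zero predictor; the paper's contribution concerns what remains achievable once adversarial label noise is added.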
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
- Asymptotic Theory of $\ell_1$-Regularized PDE Identification from a Single Noisy Trajectory [2.0299248281970956]
We prove the support recovery for a general class of linear and nonlinear evolutionary partial differential equation (PDE) identification from a single noisy trajectory.
We provide a set of sufficient conditions which guarantee that, from a single trajectory denoised by a Local-Polynomial filter, the support of $\mathbf{c}(\lambda)$ asymptotically converges to the true signed-support associated with the underlying PDE.
arXiv Detail & Related papers (2021-03-12T02:23:04Z)
- An Optimal Separation of Randomized and Quantum Query Complexity [67.19751155411075]
We prove that for every decision tree, the absolute values of the Fourier coefficients of given order $\ell$ sum to at most $c^{\ell}\sqrt{\binom{d}{\ell}(1+\log n)^{\ell-1}}$, where $n$ is the number of variables, $d$ is the tree depth, and $c>0$ is an absolute constant.
arXiv Detail & Related papers (2020-08-24T06:50:57Z)
- $k$-Forrelation Optimally Separates Quantum and Classical Query Complexity [3.4984289152418753]
We show that any partial function on $N$ bits can be computed with an advantage $\delta$ over a random guess by making $q$ quantum queries.
We also conjectured the $k$-Forrelation problem -- a partial function that can be computed with $q = \lceil k/2 \rceil$ quantum queries.
arXiv Detail & Related papers (2020-08-16T21:26:46Z)
- $\lambda$-Regularized A-Optimal Design and its Approximation by $\lambda$-Regularized Proportional Volume Sampling [1.256413718364189]
We study the $\lambda$-regularized $A$-optimal design problem and introduce the $\lambda$-regularized proportional volume sampling algorithm.
The problem is motivated from optimal design in ridge regression, where one tries to minimize the expected squared error of the ridge regression predictor from the true coefficient in the underlying linear model.
arXiv Detail & Related papers (2020-06-19T15:17:57Z)
- Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we show an improved algorithm with better sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z)
- A Randomized Algorithm to Reduce the Support of Discrete Measures [79.55586575988292]
Given a discrete probability measure supported on $N$ atoms and a set of $n$ real-valued functions, there exists a probability measure that is supported on a subset of $n+1$ of the original $N$ atoms.
We give a simple geometric characterization of barycenters via negative cones and derive a randomized algorithm that computes this new measure by "greedy geometric sampling".
We then study its properties, and benchmark it on synthetic and real-world data to show that it can be very beneficial in the $N \gg n$ regime.
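The reduction described in this entry is a Carathéodory-type recombination: repeatedly move mass along a kernel direction of the moment matrix until at most $n+1$ atoms remain, preserving all $n$ expectations and the total mass. A minimal sketch (the SVD-based kernel step below is one simple choice, not necessarily the paper's "greedy geometric sampling"):

```python
import numpy as np

def reduce_support(weights, Phi, tol=1e-12):
    """Reduce a probability measure on N atoms to at most n+1 atoms while
    preserving the expectations of the n functions in Phi (shape (n, N))."""
    w = weights.astype(float).copy()
    n = Phi.shape[0]
    while np.count_nonzero(w > tol) > n + 1:
        support = np.flatnonzero(w > tol)
        # Moments plus a row of ones (so total mass is preserved too).
        A = np.vstack([Phi[:, support], np.ones(len(support))])
        # A is (n+1) x m with m > n+1, so it has a nontrivial kernel;
        # moving mass along a kernel vector leaves all expectations fixed.
        v = np.linalg.svd(A)[2][-1]
        if v[np.argmax(np.abs(v))] < 0:
            v = -v  # ensure v has a substantial positive entry
        pos = v > 1e-9
        # Largest step keeping all weights nonnegative; zeroes >= 1 atom.
        t = np.min(w[support][pos] / v[pos])
        w[support] -= t * v
        w[w < tol] = 0.0
    return w

rng = np.random.default_rng(0)
N, n = 50, 3
Phi = rng.normal(size=(n, N))       # n test functions on N atoms
w0 = rng.dirichlet(np.ones(N))      # original measure on N atoms
w1 = reduce_support(w0, Phi)        # same n moments, <= n+1 atoms
```

Each iteration kills at least one atom, so the loop runs at most $N-(n+1)$ times, and every step leaves $\Phi w$ and $\sum_i w_i$ unchanged up to floating-point error.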
arXiv Detail & Related papers (2020-06-02T16:38:36Z)
- On the Complexity of Minimizing Convex Finite Sums Without Using the Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n^2)$-upper bound under the global oracle model.
Following a similar approach, we propose a novel adaptation of SVRG which is both compatible with oracles, and achieves complexity bounds of $\tilde{O}\big((n^2+n\sqrt{L/\mu})\log(1/\epsilon)\big)$ and $O(n\sqrt{L/\epsilon})$, for $\mu>0$ and $\mu=0$, respectively.
arXiv Detail & Related papers (2020-02-09T03:39:46Z)
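The SVRG adaptation in the last entry builds on the standard SVRG update: correct each stochastic gradient with a periodically refreshed full gradient, so the estimator stays unbiased while its variance vanishes at the optimum. A generic sketch on a toy least-squares finite sum (the step size, epoch counts, and quadratic objective are illustrative assumptions, not the paper's oracle-compatible variant):

```python
import numpy as np

def svrg(grad_i, n, x0, lr=0.01, outer=30, inner=None, seed=0):
    """SVRG for min_x (1/n) sum_i f_i(x): each inner step uses the
    variance-reduced estimate grad_i(i, x) - grad_i(i, snap) + full_grad(snap),
    which is unbiased and whose variance shrinks as x approaches x*."""
    rng = np.random.default_rng(seed)
    m = inner if inner is not None else 2 * n
    x = x0.copy()
    for _ in range(outer):
        snap = x.copy()
        # Full gradient at the snapshot point (recomputed once per epoch).
        mu = np.mean([grad_i(i, snap) for i in range(n)], axis=0)
        for _ in range(m):
            i = rng.integers(n)
            x = x - lr * (grad_i(i, x) - grad_i(i, snap) + mu)
    return x

# Toy strongly convex finite sum: f_i(x) = 0.5 * (a_i . x - b_i)^2.
rng = np.random.default_rng(1)
A = rng.normal(size=(100, 5))
x_star = rng.normal(size=5)
b = A @ x_star
grad_i = lambda i, x: (A[i] @ x - b[i]) * A[i]
x_hat = svrg(grad_i, 100, np.zeros(5))
```

The full-gradient recomputation once per epoch is what yields the $\log(1/\epsilon)$ dependence in the strongly convex ($\mu>0$) regime, in contrast to plain SGD's polynomial dependence on $1/\epsilon$.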
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.