The Bethe and Sinkhorn Permanents of Low Rank Matrices and Implications
for Profile Maximum Likelihood
- URL: http://arxiv.org/abs/2004.02425v1
- Date: Mon, 6 Apr 2020 06:40:03 GMT
- Title: The Bethe and Sinkhorn Permanents of Low Rank Matrices and Implications
for Profile Maximum Likelihood
- Authors: Nima Anari, Moses Charikar, Kirankumar Shiragur, Aaron Sidford
- Abstract summary: We provide new bounds on the quality of approximation of the Bethe and Sinkhorn permanents.
We establish a surprising connection between the convex relaxation in prior work and the well-studied Bethe and Sinkhorn approximations.
- Score: 33.51964370430905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we consider the problem of computing the likelihood of the
profile of a discrete distribution, i.e., the probability of observing the
multiset of element frequencies, and computing a profile maximum likelihood
(PML) distribution, i.e., a distribution with the maximum profile likelihood.
For each problem we provide polynomial time algorithms that, given $n$ i.i.d.
samples from a discrete distribution, achieve an approximation factor of
$\exp\left(-O(\sqrt{n} \log n) \right)$, improving upon the previous best-known
bound achievable in polynomial time of $\exp(-O(n^{2/3} \log n))$ (Charikar,
Shiragur and Sidford, 2019). Through the work of Acharya, Das, Orlitsky and
Suresh (2016), this implies a polynomial time universal estimator for symmetric
properties of discrete distributions for a broader range of error parameters.
We achieve these results by providing new bounds on the quality of
approximation of the Bethe and Sinkhorn permanents (Vontobel, 2012 and 2014).
We show that each of these is an $\exp(O(k \log(N/k)))$ approximation to the
permanent of $N \times N$ matrices with non-negative rank at most $k$,
improving upon the previously known bound of $\exp(O(N))$. To obtain our results
on PML, we exploit the fact that the PML objective is proportional to the
permanent of a certain Vandermonde matrix with $\sqrt{n}$ distinct columns,
i.e., with non-negative rank at most $\sqrt{n}$. As a by-product of our work, we
establish a surprising connection between the convex relaxation in prior work
(CSS19) and the well-studied Bethe and Sinkhorn approximations.
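
To make these approximation guarantees concrete, here is a minimal numerical sketch (our own illustration, assuming NumPy; the paper contains no code, and we use the scaled form of Vontobel's Sinkhorn approximation as we read it). It computes the doubly stochastic scaling $Q$ of a random $N \times N$ matrix $A$ of non-negative rank at most $k$, evaluates $\frac{N!}{N^N}\exp(\sum_{ij} Q_{ij}\log(A_{ij}/Q_{ij}))$, and compares this against the exact permanent. In general this underestimates the permanent by at most $N^N/N! = \exp(O(N))$; the result above sharpens the gap to $\exp(O(k \log(N/k)))$ for non-negative rank at most $k$.

```python
import itertools
import math
import numpy as np

def exact_permanent(A):
    # Brute force over all N! permutations; fine for small N.
    n = A.shape[0]
    return sum(
        math.prod(A[i, sigma[i]] for i in range(n))
        for sigma in itertools.permutations(range(n))
    )

def sinkhorn_scale(A, iters=2000):
    # Alternately normalize rows and columns until (nearly) doubly stochastic.
    Q = A.astype(float).copy()
    for _ in range(iters):
        Q /= Q.sum(axis=1, keepdims=True)
        Q /= Q.sum(axis=0, keepdims=True)
    return Q

def scaled_sinkhorn_permanent(A, iters=2000):
    # (N!/N^N) * exp(sum_ij Q_ij log(A_ij/Q_ij)) at the doubly stochastic
    # scaling Q of A, which maximizes the exponent over doubly stochastic Q.
    n = A.shape[0]
    Q = sinkhorn_scale(A, iters)
    mask = Q > 0  # Q and A share the same zero pattern
    log_val = float(np.sum(Q[mask] * np.log(A[mask] / Q[mask])))
    return math.factorial(n) / n**n * math.exp(log_val)

rng = np.random.default_rng(0)
N, k = 8, 2
# Non-negative rank <= k: product of entrywise non-negative factors.
A = rng.random((N, k)) @ rng.random((k, N))

p = exact_permanent(A)
s = scaled_sinkhorn_permanent(A)
print(f"perm(A)          = {p:.6g}")
print(f"scaled sinkhorn  = {s:.6g}")
# Ratio lies in [1, N^N/N!]; for low non-negative rank it should stay small.
print(f"ratio            = {p / s:.4f}  (generic bound ~ {N**N / math.factorial(N):.0f})")
```

Brute force at $N=8$ keeps the exact permanent cheap for comparison; the Sinkhorn scaling itself runs in polynomial time, which is what makes these approximations usable for PML.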
Related papers
- $L^1$ Estimation: On the Optimality of Linear Estimators [64.76492306585168]
This work shows that the only prior distribution on $X$ that induces linearity in the conditional median is Gaussian.
In particular, it is demonstrated that if the conditional distribution $P_{X|Y=y}$ is symmetric for all $y$, then $X$ must follow a Gaussian distribution.
arXiv Detail & Related papers (2023-09-17T01:45:13Z)
- Robust Mean Estimation Without Moments for Symmetric Distributions [7.105512316884493]
We show that for a large class of symmetric distributions, the same error as in the Gaussian setting can be achieved efficiently.
We propose a sequence of efficient algorithms that approaches this optimal error.
Our algorithms are based on a generalization of the well-known filtering technique.
arXiv Detail & Related papers (2023-02-21T17:52:23Z)
- Near Sample-Optimal Reduction-based Policy Learning for Average Reward MDP [58.13930707612128]
This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP).
We prove an upper bound of $\widetilde{O}(H \varepsilon^{-3} \ln \frac{1}{\delta})$ samples per state-action pair, where $H := \mathrm{sp}(h^*)$ is the span of the bias of any optimal policy, $\varepsilon$ is the accuracy, and $\delta$ is the failure probability.
arXiv Detail & Related papers (2022-12-01T15:57:58Z)
- Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
- Optimal Sublinear Sampling of Spanning Trees and Determinantal Point Processes via Average-Case Entropic Independence [3.9586758145580014]
We design fast algorithms for repeatedly sampling from strongly Rayleigh distributions.
For a graph $G=(V, E)$, we show how to approximately sample uniformly random spanning trees from $G$ in $\widetilde{O}(\lvert V \rvert)$ time per sample.
For a determinantal point process on subsets of size $k$ of a ground set of $n$ elements, we show how to approximately sample in $\widetilde{O}(k^{\omega})$ time after an initial $\widetilde{O}(nk^{\omega-1})$-time preprocessing step.
arXiv Detail & Related papers (2022-04-06T04:11:26Z)
- A Law of Robustness beyond Isoperimetry [84.33752026418045]
We prove a Lipschitzness lower bound $\Omega(\sqrt{n/p})$ on the robustness of interpolating neural network parameters on arbitrary distributions.
We then show the potential benefit of overparametrization for smooth data when $n=\mathrm{poly}(d)$.
We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=\exp(\omega(d))$.
arXiv Detail & Related papers (2022-02-23T16:10:23Z)
- Approximating Pandora's Box with Correlations [14.284880952600995]
We study the complexity of approximating the optimal policy which may adaptively choose which box to visit next.
Our main result establishes an approximation-preserving equivalence of Pandora's Box (PB) to the well-studied Uniform Decision Tree (UDT) problem.
We also study the case where the distribution over values is given more succinctly as a mixture of $m$ product distributions.
arXiv Detail & Related papers (2021-08-30T03:32:16Z)
- Hardness of Learning Halfspaces with Massart Noise [56.98280399449707]
We study the complexity of PAC learning halfspaces in the presence of Massart (bounded) noise.
We show that there is an exponential gap between the information-theoretically optimal error and the best error that can be achieved by an SQ algorithm.
arXiv Detail & Related papers (2020-12-17T16:43:11Z)
- Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n \times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y) = -\log\langle\varphi(x), \varphi(y)\rangle$, where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r \ll n$ (a minimal sketch of this low-rank trick appears after this list).
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
- List-Decodable Subspace Recovery: Dimension Independent Error in Polynomial Time [5.812499828391904]
In list-decodable subspace recovery, the input is a collection of $n$ points, $\alpha n$ of which (for some $\alpha \ll 1/2$) are drawn i.i.d. from a distribution $\mathcal{D}$.
In this work, we improve on results on all three fronts: the distributional assumption (certifiable anti-concentration), the error guarantee (dimension-independent), and a faster fixed-polynomial running time.
arXiv Detail & Related papers (2020-02-12T18:30:09Z)
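
The "Linear Time Sinkhorn Divergences using Positive Features" entry above is the one most directly related to the Sinkhorn scaling used in this paper, so here is a minimal sketch of its low-rank trick (our own illustrative setup, assuming NumPy and entropic regularization $\varepsilon = 1$, so that $K_{ij} = \langle\varphi(x_i), \varphi(y_j)\rangle$; none of the names below come from the authors' code). Keeping the kernel in factored form makes each Sinkhorn matrix-vector product cost $O(nr)$ rather than $O(n^2)$.

```python
import numpy as np

def sinkhorn_lowrank(phi_x, phi_y, a, b, iters=500):
    # Sinkhorn iterations with the kernel K = phi_x @ phi_y.T kept in factored
    # form: each matvec K @ v = phi_x @ (phi_y.T @ v) costs O(n*r), not O(n^2).
    u = np.ones(len(a))
    v = np.ones(len(b))
    for _ in range(iters):
        v = b / (phi_y @ (phi_x.T @ u))  # v = b / (K^T u)
        u = a / (phi_x @ (phi_y.T @ v))  # u = a / (K v)
    return u, v  # transport plan is diag(u) K diag(v), never materialized

n, r = 1000, 10
rng = np.random.default_rng(1)
phi_x = rng.random((n, r)) + 1e-3  # positive features give a positive kernel
phi_y = rng.random((n, r)) + 1e-3
a = b = np.full(n, 1.0 / n)        # uniform source and target marginals

u, v = sinkhorn_lowrank(phi_x, phi_y, a, b)
# Row marginal of P = diag(u) K diag(v), computed without forming K.
row_marginal = u * (phi_x @ (phi_y.T @ v))
print(np.allclose(row_marginal, a))  # True: rows match after the final u-update
```

The same factored-matvec idea is what makes Sinkhorn-style scaling cheap whenever the matrix being scaled admits an explicit rank-$r$ non-negative factorization, which is exactly the structure the main paper exploits.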