Likelihood estimation of sparse topic distributions in topic models and
its applications to Wasserstein document distance calculations
- URL: http://arxiv.org/abs/2107.05766v1
- Date: Mon, 12 Jul 2021 22:22:32 GMT
- Title: Likelihood estimation of sparse topic distributions in topic models and
its applications to Wasserstein document distance calculations
- Authors: Xin Bing and Florentina Bunea and Seth Strimas-Mackey and Marten
Wegkamp
- Abstract summary: In topic models, the $p\times n$ expected word frequency matrix is assumed to be factorized as a $p\times K$ word-topic matrix $A$ and a $K\times n$ topic-document matrix $T$.
Columns of $A$ are viewed as $p$-dimensional mixture components that are common to all documents.
When $A$ is unknown, we estimate $T$ by optimizing the likelihood function corresponding to a plug-in, generic estimator $\hat{A}$ of $A$.
- Score: 3.679981089267181
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the estimation of high-dimensional, discrete, possibly
sparse, mixture models in topic models. The data consists of observed
multinomial counts of $p$ words across $n$ independent documents. In topic
models, the $p\times n$ expected word frequency matrix is assumed to be
factorized as a $p\times K$ word-topic matrix $A$ and a $K\times n$
topic-document matrix $T$. Since columns of both matrices represent conditional
probabilities belonging to probability simplices, columns of $A$ are viewed as
$p$-dimensional mixture components that are common to all documents while
columns of $T$ are viewed as the $K$-dimensional mixture weights that are
document specific and are allowed to be sparse. The main interest is to provide
sharp, finite sample, $\ell_1$-norm convergence rates for estimators of the
mixture weights $T$ when $A$ is either known or unknown. For known $A$, we
suggest MLE estimation of $T$. Our non-standard analysis of the MLE not only
establishes its $\ell_1$ convergence rate, but reveals a remarkable property:
the MLE, with no extra regularization, can be exactly sparse and contain the
true zero pattern of $T$. We further show that the MLE is both minimax optimal
and adaptive to the unknown sparsity in a large class of sparse topic
distributions. When $A$ is unknown, we estimate $T$ by optimizing the
likelihood function corresponding to a plug-in, generic estimator $\hat{A}$ of
$A$. For any estimator $\hat{A}$ that satisfies carefully detailed conditions
for proximity to $A$, the resulting estimator of $T$ is shown to retain the
properties established for the MLE. The ambient dimensions $K$ and $p$ are
allowed to grow with the sample sizes. Our application is to the estimation of
1-Wasserstein distances between document generating distributions. We propose,
estimate and analyze new 1-Wasserstein distances between two probabilistic
document representations.
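
To make the abstract's two main objects concrete, here is a minimal sketch, not the authors' code: it computes the MLE of a document's topic weights $t$ when $A$ is known, using EM updates for multinomial mixture weights (one standard way to compute the MLE the paper analyzes), and a 1-Wasserstein distance between the word distributions $At_1$ and $At_2$ via the POT library. The function names, the toy data, and the word ground-cost matrix `M` are illustrative assumptions; the paper's specific distance constructions are not reproduced here.

```python
# Sketch under stated assumptions; not the paper's implementation.
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)


def mle_topic_weights(x, A, n_iter=1000, tol=1e-10):
    """MLE of a document's topic weights t, with A known, via EM updates.

    x : (p,) observed word counts; A : (p, K), columns on the simplex.
    Maximizes the multinomial log-likelihood sum_j x_j * log((A t)_j)
    over t in the K-dimensional probability simplex.
    """
    K = A.shape[1]
    t = np.full(K, 1.0 / K)                      # uniform initialization
    for _ in range(n_iter):
        mix = np.maximum(A @ t, 1e-300)          # word probabilities (A t)_j
        t_new = t * (A.T @ (x / mix)) / x.sum()  # EM step; stays on the simplex
        if np.abs(t_new - t).sum() < tol:
            return t_new
        t = t_new
    return t


def doc_wasserstein(t1, t2, A, M):
    """1-Wasserstein distance between word distributions A t1 and A t2,
    for a user-supplied (p, p) ground-cost matrix M between words."""
    return ot.emd2(A @ t1, A @ t2, M)


# Toy usage: recover a sparse weight vector from one synthetic document.
rng = np.random.default_rng(0)
p, K = 200, 5
A = rng.dirichlet(np.ones(p), size=K).T          # (p, K) word-topic matrix
t_true = np.array([0.6, 0.4, 0.0, 0.0, 0.0])     # sparse topic weights
x = rng.multinomial(5000, A @ t_true)            # observed word counts
t_hat = mle_topic_weights(x, A)                  # near-zero mass off support
z = rng.normal(size=p)                           # toy 1-D word embeddings
M = np.abs(z[:, None] - z[None, :])              # assumed word ground cost
print(t_hat.round(3), doc_wasserstein(t_hat, t_true, A, M))
```

Note that the exact-sparsity property in the abstract concerns the exact MLE; EM iterates only approach zero coordinates in the limit, so near-zero entries of `t_hat` would be thresholded in practice.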
Related papers
- Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z) - Revisiting Step-Size Assumptions in Stochastic Approximation [1.3654846342364308]
The paper revisits step-size selection in a general Markovian setting.
A major conclusion is that the choice of $\rho = 0$ or even $\rho < 1/2$ is justified only in select settings.
arXiv Detail & Related papers (2024-05-28T05:11:05Z) - $L^1$ Estimation: On the Optimality of Linear Estimators [64.76492306585168]
This work shows that the only prior distribution on $X$ that induces linearity in the conditional median is Gaussian.
In particular, it is demonstrated that if the conditional distribution $P_X|Y=y$ is symmetric for all $y$, then $X$ must follow a Gaussian distribution.
arXiv Detail & Related papers (2023-09-17T01:45:13Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Features (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ less bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Spectral properties of sample covariance matrices arising from random
matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\mathrm{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T - zI_p)^{-1}$ and $A \in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_*/\sqrt{n})$.
Here, we show that $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F \ldots$
arXiv Detail & Related papers (2021-09-06T14:21:43Z) - Chi-square and normal inference in high-dimensional multi-task
regression [7.310043452300736]
The paper proposes chi-square and normal methodologies for the unknown coefficient matrix $B^*$ of size $p\times T$ in a Multi-Task (MT) linear model.
arXiv Detail & Related papers (2021-07-16T11:19:49Z) - Sparse sketches with small inversion bias [79.77110958547695]
Inversion bias arises when averaging estimates of quantities that depend on the inverse covariance.
We develop a framework for analyzing inversion bias, based on our proposed concept of an $(\epsilon,\delta)$-unbiased estimator for random matrices.
We show that when the sketching matrix $S$ is dense and has i.i.d. sub-gaussian entries, the estimator is $(\epsilon,\delta)$-unbiased for $(A^\top A)^{-1}$ with a sketch of size $m = O(d + \sqrt{d}/\ldots)$
arXiv Detail & Related papers (2020-11-21T01:33:15Z) - List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $\widetilde{O}(ndk)$ for all $k = O(\sqrt{d}) \cup \Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\sqrt{\log k})$ factor in the recovery guarantee.
arXiv Detail & Related papers (2020-11-19T17:21:37Z) - Efficient Statistics for Sparse Graphical Models from Truncated Samples [19.205541380535397]
We focus on two fundamental and classical problems: (i) inference of sparse Gaussian graphical models and (ii) support recovery of sparse linear models.
For sparse linear regression, suppose samples $(\mathbf{x}, y)$ are generated where $y = \mathbf{x}^\top \Omega^* + \mathcal{N}(0,1)$ and $(\mathbf{x}, y)$ is seen only if $y$ belongs to a truncation set $S \subseteq \mathbb{R}^d$.
arXiv Detail & Related papers (2020-06-17T09:21:00Z) - Optimal estimation of sparse topic models [3.308743964406688]
This paper studies the estimation of $A$ that is possibly element-wise sparse, and the number of topics $K$ is unknown.
We derive a new minimax lower bound for the estimation of such $A$ and propose a new computationally efficient algorithm for its recovery.
Our estimate adapts to the unknown sparsity of $A$ and our analysis is valid for any finite $n$, $p$, $K$ and document lengths.
arXiv Detail & Related papers (2020-01-22T03:19:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.