On the Fundamental Limits of Matrix Completion: Leveraging Hierarchical
Similarity Graphs
- URL: http://arxiv.org/abs/2109.05408v1
- Date: Sun, 12 Sep 2021 02:38:54 GMT
- Title: On the Fundamental Limits of Matrix Completion: Leveraging Hierarchical
Similarity Graphs
- Authors: Junhyung Ahn, Adel Elmahdy, Soheil Mohajer, Changho Suh
- Abstract summary: We study the matrix completion problem that leverages hierarchical similarity graphs as side information in recommender systems.
Under a hierarchical block model, we characterize the exact information-theoretic limit on the number of observed matrix entries.
We analyze the optimal sample complexity and identify different regimes whose characteristics rely on quality metrics of side information of the hierarchical similarity graph.
- Score: 39.00971122472004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the matrix completion problem that leverages hierarchical similarity
graphs as side information in the context of recommender systems. Under a
hierarchical stochastic block model that well respects practically-relevant
social graphs and a low-rank rating matrix model, we characterize the exact
information-theoretic limit on the number of observed matrix entries (i.e.,
optimal sample complexity) by proving sharp upper and lower bounds on the
sample complexity. In the achievability proof, we demonstrate that probability
of error of the maximum likelihood estimator vanishes for sufficiently large
number of users and items, if all sufficient conditions are satisfied. On the
other hand, the converse (impossibility) proof is based on the genie-aided
maximum likelihood estimator. Under each necessary condition, we present
examples of a genie-aided estimator to prove that the probability of error does
not vanish for sufficiently large number of users and items. One important
consequence of this result is that exploiting the hierarchical structure of
social graphs yields a substantial gain in sample complexity relative to the
one that simply identifies different groups without resorting to the relational
structure across them. More specifically, we analyze the optimal sample
complexity and identify different regimes whose characteristics rely on quality
metrics of side information of the hierarchical similarity graph. Finally, we
present simulation results to corroborate our theoretical findings and show
that the characterized information-theoretic limit can be asymptotically
achieved.
Related papers
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $Omega(dkstar/2)$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Entry-Specific Bounds for Low-Rank Matrix Completion under Highly
Non-Uniform Sampling [10.824999179337558]
We show that it is often better and sometimes optimal to run estimation algorithms on a smaller submatrix rather than the entire matrix.
Our bounds characterize the hardness of estimating each entry as a function of the localized sampling probabilities.
arXiv Detail & Related papers (2024-02-29T23:24:43Z) - Optimal and exact recovery on general non-uniform Hypergraph Stochastic Block Model [0.0]
We consider the community detection problem in random hypergraphs under the non-uniform hypergraphinger block model (HSBM)
We establish, for the first time in the literature, a sharp threshold for exact recovery under this non-uniform case, subject to minor constraints.
We provide two efficient algorithms which successfully achieve exact recovery when above the threshold, and attain the lowest possible ratio when the exact recovery is impossible.
arXiv Detail & Related papers (2023-04-25T20:30:33Z) - Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z) - Semi-Supervised Clustering of Sparse Graphs: Crossing the
Information-Theoretic Threshold [3.6052935394000234]
Block model is a canonical random graph model for clustering and community detection on network-structured data.
No estimator based on the network topology can perform substantially better than chance on sparse graphs if the model parameter is below a certain threshold.
We prove that with an arbitrary fraction of the labels feasible throughout the parameter domain.
arXiv Detail & Related papers (2022-05-24T00:03:25Z) - Matrix Completion with Hierarchical Graph Side Information [39.00971122472004]
We consider a matrix completion problem that exploits social or item similarity graphs as side information.
We develop a universal, parameter-free, and computationally efficient algorithm that starts with hierarchical graph clustering.
We conduct extensive experiments on synthetic and real-world datasets to corroborate our theoretical results.
arXiv Detail & Related papers (2022-01-02T03:47:41Z) - Partial Counterfactual Identification from Observational and
Experimental Data [83.798237968683]
We develop effective Monte Carlo algorithms to approximate the optimal bounds from an arbitrary combination of observational and experimental data.
Our algorithms are validated extensively on synthetic and real-world datasets.
arXiv Detail & Related papers (2021-10-12T02:21:30Z) - Probabilistic Simplex Component Analysis [66.30587591100566]
PRISM is a probabilistic simplex component analysis approach to identifying the vertices of a data-circumscribing simplex from data.
The problem has a rich variety of applications, the most notable being hyperspectral unmixing in remote sensing and non-negative matrix factorization in machine learning.
arXiv Detail & Related papers (2021-03-18T05:39:00Z) - Goal-directed Generation of Discrete Structures with Conditional
Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z) - Discrete-Valued Latent Preference Matrix Estimation with Graph Side
Information [12.836994708337144]
We develop an algorithm that matches the optimal sample complexity.
Our algorithm is robust to model errors and outperforms the existing algorithms in terms of prediction performance.
arXiv Detail & Related papers (2020-03-16T06:29:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.