Related papers: On the Fundamental Limits of Matrix Completion: Leveraging Hierarchical Similarity Graphs

On the Fundamental Limits of Matrix Completion: Leveraging Hierarchical Similarity Graphs

URL: http://arxiv.org/abs/2109.05408v1
Date: Sun, 12 Sep 2021 02:38:54 GMT
Title: On the Fundamental Limits of Matrix Completion: Leveraging Hierarchical Similarity Graphs
Authors: Junhyung Ahn, Adel Elmahdy, Soheil Mohajer, Changho Suh
Abstract summary: We study the matrix completion problem that leverages hierarchical similarity graphs as side information in recommender systems. Under a hierarchical block model, we characterize the exact information-theoretic limit on the number of observed matrix entries. We analyze the optimal sample complexity and identify different regimes whose characteristics rely on quality metrics of side information of the hierarchical similarity graph.
Score: 39.00971122472004
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the matrix completion problem that leverages hierarchical similarity graphs as side information in the context of recommender systems. Under a hierarchical stochastic block model that well respects practically-relevant social graphs and a low-rank rating matrix model, we characterize the exact information-theoretic limit on the number of observed matrix entries (i.e., optimal sample complexity) by proving sharp upper and lower bounds on the sample complexity. In the achievability proof, we demonstrate that probability of error of the maximum likelihood estimator vanishes for sufficiently large number of users and items, if all sufficient conditions are satisfied. On the other hand, the converse (impossibility) proof is based on the genie-aided maximum likelihood estimator. Under each necessary condition, we present examples of a genie-aided estimator to prove that the probability of error does not vanish for sufficiently large number of users and items. One important consequence of this result is that exploiting the hierarchical structure of social graphs yields a substantial gain in sample complexity relative to the one that simply identifies different groups without resorting to the relational structure across them. More specifically, we analyze the optimal sample complexity and identify different regimes whose characteristics rely on quality metrics of side information of the hierarchical similarity graph. Finally, we present simulation results to corroborate our theoretical findings and show that the characterized information-theoretic limit can be asymptotically achieved.

Related papers

Conditional Distribution Quantization in Machine Learning [83.54039134248231]
Conditional expectation mathbbE(Y mid X) often fails to capture the complexity of multimodal conditional distributions mathcalL(Y mid X) We propose using n-point conditional quantizations--functional mappings of X that are learnable via gradient descent--to approximate mathcalL(Y mid X)
arXiv Detail & Related papers (2025-02-11T00:28:24Z)
Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure. We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $Omega(dkstar/2)$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z)
Entry-Specific Bounds for Low-Rank Matrix Completion under Highly Non-Uniform Sampling [10.824999179337558]
We show that it is often better and sometimes optimal to run estimation algorithms on a smaller submatrix rather than the entire matrix. Our bounds characterize the hardness of estimating each entry as a function of the localized sampling probabilities.
arXiv Detail & Related papers (2024-02-29T23:24:43Z)
Optimal and exact recovery on general non-uniform Hypergraph Stochastic Block Model [0.0]
We consider the community detection problem in random hypergraphs under the non-uniform hypergraphinger block model (HSBM) We establish, for the first time in the literature, a sharp threshold for exact recovery under this non-uniform case, subject to minor constraints. We provide two efficient algorithms which successfully achieve exact recovery when above the threshold, and attain the lowest possible ratio when the exact recovery is impossible.
arXiv Detail & Related papers (2023-04-25T20:30:33Z)
Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints. The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution. We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z)
Semi-Supervised Clustering of Sparse Graphs: Crossing the Information-Theoretic Threshold [3.6052935394000234]
Block model is a canonical random graph model for clustering and community detection on network-structured data. No estimator based on the network topology can perform substantially better than chance on sparse graphs if the model parameter is below a certain threshold. We prove that with an arbitrary fraction of the labels feasible throughout the parameter domain.
arXiv Detail & Related papers (2022-05-24T00:03:25Z)
Matrix Completion with Hierarchical Graph Side Information [39.00971122472004]
We consider a matrix completion problem that exploits social or item similarity graphs as side information. We develop a universal, parameter-free, and computationally efficient algorithm that starts with hierarchical graph clustering. We conduct extensive experiments on synthetic and real-world datasets to corroborate our theoretical results.
arXiv Detail & Related papers (2022-01-02T03:47:41Z)
Partial Counterfactual Identification from Observational and Experimental Data [83.798237968683]
We develop effective Monte Carlo algorithms to approximate the optimal bounds from an arbitrary combination of observational and experimental data. Our algorithms are validated extensively on synthetic and real-world datasets.
arXiv Detail & Related papers (2021-10-12T02:21:30Z)
Probabilistic Simplex Component Analysis [66.30587591100566]
PRISM is a probabilistic simplex component analysis approach to identifying the vertices of a data-circumscribing simplex from data. The problem has a rich variety of applications, the most notable being hyperspectral unmixing in remote sensing and non-negative matrix factorization in machine learning.
arXiv Detail & Related papers (2021-03-18T05:39:00Z)
Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward. We test our methodology on two tasks: generating molecules with user-defined properties and identifying short python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z)
Discrete-Valued Latent Preference Matrix Estimation with Graph Side Information [12.836994708337144]
We develop an algorithm that matches the optimal sample complexity. Our algorithm is robust to model errors and outperforms the existing algorithms in terms of prediction performance.
arXiv Detail & Related papers (2020-03-16T06:29:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.