Generalization Bounds for Semi-supervised Matrix Completion with Distributional Side Information
- URL: http://arxiv.org/abs/2511.13049v2
- Date: Fri, 21 Nov 2025 12:07:39 GMT
- Title: Generalization Bounds for Semi-supervised Matrix Completion with Distributional Side Information
- Authors: Antoine Ledent, Mun Chong Soo, Nong Minh Hieu
- Abstract summary: We study a matrix completion problem where both the ground truth matrix $R$ and the unknown sampling distribution $P$ over observed entries are low-rank. We show that the true generalization error splits into independent error terms corresponding to the estimation of $P$ and of the ground truth matrix $R$ respectively.
- Score: 14.149880038429485
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study a matrix completion problem where both the ground truth matrix $R$ and the unknown sampling distribution $P$ over observed entries are low-rank matrices, and \textit{share a common subspace}. We assume that a large amount $M$ of \textit{unlabeled} data drawn from the sampling distribution $P$ is available, together with a small amount $N$ of labeled data drawn from the same distribution and noisy estimates of the corresponding ground truth entries. This setting is inspired by recommender systems scenarios where the unlabeled data corresponds to `implicit feedback' (consisting of interactions such as purchase, click, etc.) and the labeled data corresponds to `explicit feedback', consisting of interactions where the user has given an explicit rating to the item. Leveraging powerful results from the theory of low-rank subspace recovery, together with classic generalization bounds for matrix completion models, we show error bounds consisting of a sum of two error terms scaling as $\widetilde{O}\left(\sqrt{\frac{nd}{M}}\right)$ and $\widetilde{O}\left(\sqrt{\frac{dr}{N}}\right)$ respectively, where $d$ is the rank of $P$ and $r$ is the rank of $R$. In synthetic experiments, we confirm that the true generalization error naturally splits into independent error terms corresponding to the estimation of $P$ and of the ground truth matrix $R$ respectively. In real-life experiments on Douban and MovieLens with most explicit ratings removed, we demonstrate that the method can outperform baselines relying only on the explicit ratings, showing that our assumptions provide a valid toy theoretical setting to study the interaction between explicit and implicit feedback in recommender systems.
Related papers
- Regularized Online RLHF with Generalized Bilinear Preferences [68.44113000390544]
We consider the problem of contextual online RLHF with general preferences. We adopt the Generalized Bilinear Preference Model to capture preferences via low-rank, skew-symmetric matrices. We prove that the dual gap of the greedy policy is bounded by the square of the estimation error.
arXiv Detail & Related papers (2026-02-26T15:27:53Z) - Harmful Overfitting in Sobolev Spaces [49.47221415754556]
We study the generalization behavior of functions in Sobolev spaces $W^{k,p}(\mathbb{R}^d)$ that perfectly fit a noisy training data set. We show that approximately norm-minimizing interpolators, which are canonical solutions selected by smoothness bias, exhibit harmful overfitting. Our proof uses a geometric argument which identifies harmful neighborhoods of the training data using Sobolev inequalities.
arXiv Detail & Related papers (2026-01-31T17:40:56Z) - One-Sided Matrix Completion from Ultra-Sparse Samples [18.28432289831719]
We propose an unbiased estimator that normalizes each nonzero entry of the second moment by its observed frequency. We show that the estimator is unbiased for any $p$ and enjoys low variance. Experiments on both synthetic and real-world data validate our approach.
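A frequency-normalized second-moment estimator of the kind this summary describes can be sketched as follows. This is a generic inverse-frequency sketch under our own assumptions (a Bernoulli($p$) observation mask over i.i.d. standard-normal rows), not the paper's estimator; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 20_000, 8, 0.3           # samples, features, observation probability

X = rng.standard_normal((m, n))
mask = rng.random((m, n)) < p      # Bernoulli observation pattern
Xo = X * mask                      # zero-filled observed matrix

# Naive estimate of the second moment E[x x^T] is biased by the mask:
# diagonals shrink by p, off-diagonals by p^2.
S_naive = Xo.T @ Xo / m

# Frequency-normalized estimate: divide each entry of the raw second
# moment by how often that pair of coordinates was jointly observed.
co_counts = mask.T.astype(float) @ mask.astype(float)
S_hat = np.where(co_counts > 0, (Xo.T @ Xo) / np.maximum(co_counts, 1), 0.0)

S_true = np.eye(n)                 # E[x x^T] for standard normal x
```

Dividing by the joint observation count, rather than by $m$, removes the mask-induced bias entry by entry.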
arXiv Detail & Related papers (2026-01-18T01:33:48Z) - Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization [72.69498649272347]
Learning conditional distributions is a central problem in machine learning. We propose a new paradigm that integrates both paired and unpaired data. We show that our approach can theoretically recover true conditional distributions with arbitrarily small error.
arXiv Detail & Related papers (2024-10-03T16:12:59Z) - Semidefinite programming relaxations and debiasing for MAXCUT-based clustering [1.9761774213809036]
We consider the problem of partitioning a small data sample of size $n$ drawn from a mixture of two sub-gaussian distributions in $\mathbb{R}^p$. We use semidefinite programming relaxations of an integer quadratic program that is formulated as finding the maximum cut on a graph.
arXiv Detail & Related papers (2024-01-16T03:14:24Z) - Out-Of-Domain Unlabeled Data Improves Generalization [0.7589678255312519]
We propose a novel framework for incorporating unlabeled data into semi-supervised classification problems.
We show that unlabeled samples can be harnessed to narrow the generalization gap.
We validate our claims through experiments conducted on a variety of synthetic and real-world datasets.
arXiv Detail & Related papers (2023-09-29T02:00:03Z) - Kernel-Based Tests for Likelihood-Free Hypothesis Testing [21.143798051525646]
Given $n$ observations from two balanced classes, consider the task of labeling an additional $m$ inputs that are known to all belong to \emph{one} of the two classes.
Special cases of this problem are well-known: when $m=1$ it corresponds to binary classification, and when $m \approx n$ it is equivalent to two-sample testing.
In recent work it was discovered that there is a fundamental trade-off between $m$ and $n$: increasing the data sample $m$ reduces the amount $n$ of training/simulation data needed.
arXiv Detail & Related papers (2023-08-17T15:24:03Z) - Tackling Combinatorial Distribution Shift: A Matrix Completion Perspective [42.85196869759168]
We study a setting we call combinatorial distribution shift, where (a) under both the test and training distributions, the labels $z$ are determined by pairs of features $(x,y)$, (b) the training distribution has coverage of certain marginal distributions over $x$ and $y$ separately, but (c) the test distribution involves examples from a product distribution over $(x,y)$ that is not covered by the training distribution.
arXiv Detail & Related papers (2023-07-12T21:17:47Z) - How Does Pseudo-Labeling Affect the Generalization Error of the Semi-Supervised Gibbs Algorithm? [73.80001705134147]
We provide an exact characterization of the expected generalization error (gen-error) for semi-supervised learning (SSL) with pseudo-labeling via the Gibbs algorithm.
The gen-error is expressed in terms of the symmetrized KL information between the output hypothesis, the pseudo-labeled dataset, and the labeled dataset.
arXiv Detail & Related papers (2022-10-15T04:11:56Z) - Low-rank Matrix Recovery With Unknown Correspondence [62.634051913953485]
We show that it is possible to recover $M$ via solving a nuclear norm minimization problem under a proper low-rank condition on $M$, with provable non-asymptotic error bound for the recovery of $M$.
Experiments on simulated data, the MovieLens 100K dataset and the Yale B database show that $\text{M}^3\text{O}$ achieves state-of-the-art performance over several baselines and can recover the ground-truth correspondence with high accuracy.
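The nuclear norm minimization this summary refers to can be approximated by iterative singular-value soft-thresholding (the standard soft-impute scheme). The sketch below is a generic illustration of that technique on synthetic data, not the paper's $\text{M}^3\text{O}$ method, and the threshold `tau` and iteration count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 40, 2
M_true = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
mask = rng.random((n, n)) < 0.5            # ~50% of entries observed

def soft_impute(M_obs, mask, tau=2.0, iters=200):
    """Iterative singular-value soft-thresholding, a standard proxy for
    nuclear norm minimization: refill unobserved entries each round,
    then shrink the singular values of the filled matrix by tau."""
    Z = np.zeros_like(M_obs)
    for _ in range(iters):
        filled = np.where(mask, M_obs, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        Z = U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
    return Z

M_hat = soft_impute(M_true * mask, mask)
rel_err = np.linalg.norm(M_hat - M_true) / np.linalg.norm(M_true)
```

With a rank-2 target and half the entries observed, the shrinkage recovers the matrix up to a small bias proportional to `tau`.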
arXiv Detail & Related papers (2021-10-15T09:27:50Z) - Leveraged Matrix Completion with Noise [84.20092979053119]
We show that we can provably recover an unknown $n \times n$ matrix of rank $r$ from just about $\mathcal{O}(nr\log^2(n))$ entries.
Our proofs are supported by a novel approach that formulates sufficient optimality conditions based on the Golfing Scheme.
arXiv Detail & Related papers (2020-11-11T16:25:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.