Tackling Combinatorial Distribution Shift: A Matrix Completion
Perspective
- URL: http://arxiv.org/abs/2307.06457v3
- Date: Fri, 28 Jul 2023 20:27:20 GMT
- Title: Tackling Combinatorial Distribution Shift: A Matrix Completion
Perspective
- Authors: Max Simchowitz and Abhishek Gupta and Kaiqing Zhang
- Abstract summary: We study a setting we call combinatorial distribution shift, where (a) under the test- and training-distributions, the labels $z$ are determined by pairs of features $(x,y)$, (b) the training distribution has coverage of certain marginal distributions over $x$ and $y$ separately, but (c) the test distribution involves examples from a product distribution over $(x,y)$ that is not covered by the training distribution.
- Score: 42.85196869759168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Obtaining rigorous statistical guarantees for generalization under
distribution shift remains an open and active research area. We study a setting
we call combinatorial distribution shift, where (a) under the test- and
training-distributions, the labels $z$ are determined by pairs of features
$(x,y)$, (b) the training distribution has coverage of certain marginal
distributions over $x$ and $y$ separately, but (c) the test distribution
involves examples from a product distribution over $(x,y)$ that is not
covered by the training distribution. Focusing on the special case where the
labels are given by bilinear embeddings into a Hilbert space $H$: $\mathbb{E}[z
\mid x,y ]=\langle f_{\star}(x),g_{\star}(y)\rangle_{H}$, we aim to
extrapolate to a test distribution domain that is not covered in training,
i.e., achieving bilinear combinatorial extrapolation.
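As a toy illustration of this coverage structure (not the paper's algorithm), the sketch below builds an exactly rank-$r$ bilinear label matrix, observes only the three blocks whose $(x,y)$ marginals are covered in training, and extrapolates to the uncovered product block with a Nystrom-style completion; the block sizes, rank, and variable names are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: a finite setting where E[z | x, y] = <f_*(x), g_*(y)>,
# so the label matrix M = F @ G.T is exactly low rank.  Training covers the blocks
# (X1 x Y1), (X1 x Y2), (X2 x Y1); the test block (X2 x Y2) is a product of covered
# marginals that is itself never observed.
rng = np.random.default_rng(0)
n1, n2, m1, m2, r = 20, 15, 20, 15, 3        # illustrative sizes and rank
F = rng.normal(size=(n1 + n2, r))            # stand-in for f_*(x)
G = rng.normal(size=(m1 + m2, r))            # stand-in for g_*(y)
M = F @ G.T                                  # ground-truth bilinear labels

M11 = M[:n1, :m1]                            # observed block X1 x Y1
M12 = M[:n1, m1:]                            # observed block X1 x Y2
M21 = M[n1:, :m1]                            # observed block X2 x Y1

# Nystrom-style completion of the unobserved block:
# M22 ~= M21 @ pinv(M11) @ M12, exact when M is exactly rank r and M11 has rank r.
M22_hat = M21 @ np.linalg.pinv(M11) @ M12
err = np.linalg.norm(M22_hat - M[n1:, m1:]) / np.linalg.norm(M[n1:, m1:])
print(f"relative extrapolation error on the uncovered block: {err:.2e}")
```

In this exactly low-rank, noiseless case the uncovered block is recovered essentially exactly; the difficulty the abstract addresses is that realistic embeddings exhibit only gradual spectral decay rather than an exact rank cut-off.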
Our setting generalizes a special case of matrix completion from
missing-not-at-random data, for which all existing results require the
ground-truth matrices to be either exactly low-rank, or to exhibit very sharp
spectral cutoffs. In this work, we develop a series of theoretical results that
enable bilinear combinatorial extrapolation under gradual spectral decay as
observed in typical high-dimensional data, including novel algorithms,
generalization guarantees, and linear-algebraic results. A key tool is a novel
perturbation bound for the rank-$k$ singular value decomposition approximations
between two matrices that depends on the relative spectral gap rather than the
absolute spectral gap, a result that may be of broader independent interest.
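The distinction between absolute and relative spectral gaps can be probed numerically. The sketch below (an illustration under assumed parameters, not the paper's bound) compares the rank-$k$ SVD truncations of a matrix with gradually decaying spectrum and of a small perturbation of it, and prints both gaps at the cut-off rank; the decay profile, noise scale, and helper name `rank_k` are assumptions made for the example.

```python
import numpy as np

# Numerical illustration only: how close are the rank-k truncations of A and of a
# small perturbation B = A + E when the spectrum decays gradually?  We also report
# the absolute gap sigma_k - sigma_{k+1} and the relative gap (sigma_k - sigma_{k+1}) / sigma_k.
rng = np.random.default_rng(1)
n, k = 50, 5
U, _ = np.linalg.qr(rng.normal(size=(n, n)))
V, _ = np.linalg.qr(rng.normal(size=(n, n)))
sigma = 1.0 / np.arange(1, n + 1)            # gradual (polynomial) spectral decay
A = U @ np.diag(sigma) @ V.T
B = A + 1e-3 * rng.normal(size=(n, n)) / np.sqrt(n)

def rank_k(M, k):
    """Best rank-k approximation via the truncated SVD."""
    u, s, vt = np.linalg.svd(M)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k]

diff = np.linalg.norm(rank_k(A, k) - rank_k(B, k), 2)
abs_gap = sigma[k - 1] - sigma[k]
rel_gap = abs_gap / sigma[k - 1]
print(f"||A_k - B_k||_2 = {diff:.3e}, absolute gap = {abs_gap:.3e}, relative gap = {rel_gap:.3e}")
```

For gradually decaying spectra the absolute gap can be tiny even when the relative gap is moderate, which is the regime where a bound phrased in terms of the relative gap is the sharper statement.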
Related papers
- Gradual Domain Adaptation via Manifold-Constrained Distributionally Robust Optimization [0.4732176352681218]
This paper addresses the challenge of gradual domain adaptation within a class of manifold-constrained data distributions.
We propose a methodology rooted in Distributionally Robust Optimization (DRO) with an adaptive Wasserstein radius.
Our bounds rely on a newly introduced compatibility measure, which fully characterizes the error propagation dynamics along the sequence.
arXiv Detail & Related papers (2024-10-17T22:07:25Z) - Out-Of-Domain Unlabeled Data Improves Generalization [0.7589678255312519]
We propose a novel framework for incorporating unlabeled data into semi-supervised classification problems.
We show that unlabeled samples can be harnessed to narrow the generalization gap.
We validate our claims through experiments conducted on a variety of synthetic and real-world datasets.
arXiv Detail & Related papers (2023-09-29T02:00:03Z) - An Equivalence Principle for the Spectrum of Random Inner-Product Kernel
Matrices with Polynomial Scalings [21.727073594338297]
This study is motivated by applications in machine learning and statistics.
We establish the weak limit of the empirical spectral distribution of these random matrices in a polynomial scaling regime.
This limit can be characterized as the free additive convolution between a Marchenko-Pastur law and a semicircle law.
arXiv Detail & Related papers (2022-05-12T18:50:21Z) - Testing distributional assumptions of learning algorithms [5.204779946147061]
We study the design of tester-learner pairs $(\mathcal{A},\mathcal{T})$.
We show that if the distribution on examples in the data passes the tester $\mathcal{T}$, then one can safely trust the output of the agnostic learner $\mathcal{A}$ on the data.
arXiv Detail & Related papers (2022-04-14T19:10:53Z) - Non-Gaussian Component Analysis via Lattice Basis Reduction [56.98280399449707]
Non-Gaussian Component Analysis (NGCA) is a distribution learning problem.
We provide an efficient algorithm for NGCA in the regime where the hidden non-Gaussian univariate distribution $A$ is discrete or nearly discrete.
arXiv Detail & Related papers (2021-12-16T18:38:02Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Robust Learning of Optimal Auctions [84.13356290199603]
We study the problem of learning revenue-optimal multi-bidder auctions from samples when the samples of bidders' valuations can be adversarially corrupted or drawn from distributions that are adversarially perturbed.
We propose new algorithms that can learn a mechanism whose revenue is nearly optimal simultaneously for all "true distributions" that are $\alpha$-close to the original distribution in Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2021-07-13T17:37:21Z) - Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins [92.7662890047311]
We show that gradient descent finds halfspaces with classification error $\tilde{O}(\mathsf{OPT}^{1/2}) + \varepsilon$ in $\mathrm{poly}(d,1/\varepsilon)$ time and sample complexity.
arXiv Detail & Related papers (2020-10-01T16:48:33Z) - Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k\times k$, rank-$r$, matrices to normalized $L_1$ distance requires $\Omega(\frac{kr}{\epsilon^2})$ samples.
We propose an algorithm that uses $\mathcal{O}(\frac{kr}{\epsilon^2}\log^2\frac{r}{\epsilon})$ samples, a number linear in the high dimension and nearly linear in the matrices' typically low rank.
arXiv Detail & Related papers (2020-09-30T19:10:32Z) - A Precise High-Dimensional Asymptotic Theory for Boosting and
Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the generalization error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z)