Tackling Combinatorial Distribution Shift: A Matrix Completion
Perspective
- URL: http://arxiv.org/abs/2307.06457v3
- Date: Fri, 28 Jul 2023 20:27:20 GMT
- Title: Tackling Combinatorial Distribution Shift: A Matrix Completion
Perspective
- Authors: Max Simchowitz and Abhishek Gupta and Kaiqing Zhang
- Abstract summary: We study a setting we call combinatorial distribution shift, where (a) under the test- and training-distributions, the labels $z$ are determined by pairs of features $(x,y)$, (b) the training distribution has coverage of certain marginal distributions over $x$ and $y$ separately, but (c) the test distribution involves examples from a product distribution over $(x,y)$ that is not covered by the training distribution.
- Score: 42.85196869759168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Obtaining rigorous statistical guarantees for generalization under
distribution shift remains an open and active research area. We study a setting
we call combinatorial distribution shift, where (a) under the test- and
training-distributions, the labels $z$ are determined by pairs of features
$(x,y)$, (b) the training distribution has coverage of certain marginal
distributions over $x$ and $y$ separately, but (c) the test distribution
involves examples from a product distribution over $(x,y)$ that is not
covered by the training distribution. Focusing on the special case where the
labels are given by bilinear embeddings into a Hilbert space $H$: $\mathbb{E}[z
\mid x,y ]=\langle f_{\star}(x),g_{\star}(y)\rangle_{H}$, we aim to
extrapolate to a test distribution domain that is not covered in training,
i.e., achieving bilinear combinatorial extrapolation.
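As a toy illustration of this coverage structure (not the paper's algorithm), the sketch below builds an exactly rank-$r$ bilinear label matrix, observes only the three blocks whose $(x,y)$ marginals are covered in training, and extrapolates to the uncovered product block with a Nystrom-style completion; the block sizes, rank, and variable names are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: a finite setting where E[z | x, y] = <f_*(x), g_*(y)>,
# so the label matrix M = F @ G.T is exactly low rank.  Training covers the blocks
# (X1 x Y1), (X1 x Y2), (X2 x Y1); the test block (X2 x Y2) is a product of covered
# marginals that is itself never observed.
rng = np.random.default_rng(0)
n1, n2, m1, m2, r = 20, 15, 20, 15, 3        # illustrative sizes and rank
F = rng.normal(size=(n1 + n2, r))            # stand-in for f_*(x)
G = rng.normal(size=(m1 + m2, r))            # stand-in for g_*(y)
M = F @ G.T                                  # ground-truth bilinear labels

M11 = M[:n1, :m1]                            # observed block X1 x Y1
M12 = M[:n1, m1:]                            # observed block X1 x Y2
M21 = M[n1:, :m1]                            # observed block X2 x Y1

# Nystrom-style completion of the unobserved block:
# M22 ~= M21 @ pinv(M11) @ M12, exact when M is exactly rank r and M11 has rank r.
M22_hat = M21 @ np.linalg.pinv(M11) @ M12
err = np.linalg.norm(M22_hat - M[n1:, m1:]) / np.linalg.norm(M[n1:, m1:])
print(f"relative extrapolation error on the uncovered block: {err:.2e}")
```

In this exactly low-rank, noiseless case the uncovered block is recovered essentially exactly; the difficulty the abstract addresses is that realistic embeddings exhibit only gradual spectral decay rather than an exact rank cut-off.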
Our setting generalizes a special case of matrix completion from
missing-not-at-random data, for which all existing results require the
ground-truth matrices to be either exactly low-rank, or to exhibit very sharp
spectral cutoffs. In this work, we develop a series of theoretical results that
enable bilinear combinatorial extrapolation under gradual spectral decay as
observed in typical high-dimensional data, including novel algorithms,
generalization guarantees, and linear-algebraic results. A key tool is a novel
perturbation bound for the rank-$k$ singular value decomposition approximations
between two matrices that depends on the relative spectral gap rather than the
absolute spectral gap, a result that may be of broader independent interest.
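The distinction between absolute and relative spectral gaps can be probed numerically. The sketch below (an illustration under assumed parameters, not the paper's bound) compares the rank-$k$ SVD truncations of a matrix with gradually decaying spectrum and of a small perturbation of it, and prints both gaps at the cut-off rank; the decay profile, noise scale, and helper name `rank_k` are assumptions made for the example.

```python
import numpy as np

# Numerical illustration only: how close are the rank-k truncations of A and of a
# small perturbation B = A + E when the spectrum decays gradually?  We also report
# the absolute gap sigma_k - sigma_{k+1} and the relative gap (sigma_k - sigma_{k+1}) / sigma_k.
rng = np.random.default_rng(1)
n, k = 50, 5
U, _ = np.linalg.qr(rng.normal(size=(n, n)))
V, _ = np.linalg.qr(rng.normal(size=(n, n)))
sigma = 1.0 / np.arange(1, n + 1)            # gradual (polynomial) spectral decay
A = U @ np.diag(sigma) @ V.T
B = A + 1e-3 * rng.normal(size=(n, n)) / np.sqrt(n)

def rank_k(M, k):
    """Best rank-k approximation via the truncated SVD."""
    u, s, vt = np.linalg.svd(M)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k]

diff = np.linalg.norm(rank_k(A, k) - rank_k(B, k), 2)
abs_gap = sigma[k - 1] - sigma[k]
rel_gap = abs_gap / sigma[k - 1]
print(f"||A_k - B_k||_2 = {diff:.3e}, absolute gap = {abs_gap:.3e}, relative gap = {rel_gap:.3e}")
```

For gradually decaying spectra the absolute gap can be tiny even when the relative gap is moderate, which is the regime where a bound phrased in terms of the relative gap is the sharper statement.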
Related papers
- Gradual Domain Adaptation via Manifold-Constrained Distributionally Robust Optimization [0.4732176352681218]
This paper addresses the challenge of gradual domain adaptation within a class of manifold-constrained data distributions.
We propose a methodology rooted in Distributionally Robust Optimization (DRO) with an adaptive Wasserstein radius.
Our bounds rely on a newly introduced compatibility measure, which fully characterizes the error propagation dynamics along the sequence.
arXiv Detail & Related papers (2024-10-17T22:07:25Z) - Out-Of-Domain Unlabeled Data Improves Generalization [0.7589678255312519]
We propose a novel framework for incorporating unlabeled data into semi-supervised classification problems.
We show that unlabeled samples can be harnessed to narrow the generalization gap.
We validate our claims through experiments conducted on a variety of synthetic and real-world datasets.
arXiv Detail & Related papers (2023-09-29T02:00:03Z) - An Equivalence Principle for the Spectrum of Random Inner-Product Kernel
Matrices with Polynomial Scalings [21.727073594338297]
This study is motivated by applications in machine learning and statistics.
We establish the weak limit of the empirical spectral distribution of these random matrices in a polynomial scaling regime.
This limit can be characterized as the free additive convolution between a Marchenko-Pastur law and a semicircle law.
arXiv Detail & Related papers (2022-05-12T18:50:21Z) - Testing distributional assumptions of learning algorithms [5.204779946147061]
We study the design of tester-learner pairs $(\mathcal{A},\mathcal{T})$.
We show that if the distribution on examples in the data passes the tester $\mathcal{T}$, then one can safely trust the output of the agnostic learner $\mathcal{A}$ on the data.
arXiv Detail & Related papers (2022-04-14T19:10:53Z) - Non-Gaussian Component Analysis via Lattice Basis Reduction [56.98280399449707]
Non-Gaussian Component Analysis (NGCA) is a distribution learning problem.
We provide an efficient algorithm for NGCA in the regime where the hidden non-Gaussian univariate distribution $A$ is discrete or nearly discrete.
arXiv Detail & Related papers (2021-12-16T18:38:02Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Robust Learning of Optimal Auctions [84.13356290199603]
We study the problem of learning revenue-optimal multi-bidder auctions from samples when the samples of bidders' valuations can be adversarially corrupted or drawn from distributions that are adversarially perturbed.
We propose new algorithms that can learn a mechanism whose revenue is nearly optimal simultaneously for all "true distributions" that are $\alpha$-close to the original distribution in Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2021-07-13T17:37:21Z) - Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins [92.7662890047311]
We show that gradient descent finds halfspaces with classification error $\tilde{O}(\mathsf{OPT}^{1/2}) + \varepsilon$ in $\mathrm{poly}(d,1/\varepsilon)$ time and sample complexity.
arXiv Detail & Related papers (2020-10-01T16:48:33Z) - Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k\times k$, rank-$r$, matrices to normalized $L_1$ distance requires $\Omega(\frac{kr}{\epsilon^2})$ samples.
We propose an algorithm that uses $\mathcal{O}(\frac{kr}{\epsilon^2}\log^2\frac{r}{\epsilon})$ samples, a number linear in the high dimension and nearly linear in the matrices' typically low rank.
arXiv Detail & Related papers (2020-09-30T19:10:32Z) - A Precise High-Dimensional Asymptotic Theory for Boosting and
Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the generalization error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z)