Out-Of-Domain Unlabeled Data Improves Generalization
- URL: http://arxiv.org/abs/2310.00027v2
- Date: Thu, 15 Feb 2024 18:23:41 GMT
- Title: Out-Of-Domain Unlabeled Data Improves Generalization
- Authors: Amir Hossein Saberi, Amir Najafi, Alireza Heidari, Mohammad Hosein
Movasaghinia, Abolfazl Motahari, Babak H. Khalaj
- Abstract summary: We propose a novel framework for incorporating unlabeled data into semi-supervised classification problems.
We show that unlabeled samples can be harnessed to narrow the generalization gap.
We validate our claims through experiments conducted on a variety of synthetic and real-world datasets.
- Score: 0.7589678255312519
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel framework for incorporating unlabeled data into
semi-supervised classification problems, where scenarios involving the
minimization of either i) adversarially robust or ii) non-robust loss functions
have been considered. Notably, we allow the unlabeled samples to deviate
slightly (in total variation sense) from the in-domain distribution. The core
idea behind our framework is to combine Distributionally Robust Optimization
(DRO) with self-supervised training. As a result, we also leverage efficient
polynomial-time algorithms for the training stage. From a theoretical
standpoint, we apply our framework on the classification problem of a mixture
of two Gaussians in $\mathbb{R}^d$, where in addition to the $m$ independent
and labeled samples from the true distribution, a set of $n$ (usually with
$n\gg m$) out of domain and unlabeled samples are given as well. Using only the
labeled data, it is known that the generalization error can be bounded by
$\propto\left(d/m\right)^{1/2}$. However, using our method on both isotropic
and non-isotropic Gaussian mixture models, one can derive a new set of
analytically explicit and non-asymptotic bounds which show substantial
improvement on the generalization error compared to ERM. Our results underscore
two significant insights: 1) out-of-domain samples, even when unlabeled, can be
harnessed to narrow the generalization gap, provided that the true data
distribution adheres to a form of the ``cluster assumption'', and 2) the
semi-supervised learning paradigm can be regarded as a special case of our
framework when there are no distributional shifts. We validate our claims
through experiments conducted on a variety of synthetic and real-world
datasets.
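The setting described in the abstract is easy to reproduce numerically. Below is a minimal toy sketch, not the authors' DRO-plus-self-supervision procedure: it assumes a symmetric two-Gaussian mixture with classes at $\pm\mu$, draws $m$ labeled in-domain samples and $n \gg m$ unlabeled samples from a slightly perturbed mixture, and compares a labeled-only ERM-style direction estimate against a single round of self-training on the unlabeled pool. All names (mu, shift, w_erm, w_self) are illustrative assumptions, not notation from the paper.

```python
# Toy illustration (not the paper's algorithm): few labeled in-domain samples
# plus many slightly out-of-domain unlabeled samples from a two-Gaussian mixture.
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 50, 40, 4000                 # dimension, labeled size, unlabeled size
mu = np.ones(d) / np.sqrt(d)           # true class mean (classes are +/- mu)
shift = 0.05 * rng.standard_normal(d)  # small out-of-domain perturbation

def sample(k, mean, labeled=True):
    """Draw k points from the symmetric mixture with class means +/- mean."""
    y = rng.choice([-1, 1], size=k)
    x = y[:, None] * mean + rng.standard_normal((k, d))
    return (x, y) if labeled else x

x_lab, y_lab = sample(m, mu)                  # in-domain, labeled
x_unl = sample(n, mu + shift, labeled=False)  # out-of-domain, unlabeled

# (i) ERM-style estimate of the class direction from labeled data only.
w_erm = (y_lab[:, None] * x_lab).mean(axis=0)

# (ii) One self-training round: pseudo-label the unlabeled pool with w_erm,
# then re-estimate the direction from the much larger pseudo-labeled set.
pseudo = np.sign(x_unl @ w_erm)
w_self = (pseudo[:, None] * x_unl).mean(axis=0)

def test_error(w, trials=20000):
    x, y = sample(trials, mu)                 # evaluate on the true domain
    return np.mean(np.sign(x @ w) != y)

print(f"ERM (labeled only) test error: {test_error(w_erm):.3f}")
print(f"Self-training test error:      {test_error(w_self):.3f}")
```

Under these assumptions the large unlabeled pool typically sharpens the noisy labeled-only direction estimate, which is the qualitative effect the paper quantifies with its non-asymptotic bounds; the actual framework additionally guards against the distribution shift via DRO.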
Related papers
- Gradual Domain Adaptation via Manifold-Constrained Distributionally Robust Optimization [0.4732176352681218]
This paper addresses the challenge of gradual domain adaptation within a class of manifold-constrained data distributions.
We propose a methodology rooted in Distributionally Robust Optimization (DRO) with an adaptive Wasserstein radius.
Our bounds rely on a newly introduced compatibility measure, which fully characterizes the error propagation dynamics along the sequence.
arXiv Detail & Related papers (2024-10-17T22:07:25Z) - Classification of Data Generated by Gaussian Mixture Models Using Deep
ReLU Networks [28.437011792990347]
This paper studies the binary classification of data from $\mathbb{R}^d$ generated under Gaussian Mixture Models using deep ReLU networks.
We obtain, for the first time, convergence rates for deep ReLU network classifiers in this setting.
Results provide a theoretical verification of deep neural networks in practical classification problems.
arXiv Detail & Related papers (2023-08-15T20:40:42Z) - Tackling Combinatorial Distribution Shift: A Matrix Completion
Perspective [42.85196869759168]
We study a setting we call combinatorial distribution shift, where (a) under the test- and training-distributions, the labels $z$ are determined by pairs of features $(x,y)$, (b) the training distribution has coverage of certain marginal distributions over $x$ and $y$ separately, but (c) the test distribution involves examples from a product distribution over $(x,y)$ that is not covered by the training distribution.
arXiv Detail & Related papers (2023-07-12T21:17:47Z) - Data thinning for convolution-closed distributions [2.299914829977005]
We propose data thinning, an approach for splitting an observation into two or more independent parts that sum to the original observation.
We show that data thinning can be used to validate the results of unsupervised learning approaches.
arXiv Detail & Related papers (2023-01-18T02:47:41Z) - How Does Pseudo-Labeling Affect the Generalization Error of the
Semi-Supervised Gibbs Algorithm? [73.80001705134147]
We provide an exact characterization of the expected generalization error (gen-error) for semi-supervised learning (SSL) with pseudo-labeling via the Gibbs algorithm.
The gen-error is expressed in terms of the symmetrized KL information between the output hypothesis, the pseudo-labeled dataset, and the labeled dataset.
arXiv Detail & Related papers (2022-10-15T04:11:56Z) - A Unified Joint Maximum Mean Discrepancy for Domain Adaptation [73.44809425486767]
This paper theoretically derives a unified form of JMMD that is easy to optimize.
From the revealed unified JMMD, we illustrate that JMMD degrades the feature-label dependence that benefits classification.
We propose a novel MMD matrix to promote the dependence, and devise a novel label kernel that is robust to label distribution shift.
arXiv Detail & Related papers (2021-01-25T09:46:14Z) - Learning Invariant Representations and Risks for Semi-supervised Domain
Adaptation [109.73983088432364]
We propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA).
We introduce the LIRR algorithm for jointly Learning Invariant Representations and Risks.
arXiv Detail & Related papers (2020-10-09T15:42:35Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Self-training Avoids Using Spurious Features Under Domain Shift [54.794607791641745]
In unsupervised domain adaptation, conditional entropy minimization and pseudo-labeling work even when the domain shifts are much larger than those analyzed by existing theory.
We identify and analyze one particular setting where the domain shift can be large, but certain spurious features correlate with the label in the source domain while being independent of the label in the target domain.
arXiv Detail & Related papers (2020-06-17T17:51:42Z) - Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable
Neural Distribution Alignment [52.02794488304448]
We propose a new distribution alignment method based on a log-likelihood ratio statistic and normalizing flows.
We experimentally verify that minimizing the resulting objective results in domain alignment that preserves the local structure of input domains.
arXiv Detail & Related papers (2020-03-26T22:10:04Z)