For self-supervised learning, Rationality implies generalization,
provably
- URL: http://arxiv.org/abs/2010.08508v1
- Date: Fri, 16 Oct 2020 17:07:52 GMT
- Title: For self-supervised learning, Rationality implies generalization,
provably
- Authors: Yamini Bansal, Gal Kaplun, Boaz Barak
- Abstract summary: We prove a new upper bound on the generalization gap of classifiers obtained by first using self-supervision.
We show that our bound is non-vacuous for many popular representation-learning based classifiers on CIFAR-10 and ImageNet.
- Score: 13.526562756159809
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove a new upper bound on the generalization gap of classifiers that are
obtained by first using self-supervision to learn a representation $r$ of the
training data, and then fitting a simple (e.g., linear) classifier $g$ to the
labels. Specifically, we show that (under the assumptions described below) the
generalization gap of such classifiers tends to zero if $\mathsf{C}(g) \ll n$,
where $\mathsf{C}(g)$ is an appropriately-defined measure of the simple
classifier $g$'s complexity, and $n$ is the number of training samples. We
stress that our bound is independent of the complexity of the representation
$r$. We do not make any structural or conditional-independence assumptions on
the representation-learning task, which can use the same training dataset that
is later used for classification. Rather, we assume that the training procedure
satisfies certain natural noise-robustness (adding a small amount of label noise
causes a small degradation in performance) and rationality (getting the wrong
label is not better than getting no label at all) conditions that widely hold
across many standard architectures. We show that our bound is non-vacuous for
many popular representation-learning based classifiers on CIFAR-10 and
ImageNet, including SimCLR, AMDIM and MoCo.
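To make the statement concrete, here is a hedged sketch (based only on the abstract above, not the paper's exact theorem) of the shape such a bound takes, writing $g \circ r$ for the simple classifier applied on top of the learned representation:

    % Illustrative only: the formal definitions of the robustness and
    % rationality terms, and the precise constants, are given in the paper.
    \[
      \underbrace{\mathrm{err}_{\text{test}}(g \circ r) - \mathrm{err}_{\text{train}}(g \circ r)}_{\text{generalization gap}}
      \;\lesssim\;
      \sqrt{\frac{\mathsf{C}(g)}{n}}
      \;+\; \underbrace{\varepsilon_{\text{robust}}}_{\text{noise-robustness term}}
      \;+\; \underbrace{\varepsilon_{\text{rational}}}_{\text{rationality term}},
    \]

so the gap vanishes whenever $\mathsf{C}(g) \ll n$ and the two condition terms are small. The two-stage pipeline the bound applies to can be sketched as follows; this is a minimal stand-in, not the authors' code, and `encode` is a placeholder for any frozen self-supervised encoder (e.g., SimCLR or MoCo features):

    # Hedged sketch of "representation r + simple classifier g": measure the
    # empirical generalization gap of g o r on frozen features.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def encode(x: np.ndarray) -> np.ndarray:
        """Placeholder for a frozen self-supervised encoder r(x)."""
        rng = np.random.default_rng(0)
        proj = rng.standard_normal((x.shape[1], 128))
        return x @ proj  # stand-in features, not real SimCLR/MoCo outputs

    rng = np.random.default_rng(1)  # toy data standing in for an image dataset
    x_tr, y_tr = rng.standard_normal((1000, 512)), rng.integers(0, 10, 1000)
    x_te, y_te = rng.standard_normal((200, 512)), rng.integers(0, 10, 200)

    r_tr, r_te = encode(x_tr), encode(x_te)                 # stage 1: representation r
    g = LogisticRegression(max_iter=1000).fit(r_tr, y_tr)   # stage 2: simple classifier g
    gap = g.score(r_tr, y_tr) - g.score(r_te, y_te)         # train minus test accuracy
    print(f"empirical generalization gap of g o r: {gap:.3f}")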
Related papers
- Achieving More with Less: A Tensor-Optimization-Powered Ensemble Method [53.170053108447455]
Ensemble learning is a method that leverages weak learners to produce a strong learner.
We design a smooth and convex objective function that leverages the concept of margin, making the strong learner more discriminative.
We then compare our algorithm with random forests of ten times the size and other classical methods across numerous datasets.
arXiv Detail & Related papers (2024-08-06T03:42:38Z) - Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple
Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z) - One-Bit Quantization and Sparsification for Multiclass Linear Classification with Strong Regularization [18.427215139020625]
We show that the best classification performance is achieved when $f(\cdot) = \|\cdot\|^2$ and $\lambda \to \infty$.
It is often possible to find sparse and one-bit solutions that perform almost as well as the one corresponding to $f(\cdot) = \|\cdot\|_\infty$ in the large $\lambda$ regime.
arXiv Detail & Related papers (2024-02-16T06:39:40Z) - A Novel Approach to Regularising 1NN classifier for Improved
Generalization [3.9919322607068293]
We show that watershed classifiers can find arbitrary boundaries on any dense enough dataset, and, at the same time, have very small VC dimension.
We propose a loss function which can learn representations consistent with watershed classifiers, and show that it outperforms the NCA baseline.
arXiv Detail & Related papers (2024-02-13T12:09:15Z) - Sample-Efficient Linear Representation Learning from Non-IID Non-Isotropic Data [4.971690889257356]
We introduce an adaptation of the alternating minimization-descent scheme proposed by Collins et al. and by Nayer and Vaswani.
We show that vanilla alternating minimization-descent fails catastrophically even for i.i.d. but mildly non-isotropic data.
Our analysis unifies and generalizes prior work, and provides a flexible framework for a wider range of applications.
arXiv Detail & Related papers (2023-08-08T17:56:20Z) - Label-Retrieval-Augmented Diffusion Models for Learning from Noisy
Labels [61.97359362447732]
Learning from noisy labels is an important and long-standing problem in machine learning for real applications.
In this paper, we reformulate the label-noise problem from a generative-model perspective.
Our model achieves new state-of-the-art (SOTA) results on all the standard real-world benchmark datasets.
arXiv Detail & Related papers (2023-05-31T03:01:36Z) - On the Provable Advantage of Unsupervised Pretraining [26.065736182939222]
Unsupervised pretraining is a critical component of modern large-scale machine learning systems.
This paper studies a generic framework, where the unsupervised representation learning task is specified by an abstract class of latent variable models.
Under a mild "informative" condition, our algorithm achieves an excess risk of $\tilde{\mathcal{O}}(\sqrt{\mathcal{C}_\Phi/m} + \sqrt{\mathcal{C}_\Psi/n})$ for downstream tasks.
arXiv Detail & Related papers (2023-03-02T20:42:05Z) - Class Prototype-based Cleaner for Label Noise Learning [73.007001454085]
Semi-supervised learning methods are the current SOTA solutions to the noisy-label learning problem.
We propose a simple yet effective solution, named Class Prototype-based label noise Cleaner.
arXiv Detail & Related papers (2022-12-21T04:56:41Z) - Blessing of Class Diversity in Pre-training [54.335530406959435]
We prove that when the classes of the pre-training task are sufficiently diverse, pre-training can significantly improve the sample efficiency of downstream tasks.
Our proof relies on a vector-form Rademacher complexity chain rule for composite function classes and a modified self-concordance condition.
arXiv Detail & Related papers (2022-09-07T20:10:12Z) - Counterfactual Zero-Shot and Open-Set Visual Recognition [95.43275761833804]
We present a novel counterfactual framework for both Zero-Shot Learning (ZSL) and Open-Set Recognition (OSR).
Our idea stems from the observation that the generated samples for unseen-classes are often out of the true distribution.
We demonstrate that our framework effectively mitigates the seen/unseen imbalance and hence significantly improves the overall performance.
arXiv Detail & Related papers (2021-03-01T10:20:04Z) - Classification with Strategically Withheld Data [41.78264347024645]
Machine learning techniques can be useful in applications such as credit approval and college admission.
To be classified more favorably in such contexts, an agent may decide to strategically withhold some of her features, such as bad test scores.
We design three classification methods: Mincut, Hill-Climbing (HC), and Incentive-Compatible Logistic Regression (IC-LR).
arXiv Detail & Related papers (2020-12-18T12:54:41Z)