Provable Weak-to-Strong Generalization via Benign Overfitting
- URL: http://arxiv.org/abs/2410.04638v1
- Date: Sun, 6 Oct 2024 22:10:50 GMT
- Title: Provable Weak-to-Strong Generalization via Benign Overfitting
- Authors: David X. Wu, Anant Sahai
- Abstract summary: We consider the inverted situation, where a weak teacher supervises a strong student with imperfect pseudolabels.
We theoretically investigate weak-to-strong generalization for binary and multilabel classification.
Our techniques should eventually extend to weak-to-strong multiclass classification.
- Score: 3.4652800888823294
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The classic teacher-student model in machine learning posits that a strong teacher supervises a weak student to improve the student's capabilities. We instead consider the inverted situation, where a weak teacher supervises a strong student with imperfect pseudolabels. This paradigm was recently brought forth by Burns et al.'23 and termed \emph{weak-to-strong generalization}. We theoretically investigate weak-to-strong generalization for binary and multilabel classification in a stylized overparameterized spiked covariance model with Gaussian covariates where the weak teacher's pseudolabels are asymptotically like random guessing. Under these assumptions, we provably identify two asymptotic phases of the strong student's generalization after weak supervision: (1) successful generalization and (2) random guessing. Our techniques should eventually extend to weak-to-strong multiclass classification. Towards doing so, we prove a tight lower tail inequality for the maximum of correlated Gaussians, which may be of independent interest. Understanding the multilabel setting reinforces the value of using logits for weak supervision when they are available.
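The headline phenomenon is easy to simulate. Below is a minimal numpy sketch of the stylized setting (the toy dimensions and the 40% label-flip teacher are our assumptions; the paper's analysis is asymptotic and uses different scalings): covariates follow a spiked covariance model, the weak teacher's pseudolabels are only modestly better than chance, and the strong student is the minimum-norm interpolator of those pseudolabels, i.e., it benignly overfits them.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, theta = 100, 5000, 100.0       # n training points, d >> n features, spike strength

# spiked covariance covariates: signal theta * mu mu^T on top of identity noise
mu = np.zeros(d); mu[0] = 1.0                        # unknown spike direction
y = rng.choice([-1.0, 1.0], size=n)                  # clean binary labels
X = np.sqrt(theta) * y[:, None] * mu[None, :] + rng.standard_normal((n, d))

# weak teacher: pseudolabels obtained by flipping 40% of the clean labels
flip = rng.random(n) < 0.4
y_weak = np.where(flip, -y, y)
print(f"teacher accuracy: {np.mean(~flip):.2f}")

# strong student: minimum-norm interpolator of the pseudolabels (benign overfitting);
# since d > n, w = X^T (X X^T)^{-1} y_weak fits every pseudolabel exactly
w = X.T @ np.linalg.solve(X @ X.T, y_weak)

# fresh test data from the same spiked model
m = 1000
y_te = rng.choice([-1.0, 1.0], size=m)
X_te = np.sqrt(theta) * y_te[:, None] * mu[None, :] + rng.standard_normal((m, d))
print(f"student test accuracy: {np.mean(np.sign(X_te @ w) == y_te):.2f}")
```

With these toy settings the student typically lands in the "successful generalization" phase, scoring well above its teacher; shrinking the spike strength theta pushes it toward the "random guessing" phase.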
Related papers
- Theoretical Analysis of Weak-to-Strong Generalization [23.235671743867492]
We show that existing weak supervision theory fails to account for pseudolabel correction and coverage expansion.
Our bounds capture the intuition that weak-to-strong generalization occurs when the strong model is unable to fit the mistakes of the weak teacher without incurring additional error.
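That intuition can be seen in a toy simulation (our construction, not the paper's setup): a low-capacity student trained on weak pseudolabels cannot memorize the teacher's scattered mistakes, so it averages them away and ends up more accurate than its supervisor.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
X = rng.standard_normal((n, 2))
y = np.sign(X[:, 0])                                  # true concept
y_weak = np.where(rng.random(n) < 0.3, -y, y)         # weak teacher: 30% mistakes

# student: a linear model fit to the weak labels with logistic loss (gradient descent);
# it cannot fit the teacher's mistakes, so it corrects them
w = np.zeros(2)
for _ in range(500):
    margin = y_weak * (X @ w)
    grad = -(X * (y_weak / (1 + np.exp(margin)))[:, None]).mean(axis=0)
    w -= 0.5 * grad

print("teacher accuracy:", np.mean(y_weak == y))            # about 0.70
print("student accuracy:", np.mean(np.sign(X @ w) == y))    # near 1.0
```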
arXiv Detail & Related papers (2024-05-25T03:48:12Z)
- Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts [81.37287967870589]
We propose to harness a diverse set of specialized teachers, instead of a single generalist one, to collectively supervise the strong student.
Our approach resembles the classical hierarchical mixture of experts, with two components tailored for co-supervision.
We validate the proposed method through visual recognition tasks on the OpenAI weak-to-strong benchmark and additional multi-domain datasets.
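A minimal sketch of the co-supervision idea, under the simplifying assumption that each sample is routed to whichever specialized teacher is most confident on it (the paper's hierarchical mixture of experts learns this assignment rather than using raw confidence):

```python
import numpy as np

def co_supervise(teacher_probs):
    """teacher_probs: array of shape (T, N, C) with each specialized teacher's
    class probabilities for N samples. Returns one pseudolabel per sample,
    taken from whichever teacher is most confident on that sample."""
    confidence = teacher_probs.max(axis=2)                 # (T, N)
    chosen = confidence.argmax(axis=0)                     # best teacher per sample
    n = teacher_probs.shape[1]
    return teacher_probs[chosen, np.arange(n)].argmax(axis=1)

# example: 3 teachers, 4 samples, 5 classes
probs = np.random.default_rng(0).dirichlet(np.ones(5), size=(3, 4))
print(co_supervise(probs))
```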
arXiv Detail & Related papers (2024-02-23T18:56:11Z)
- Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning [59.44422468242455]
We propose a novel method, dubbed ShrinkMatch, to learn from uncertain samples.
For each uncertain sample, it adaptively seeks a shrunk class space, which merely contains the original top-1 class.
We then impose a consistency regularization between a pair of strongly and weakly augmented samples in the shrunk space to strive for discriminative representations.
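A sketch of our reading of the shrinking step (the published method has further components, e.g. loss weighting and separate handling of confident samples): drop the top-1 class's strongest competitors until the top-1 probability, renormalized over the remaining classes, is confident, then apply consistency in that shrunk space.

```python
import numpy as np

def shrunk_class_space(p_weak, tau=0.95):
    """Return indices of a shrunk class space for one uncertain sample:
    keep the top-1 class and discard its strongest competitors until the
    top-1 probability, renormalized over the kept classes, exceeds tau."""
    order = np.argsort(-p_weak)                  # classes by descending probability
    top = order[0]
    for k in range(1, len(p_weak)):              # drop the k strongest competitors
        kept = np.concatenate(([top], order[k + 1:]))
        if p_weak[top] / p_weak[kept].sum() >= tau:
            return kept
    return np.array([top])

def shrunk_consistency_loss(p_weak, p_strong, tau=0.95):
    """Cross-entropy toward the top-1 class after renormalizing the strongly
    augmented prediction over the shrunk space."""
    kept = shrunk_class_space(p_weak, tau)
    q = p_strong[kept] / p_strong[kept].sum()    # renormalize in the shrunk space
    return -np.log(q[0] + 1e-12)                 # kept[0] is the top-1 class

p_w = np.array([0.45, 0.35, 0.10, 0.06, 0.04])   # uncertain weak-augmented prediction
p_s = np.array([0.50, 0.30, 0.10, 0.06, 0.04])   # strongly augmented prediction
print(shrunk_consistency_loss(p_w, p_s, tau=0.9))
```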
arXiv Detail & Related papers (2023-08-13T14:05:24Z)
- Pseudo Contrastive Learning for Graph-based Semi-supervised Learning [67.37572762925836]
Pseudo-labeling is a technique used to improve the performance of Graph Neural Networks (GNNs).
We propose a general framework for GNNs, termed Pseudo Contrastive Learning (PCL).
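As a rough illustration, a pseudolabel-driven contrastive objective can take the following supervised-contrastive form (a generic stand-in; the actual PCL framework additionally filters unreliable positive and negative pairs):

```python
import numpy as np

def pseudo_contrastive_loss(z, pseudo_labels, temp=0.5):
    """z: (N, D) node embeddings; pseudo_labels: (N,) predicted classes.
    Samples sharing a pseudolabel are pulled together, others pushed apart."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = np.exp(z @ z.T / temp)
    np.fill_diagonal(sim, 0.0)
    positive = pseudo_labels[:, None] == pseudo_labels[None, :]
    np.fill_diagonal(positive, False)
    has_pos = positive.any(axis=1)                  # skip samples with no positive
    pos_mass = (sim * positive).sum(axis=1)[has_pos]
    return -np.log(pos_mass / sim.sum(axis=1)[has_pos]).mean()

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 16))
labels = rng.integers(0, 3, size=8)
print(pseudo_contrastive_loss(z, labels))
```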
arXiv Detail & Related papers (2023-02-19T10:34:08Z)
- Self-Training with Differentiable Teacher [80.62757989797095]
Self-training achieves enormous success in various semi-supervised and weakly-supervised learning tasks.
The method can be interpreted as a teacher-student framework, where the teacher generates pseudo-labels, and the student makes predictions.
We propose a differentiable self-training method that treats the teacher-student framework as a Stackelberg game.
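A minimal scalar sketch of the Stackelberg idea (our toy, not the paper's algorithm): the leader's variable is updated by differentiating through the follower's closed-form best response. Here, purely for illustration, soft pseudolabels play the leader and a ridge-regression student the follower; the paper's actual role assignment and models differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam, lr = 50, 0.1, 5.0
x = rng.standard_normal(n)
y = 2.0 * x + 0.1 * rng.standard_normal(n)      # targets used only in the leader's loss

t = np.zeros(n)                                 # leader's decision: soft pseudolabels
for _ in range(200):
    # follower's best response: ridge regression on the pseudolabels (closed form)
    w = (x @ t) / (x @ x + lam)
    # leader's loss L(w) = mean((w*x - y)^2); chain rule through w*(t)
    dL_dw = 2.0 * ((w * x - y) @ x) / n
    t -= lr * dL_dw * x / (x @ x + lam)         # dw/dt_i = x_i / (x@x + lam)

print("student slope after co-adaptation:", (x @ t) / (x @ x + lam))  # approx 2.0
```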
arXiv Detail & Related papers (2021-09-15T02:06:13Z)
- Distilling Double Descent [65.85258126760502]
Distillation is the technique of training a "student" model based on examples that are labeled by a separate "teacher" model.
We show that, even when the teacher model is highly overparameterized and provides hard labels, using a very large held-out unlabeled dataset can result in a model that outperforms more "traditional" approaches.
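A toy stand-in for this pipeline (our construction: a 1-nearest-neighbor teacher plays the role of the interpolating, overparameterized model): the teacher memorizes noisy labels, yet a student distilled from its hard labels on a large unlabeled pool generalizes better.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_predict(Xtr, ytr, X):
    """1-nearest-neighbor teacher: interpolates its (noisy) training labels."""
    d2 = ((X[:, None, :] - Xtr[None, :, :]) ** 2).sum(axis=2)
    return ytr[d2.argmin(axis=1)]

Xtr = rng.standard_normal((50, 2))
ytr = np.sign(Xtr[:, 0])
ytr[rng.random(50) < 0.2] *= -1                 # 20% label noise, memorized by teacher

Xpool = rng.standard_normal((5000, 2))          # large held-out unlabeled pool
y_hard = teacher_predict(Xtr, ytr, Xpool)       # teacher's hard labels on the pool

# student: linear logistic model distilled from the hard labels (gradient descent)
w = np.zeros(2)
for _ in range(300):
    margin = y_hard * (Xpool @ w)
    w += 0.5 * (Xpool * (y_hard / (1 + np.exp(margin)))[:, None]).mean(axis=0)

Xte = rng.standard_normal((2000, 2))
yte = np.sign(Xte[:, 0])
print("teacher test accuracy:", np.mean(teacher_predict(Xtr, ytr, Xte) == yte))
print("student test accuracy:", np.mean(np.sign(Xte @ w) == yte))
```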
arXiv Detail & Related papers (2021-02-13T02:26:48Z)
- Robustness May Be at Odds with Fairness: An Empirical Study on Class-wise Accuracy [85.20742045853738]
CNNs are widely known to be vulnerable to adversarial attacks.
We conduct an empirical study on the class-wise accuracy and robustness of adversarially trained models.
We find that there exists inter-class discrepancy for accuracy and robustness even when the training dataset has an equal number of samples for each class.
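Measuring this discrepancy only requires accuracy broken down by ground-truth class, e.g.:

```python
import numpy as np

def class_wise_accuracy(y_true, y_pred, num_classes):
    """Accuracy per ground-truth class; comparing this between clean and
    adversarial predictions exposes inter-class discrepancies."""
    return np.array([np.mean(y_pred[y_true == c] == c) for c in range(num_classes)])

y_true = np.array([0, 0, 1, 1, 2, 2])
y_adv = np.array([0, 1, 1, 1, 0, 2])             # e.g., predictions under attack
print(class_wise_accuracy(y_true, y_adv, 3))     # [0.5, 1.0, 0.5]
```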
arXiv Detail & Related papers (2020-10-26T06:32:32Z)
- Classifier-independent Lower-Bounds for Adversarial Robustness [13.247278149124757]
We theoretically analyse the limits of robustness to test-time adversarial and noisy examples in classification.
We use optimal transport theory to derive variational formulae for the Bayes-optimal error a classifier can make on a given classification problem.
We derive explicit lower-bounds on the Bayes-optimal error in the case of the popular distance-based attacks.
arXiv Detail & Related papers (2020-06-17T16:46:39Z)
- Rethinking the Value of Labels for Improving Class-Imbalanced Learning [20.953282288425118]
Class-imbalanced learning can benefit significantly from both semi-supervised and self-supervised approaches.
We argue that imbalanced labels are not always useful.
Our findings highlight the need to rethink the usage of imbalanced labels in realistic long-tailed tasks.
arXiv Detail & Related papers (2020-06-13T01:35:58Z)