Co-Supervised Learning: Improving Weak-to-Strong Generalization with
  Hierarchical Mixture of Experts
- URL: http://arxiv.org/abs/2402.15505v1
- Date: Fri, 23 Feb 2024 18:56:11 GMT
- Title: Co-Supervised Learning: Improving Weak-to-Strong Generalization with
  Hierarchical Mixture of Experts
- Authors: Yuejiang Liu, Alexandre Alahi
- Abstract summary: We propose to harness a diverse set of specialized teachers, instead of a single generalist one, that collectively supervises the strong student.
Our approach resembles the classical hierarchical mixture of experts, with two components tailored for co-supervision.
We validate the proposed method through visual recognition tasks on the OpenAI weak-to-strong benchmark and additional multi-domain datasets.
- Score: 81.37287967870589
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   Steering the behavior of a strong model pre-trained on internet-scale data
can be difficult due to the scarcity of competent supervisors. Recent studies
reveal that, despite supervisory noises, a strong student model may surpass its
weak teacher when fine-tuned on specific objectives. Yet, the effectiveness of
such weak-to-strong generalization remains limited, especially in the presence
of large capability gaps. In this paper, we propose to address this challenge
by harnessing a diverse set of specialized teachers, instead of a single
generalist one, that collectively supervises the strong student. Our approach
resembles the classical hierarchical mixture of experts, with two components
tailored for co-supervision: (i) we progressively alternate student training
and teacher assignment, leveraging the growth of the strong student to identify
plausible supervisions; (ii) we conservatively enforce teacher-student and
local-global consistency, leveraging their dependencies to reject potential
annotation noises. We validate the proposed method through visual recognition
tasks on the OpenAI weak-to-strong benchmark and additional multi-domain
datasets. Our code is available at https://github.com/yuejiangliu/csl.
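
The two components named in the abstract, alternating student training with teacher assignment and rejecting supervision that is inconsistent with the student, can be illustrated with a toy sketch. This is not the authors' implementation (see their repository for that). Everything here is an assumption made for illustration: the names `make_threshold_teacher`, `CentroidStudent`, `majority_vote` and `co_supervise` are hypothetical, the teachers are 1-D threshold rules, the "strong student" is a nearest-centroid classifier, and the majority-vote bootstrap is a swapped-in starting point rather than anything stated in the abstract. The hierarchical structure and the local-global consistency check are not modeled.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical type: a weak teacher maps a 1-D input to a class label.
Teacher = Callable[[float], int]


def make_threshold_teacher(threshold: float) -> Teacher:
    """Toy weak teacher: predicts 1 at or above its own (possibly wrong) threshold."""
    return lambda x: 1 if x >= threshold else 0


class CentroidStudent:
    """Toy stand-in for the strong student: nearest-class-centroid on 1-D inputs."""

    def __init__(self) -> None:
        self.centroids: Dict[int, float] = {}

    def fit(self, xs: Sequence[float], ys: Sequence[int]) -> None:
        sums: Dict[int, float] = {}
        counts: Dict[int, int] = {}
        for x, y in zip(xs, ys):
            sums[y] = sums.get(y, 0.0) + x
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {y: sums[y] / counts[y] for y in sums}

    def predict(self, x: float) -> int:
        # Return the label whose centroid is closest to x.
        return min(self.centroids, key=lambda y: abs(x - self.centroids[y]))


def majority_vote(teachers: Sequence[Teacher], x: float) -> int:
    """Assumed bootstrap: the most common label among all teachers."""
    return Counter(t(x) for t in teachers).most_common(1)[0][0]


def co_supervise(
    xs: Sequence[float], teachers: Sequence[Teacher], rounds: int = 2
) -> Tuple[CentroidStudent, List[Optional[int]]]:
    """Alternate teacher assignment and student training for a few rounds."""
    student = CentroidStudent()
    # Bootstrap the student from the teachers' majority vote (an assumption).
    student.fit(xs, [majority_vote(teachers, x) for x in xs])

    assignment: List[Optional[int]] = [None] * len(xs)
    for _ in range(rounds):
        kept_x: List[float] = []
        kept_y: List[int] = []
        for i, x in enumerate(xs):
            guess = student.predict(x)
            # Teacher assignment: the first teacher that agrees with the student.
            agreeing = [k for k, t in enumerate(teachers) if t(x) == guess]
            if not agreeing:
                # Teacher-student consistency: no teacher agrees, so the
                # sample is rejected as potential annotation noise.
                assignment[i] = None
                continue
            assignment[i] = agreeing[0]
            kept_x.append(x)
            kept_y.append(teachers[agreeing[0]](x))
        if kept_x:
            # Student training on the retained supervision only.
            student.fit(kept_x, kept_y)
    return student, assignment
```

Calling `co_supervise(xs, teachers)` returns the fitted student together with, for each input, the index of the teacher it was assigned to, or `None` when the sample was rejected. Picking the first agreeing teacher is the simplest possible assignment rule and was chosen only to keep the sketch short.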
 
      
Related papers
- On the Mechanisms of Weak-to-Strong Generalization: A Theoretical Perspective [28.005935031887038]
 Weak-to-strong generalization, where a student model trained on imperfect labels surpasses its teacher, has been widely observed.
In this paper, through a theoretical analysis of simple models, we uncover three core mechanisms that can drive this phenomenon.
 arXiv (2025-05-23T20:09:09Z)
- Alice: Proactive Learning with Teacher's Demonstrations for Weak-to-Strong Generalization [69.96794098855938]
 Weak-to-strong generalization (W2SG) offers a promising framework for supervising increasingly capable language models (LLMs).
Traditional W2SG methods rely on passive learning, where a weak teacher provides noisy demonstrations to train a strong student.
We introduce Alice, a framework that leverages complementary knowledge between teacher and student to enhance the learning process.
 arXiv (2025-04-09T22:33:06Z)
- Understanding the Capabilities and Limitations of Weak-to-Strong Generalization [40.793180521446466]
 We provide theoretical insights into weak-to-strong generalization.
We show that the weak model should demonstrate strong generalization performance and maintain well-calibrated predictions.
We extend the work of Charikar et al. (2024) to a loss function based on Kullback-Leibler divergence.
 arXiv (2025-02-03T15:48:28Z)
- Provable Weak-to-Strong Generalization via Benign Overfitting [3.4652800888823294]
 We consider the inverted situation, where a weak teacher supervises a strong student with imperfect pseudolabels.
We theoretically investigate weak-to-strong generalization for binary and multilabel classification.
Our techniques should eventually extend to weak-to-strong multiclass classification.
 arXiv (2024-10-06T22:10:50Z)
- Adaptive Teaching in Heterogeneous Agents: Balancing Surprise in Sparse Reward Scenarios [3.638198517970729]
 Learning from Demonstration can be an efficient way to train systems with analogous agents.
However, naively replicating demonstrations that are out of bounds for the Student's capability can limit efficient learning.
We present a Teacher-Student learning framework specifically tailored to address the challenge of heterogeneity between the Teacher and Student agents.
 arXiv (2024-05-23T05:52:42Z)
- Vision Superalignment: Weak-to-Strong Generalization for Vision
  Foundation Models [55.919653720979824]
 This paper focuses on the concept of weak-to-strong generalization, which involves using a weaker model to supervise a stronger one.
We introduce a novel and adaptively adjustable loss function for weak-to-strong supervision.
Our approach not only exceeds the performance benchmarks set by strong-to-strong generalization but also surpasses the outcomes of fine-tuning strong models with whole datasets.
 arXiv (2024-02-06T06:30:34Z)
- Improving Weak-to-Strong Generalization with Scalable Oversight and
  Ensemble Learning [21.401598876308345]
 This paper presents a follow-up study to OpenAI's recent superalignment work on Weak-to-Strong Generalization (W2SG).
Superalignment focuses on ensuring that high-level AI systems remain consistent with human values and intentions when dealing with complex, high-risk tasks.
Our study simulates two phases of superalignment under the W2SG framework: the development of general superhuman models and the progression towards superintelligence.
 arXiv (2024-02-01T15:30:19Z)
- Hierarchical Decomposition of Prompt-Based Continual Learning:
  Rethinking Obscured Sub-optimality [55.88910947643436]
 Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
 arXiv (2023-10-11T06:51:46Z)
- Contrastive Knowledge Amalgamation for Unsupervised Image Classification [2.6392087010521728]
 Contrastive Knowledge Amalgamation (CKA) aims to learn a compact student model to handle the joint objective from multiple teacher models.
Intra- and inter-model contrastive losses are designed to widen the distance between representations of different classes.
The alignment loss is introduced to minimize the sample-level distribution differences of teacher-student models in the common representation space.
 arXiv (2023-07-27T11:21:14Z)
- Weakly-supervised HOI Detection via Prior-guided Bi-level Representation
  Learning [66.00600682711995]
 Human object interaction (HOI) detection plays a crucial role in human-centric scene understanding and serves as a fundamental building-block for many vision tasks.
One generalizable and scalable strategy for HOI detection is to use weak supervision, learning from image-level annotations only.
This is inherently challenging due to ambiguous human-object associations, the large search space for detecting HOIs, and a highly noisy training signal.
We develop a CLIP-guided HOI representation capable of incorporating the prior knowledge at both image level and HOI instance level, and adopt a self-taught mechanism to prune incorrect human-object associations.
 arXiv (2023-03-02T14:41:31Z)
- From Mimicking to Integrating: Knowledge Integration for Pre-Trained
  Language Models [55.137869702763375]
 This paper explores a novel PLM reuse paradigm, Knowledge Integration (KI).
KI aims to merge the knowledge from different teacher-PLMs, each of which specializes in a different classification problem, into a versatile student model.
We then design a Model Uncertainty-aware Knowledge Integration (MUKI) framework to recover the golden supervision for the student.
 arXiv (2022-10-11T07:59:08Z)
- Unsupervised Learning of Debiased Representations with Pseudo-Attributes [85.5691102676175]
 We propose a simple but effective debiasing technique in an unsupervised manner.
We perform clustering on the feature embedding space and identify pseudo-attributes by taking advantage of the clustering results.
We then employ a novel cluster-based reweighting scheme for learning debiased representation.
 arXiv (2021-08-06T05:20:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     