Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models
- URL: http://arxiv.org/abs/2311.01441v2
- Date: Mon, 5 Feb 2024 00:10:44 GMT
- Title: Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models
- Authors: Andy Zhou and Jindong Wang and Yu-Xiong Wang and Haohan Wang
- Abstract summary: We propose a conceptually simple and lightweight framework for improving the robustness of vision models.
We show strong gains in out-of-distribution robustness when distilling from pretrained foundation models.
We provide a theoretical framework for using a robust teacher in the knowledge-distillation-with-data-augmentation setting.
- Score: 40.885755686727855
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a conceptually simple and lightweight framework for improving the
robustness of vision models through the combination of knowledge distillation
and data augmentation. We address the conjecture that larger models do not make
for better teachers by showing strong gains in out-of-distribution robustness
when distilling from pretrained foundation models. Following this finding, we
propose Discrete Adversarial Distillation (DAD), which leverages a robust
teacher to generate adversarial examples and a VQGAN to discretize them,
creating more informative samples than standard data augmentation techniques.
We provide a theoretical framework for the use of a robust teacher in the
knowledge distillation with data augmentation setting and demonstrate strong
gains in out-of-distribution robustness and clean accuracy across different
student architectures. Notably, our method adds minor computational overhead
compared to similar techniques and can be easily combined with other data
augmentations for further improvements.
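As a rough illustration of the pipeline the abstract describes (adversarial examples crafted against a robust teacher, discretized by a VQGAN, then used for distillation), here is a minimal PyTorch sketch. It is not the authors' implementation: the single FGSM-style attack step, the `vqgan.encode`/`vqgan.decode` interface, and all hyperparameters are assumptions.
```python
import torch
import torch.nn.functional as F

def dad_step(student, teacher, vqgan, x, y, optimizer,
             eps=4 / 255, alpha=1.0, temperature=2.0):
    """One illustrative training step: adversarial example crafted against the
    robust teacher, discretized by a VQGAN, then used for distillation.
    Module interfaces and hyperparameters here are assumptions."""
    teacher.eval()

    # 1) Craft an adversarial example against the *teacher* (single FGSM step).
    x_adv = x.clone().detach().requires_grad_(True)
    adv_loss = F.cross_entropy(teacher(x_adv), y)
    grad = torch.autograd.grad(adv_loss, x_adv)[0]
    x_adv = (x_adv + eps * grad.sign()).clamp(0, 1).detach()

    # 2) Discretize the adversarial image with a VQGAN encode/decode pass.
    with torch.no_grad():
        x_disc = vqgan.decode(vqgan.encode(x_adv))   # hypothetical VQGAN API
        t_soft = F.softmax(teacher(x_disc) / temperature, dim=-1)

    # 3) Distill: cross-entropy on clean data plus temperature-scaled KL to the
    #    teacher on the discretized adversarial data.
    kd = F.kl_div(F.log_softmax(student(x_disc) / temperature, dim=-1),
                  t_soft, reduction="batchmean") * temperature ** 2
    loss = F.cross_entropy(student(x), y) + alpha * kd

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
The KD term uses the standard temperature-scaled KL formulation; the paper's exact objective and attack schedule may differ.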
Related papers
- Faithful Label-free Knowledge Distillation [8.572967695281054]
This paper presents a label-free knowledge distillation approach called Teacher in the Middle (TinTeM)
It produces a more faithful student, which better replicates the behavior of the teacher network across a range of benchmarks testing model robustness, generalisability and out-of-distribution detection.
arXiv Detail & Related papers (2024-11-22T01:48:44Z)
- Distillation from Heterogeneous Models for Top-K Recommendation [43.83625440616829]
HetComp is a framework that guides the student model by transferring sequences of knowledge from teachers' trajectories.
HetComp significantly improves the distillation quality and the generalization of the student model.
arXiv Detail & Related papers (2023-03-02T10:23:50Z)
- Self-Supervised Monocular Depth Estimation with Self-Reference Distillation and Disparity Offset Refinement [15.012694052674899]
We propose two novel ideas to improve self-supervised monocular depth estimation.
We use a parameter-optimized model as the teacher, updated over the training epochs, to provide additional supervision.
We leverage the contextual consistency between high-scale and low-scale features to obtain multiscale disparity offsets.
arXiv Detail & Related papers (2023-02-20T06:28:52Z)
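The "parameter-optimized model as the teacher, updated over the training epochs" in the entry above can be pictured as a teacher whose weights track the student across training. The sketch below uses an exponential moving average as one plausible reading, not the paper's exact update rule; `update_teacher` is a name introduced here.
```python
import torch

@torch.no_grad()
def update_teacher(teacher, student, momentum=0.999):
    """Illustrative self-reference teacher: the teacher's parameters track the
    student's via an exponential moving average, so later epochs get extra
    supervision from an averaged copy of earlier ones (an assumed reading)."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(momentum).add_(s_param.detach(), alpha=1 - momentum)
```
Called once per iteration or epoch, the teacher's depth predictions can then serve as an additional supervision target alongside the usual photometric loss.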
- EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z)
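One hedged way to picture "distilling the relative geometry among queries and documents" is to match the teacher's and student's in-batch query-document similarity structure rather than individual scores. The dual-encoder embeddings, shapes, and loss below are assumptions, not EmbedDistill's released objective.
```python
import torch
import torch.nn.functional as F

def geometry_distill_loss(s_q, s_d, t_q, t_d, temperature=1.0):
    """Match the student's query-document similarity geometry to the teacher's.
    s_q/s_d and t_q/t_d are (batch, dim) query and document embeddings from the
    student and teacher dual encoders; all names here are assumptions."""
    s_sim = F.normalize(s_q, dim=-1) @ F.normalize(s_d, dim=-1).T   # (B, B)
    t_sim = F.normalize(t_q, dim=-1) @ F.normalize(t_d, dim=-1).T   # (B, B)
    # KL between row-wise distributions over the in-batch documents.
    return F.kl_div(F.log_softmax(s_sim / temperature, dim=-1),
                    F.softmax(t_sim / temperature, dim=-1),
                    reduction="batchmean")
```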
- On the benefits of knowledge distillation for adversarial robustness [53.41196727255314]
We show that knowledge distillation can be used directly to boost the performance of state-of-the-art models in adversarial robustness.
We present Adversarial Knowledge Distillation (AKD), a new framework to improve a model's robust performance.
arXiv Detail & Related papers (2022-03-14T15:02:13Z)
- MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that, under reasonable conditions, MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
arXiv Detail & Related papers (2020-11-01T18:47:51Z)
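MixKD's mixup-style interpolation inside a distillation objective can be sketched as follows. This is a generic image-space variant under assumed names; the paper itself applies the interpolation to language-model inputs and representations.
```python
import torch
import torch.nn.functional as F

def mixkd_loss(student, teacher, x, y, num_classes, alpha=0.4, temperature=2.0):
    """Mixup-style distillation sketch: interpolate pairs of inputs, then match
    the student to the teacher on the mixed examples (names are assumptions)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]

    with torch.no_grad():
        t_soft = F.softmax(teacher(x_mix) / temperature, dim=-1)
    s_logits = student(x_mix)

    # Temperature-scaled KL to the teacher on the mixed batch.
    kd = F.kl_div(F.log_softmax(s_logits / temperature, dim=-1),
                  t_soft, reduction="batchmean") * temperature ** 2
    # Mixed-label cross-entropy on the same interpolated batch.
    y_mix = (lam * F.one_hot(y, num_classes).float()
             + (1 - lam) * F.one_hot(y[perm], num_classes).float())
    ce = -(y_mix * F.log_softmax(s_logits, dim=-1)).sum(dim=-1).mean()
    return ce + kd
```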
- Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
arXiv Detail & Related papers (2020-06-12T12:18:52Z)
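A hedged sketch of using a self-supervision signal as an auxiliary distillation task, in the spirit of the entry above: embed the same augmented batch with both networks and push the student's pairwise similarity structure toward the teacher's. The feature extractors, shapes, and KL matching are assumptions.
```python
import torch
import torch.nn.functional as F

def self_supervised_kd_loss(student_feats, teacher_feats, temperature=0.5):
    """Auxiliary self-supervision transfer sketch: both networks embed the same
    batch of augmented views, and the student's pairwise similarity matrix is
    pushed toward the teacher's. Feature shapes are assumptions."""
    s = F.normalize(student_feats, dim=-1)        # (N, dim), N = views * batch
    t = F.normalize(teacher_feats, dim=-1)
    s_sim = (s @ s.T) / temperature               # (N, N) student similarities
    t_sim = (t @ t.T) / temperature               # (N, N) teacher similarities
    # Treat each row (excluding self-similarity) as a distribution over the
    # other samples and match student to teacher.
    mask = ~torch.eye(s.size(0), dtype=torch.bool, device=s.device)
    s_rows = F.log_softmax(s_sim[mask].view(s.size(0), -1), dim=-1)
    t_rows = F.softmax(t_sim[mask].view(s.size(0), -1), dim=-1)
    return F.kl_div(s_rows, t_rows, reduction="batchmean")
```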
- Generative Data Augmentation for Commonsense Reasoning [75.26876609249197]
G-DAUGC is a novel generative data augmentation method that aims to achieve more accurate and robust learning in the low-resource setting.
G-DAUGC consistently outperforms existing data augmentation methods based on back-translation.
Our analysis demonstrates that G-DAUGC produces a diverse set of fluent training examples, and that its selection and training approaches are important for performance.
arXiv Detail & Related papers (2020-04-24T06:12:10Z)
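Generative data augmentation of the G-DAUGC flavor (the last entry above) can be pictured as a generate-then-filter loop: sample synthetic examples, pseudo-label and score them with the current task model, and keep only the best candidates. The generator interface and the confidence-based selection rule below are placeholders, not the paper's actual components.
```python
import torch
import torch.nn.functional as F

def generate_and_select(generator, task_model, num_samples=1000, keep_ratio=0.5):
    """Illustrative generate-then-filter augmentation: sample synthetic inputs
    from a generator, pseudo-label them with the current task model, and keep
    the most confidently labeled candidates (selection rule is an assumption)."""
    scored = []
    for _ in range(num_samples):
        x = generator.sample()                       # hypothetical generator API
        with torch.no_grad():
            probs = F.softmax(task_model(x), dim=-1).squeeze(0)
        confidence, label = probs.max(dim=-1)
        scored.append((confidence.item(), x, label.item()))
    # Keep the top fraction of pseudo-labeled candidates for training.
    scored.sort(key=lambda s: s[0], reverse=True)
    kept = scored[: max(1, int(keep_ratio * len(scored)))]
    return [(x, label) for _, x, label in kept]
```
The kept pairs would then be mixed into the original training set; the paper's own selection and training heuristics are more elaborate than this confidence filter.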
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.