Indirect Gradient Matching for Adversarial Robust Distillation
- URL: http://arxiv.org/abs/2312.03286v1
- Date: Wed, 6 Dec 2023 04:32:38 GMT
- Title: Indirect Gradient Matching for Adversarial Robust Distillation
- Authors: Hongsin Lee, Seungju Cho, Changick Kim
- Abstract summary: Adversarial training significantly improves adversarial robustness, but superior performance is primarily attained with large models.
Existing adversarial distillation methods leverage the teacher's logits as a guide.
We propose a distillation module that indirectly matches the student's input gradient with that of the teacher.
- Score: 17.06592851567578
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial training significantly improves adversarial robustness, but
superior performance is primarily attained with large models. This substantial
performance gap between small and large models has spurred active research into
adversarial distillation (AD) to mitigate it. Existing AD methods leverage the
teacher's logits as a guide. In contrast to these approaches, we aim to
transfer another piece of knowledge from the teacher, the input gradient. In
this paper, we propose a distillation module termed Indirect Gradient
Distillation Module (IGDM) that indirectly matches the student's input gradient
with that of the teacher. We hypothesize that students can better acquire the
teacher's knowledge by matching the input gradient. Leveraging the observation
that adversarial training renders the model locally linear on the input space,
we employ Taylor approximation to effectively align gradients without directly
calculating them. Experimental results show that IGDM seamlessly integrates
with existing AD methods, significantly enhancing the performance of all AD
methods. In particular, utilizing IGDM on the CIFAR-100 dataset improves the
AutoAttack accuracy from 28.06% to 30.32% with the ResNet-18 model and from
26.18% to 29.52% with the MobileNetV2 model when integrated into the SOTA
method without additional data augmentation. The code will be made available.
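To make the local-linearity argument concrete: a first-order Taylor expansion gives f(x + delta) - f(x) ≈ delta^T * grad_x f(x), so matching the student's and teacher's output differences over the same perturbation delta implicitly aligns their input gradients without ever computing those gradients. The sketch below illustrates such an indirect gradient-matching term in PyTorch; the function name, the L1 distance, and the loss weighting are illustrative assumptions rather than the paper's exact formulation.

```python
# Hypothetical sketch of an indirect gradient-matching term (not the authors'
# released code). Under local linearity,
#     f(x + delta) - f(x)  ~  delta^T * grad_x f(x),
# so aligning student/teacher output differences over the same perturbation
# aligns their input gradients without explicit gradient computation.
import torch
import torch.nn.functional as F

def indirect_gradient_loss(student, teacher, x, x_adv):
    """x: clean inputs; x_adv: perturbed inputs built from x (e.g., by PGD)."""
    # First-order term of the student: ~ delta^T * grad_x f_S(x)
    diff_student = student(x_adv) - student(x)
    # First-order term of the (frozen) teacher: ~ delta^T * grad_x f_T(x)
    with torch.no_grad():
        diff_teacher = teacher(x_adv) - teacher(x)
    # L1 distance between the two first-order terms (one plausible choice).
    return F.l1_loss(diff_student, diff_teacher)

# Usage: add the term to an existing adversarial-distillation objective,
# e.g. total_loss = ad_loss + lambda_igdm * indirect_gradient_loss(S, T, x, x_adv)
```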
Related papers
- Distillation-Free One-Step Diffusion for Real-World Image Super-Resolution [81.81748032199813]
We propose a Distillation-Free One-Step Diffusion model.
Specifically, we propose a noise-aware discriminator (NAD) to participate in adversarial training.
We improve the perceptual loss with edge-aware DISTS (EA-DISTS) to enhance the model's ability to generate fine details.
arXiv Detail & Related papers (2024-10-05T16:41:36Z)
- Gap Preserving Distillation by Building Bidirectional Mappings with A Dynamic Teacher [43.678380057638016]
The Gap Preserving Distillation (GPD) method trains an additional dynamic teacher model from scratch alongside the student to bridge the teacher-student gap.
In experiments, GPD significantly outperforms existing distillation methods on top of both CNN and transformer architectures.
GPD also generalizes well to scenarios without a pre-trained teacher, including training from scratch and fine-tuning, yielding improvements of 1.80% and 0.89% on ResNet18, respectively.
arXiv Detail & Related papers (2024-10-05T12:29:51Z)
- iDAT: inverse Distillation Adapter-Tuning [15.485126287621439]
The Adapter-Tuning (AT) method freezes a pre-trained model and introduces trainable adapter modules to acquire downstream knowledge.
This paper proposes a distillation framework for the AT method instead of crafting a carefully designed adapter module.
arXiv Detail & Related papers (2024-03-23T07:36:58Z)
- One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation [69.65734716679925]
Knowledge distillation has proven to be a highly effective approach for enhancing model performance through a teacher-student training scheme.
Most existing distillation methods are designed under the assumption that the teacher and student models belong to the same model family.
We propose a simple yet effective one-for-all KD framework called OFA-KD, which significantly improves the distillation performance between heterogeneous architectures.
arXiv Detail & Related papers (2023-10-30T11:13:02Z)
- Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z)
- Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z)
- Prediction-Guided Distillation for Dense Object Detection [7.5320132424481505]
We show that only a very small fraction of features within a ground-truth bounding box are responsible for a teacher's high detection performance.
We propose Prediction-Guided Distillation (PGD), which focuses distillation on these key predictive regions of the teacher.
Our proposed approach outperforms current state-of-the-art KD baselines on a variety of advanced one-stage detection architectures.
arXiv Detail & Related papers (2022-03-10T16:46:05Z)
- Adaptive Instance Distillation for Object Detection in Autonomous Driving [3.236217153362305]
We propose Adaptive Instance Distillation (AID), which selectively imparts the teacher's knowledge to the student to improve the performance of knowledge distillation.
Our AID is also shown to be useful for self-distillation to improve the teacher model's performance.
arXiv Detail & Related papers (2022-01-26T18:06:33Z)
- LGD: Label-guided Self-distillation for Object Detection [59.9972914042281]
We propose the first self-distillation framework for general object detection, termed LGD (Label-Guided self-Distillation).
Our framework involves sparse label-appearance encoding, inter-object relation adaptation and intra-object knowledge mapping to obtain the instructive knowledge.
Compared with the classical teacher-based method FGFI, LGD not only performs better without requiring a pretrained teacher but also incurs 51% lower training cost beyond inherent student learning.
arXiv Detail & Related papers (2021-09-23T16:55:01Z)
- Enhancing Data-Free Adversarial Distillation with Activation Regularization and Virtual Interpolation [19.778192371420793]
A data-free adversarial distillation framework deploys a generative network to transfer the teacher model's knowledge to the student model.
We add an activation regularizer and a virtual adversarial method to improve the data generation efficiency.
Our model's accuracy is 13.8% higher than the state-of-the-art data-free method on CIFAR-100.
arXiv Detail & Related papers (2021-02-23T11:37:40Z)
- Distilling Object Detectors with Task Adaptive Regularization [97.52935611385179]
Current state-of-the-art object detectors come at high computational cost and are hard to deploy on low-end devices.
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
arXiv Detail & Related papers (2020-06-23T15:58:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.