Sample-wise Adaptive Weighting for Transfer Consistency in Adversarial Distillation
- URL: http://arxiv.org/abs/2512.10275v1
- Date: Thu, 11 Dec 2025 04:31:04 GMT
- Title: Sample-wise Adaptive Weighting for Transfer Consistency in Adversarial Distillation
- Authors: Hongsin Lee, Hye Won Chung
- Abstract summary: Existing work often neglects to incorporate state-of-the-art robust teachers. We identify adversarial transferability, the fraction of student-crafted adversarial examples that remain effective against the teacher, as a key factor in successful robustness transfer. We propose Sample-wise Adaptive Adversarial Distillation (SAAD), which reweights training examples by their measured transferability without incurring additional computational cost.
- Score: 22.989324947501018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial distillation in the standard min-max adversarial training framework aims to transfer adversarial robustness from a large, robust teacher network to a compact student. However, existing work often neglects to incorporate state-of-the-art robust teachers. Through extensive analysis, we find that stronger teachers do not necessarily yield more robust students, a phenomenon known as robust saturation. While typically attributed to capacity gaps, we show that such explanations are incomplete. Instead, we identify adversarial transferability, the fraction of student-crafted adversarial examples that remain effective against the teacher, as a key factor in successful robustness transfer. Based on this insight, we propose Sample-wise Adaptive Adversarial Distillation (SAAD), which reweights training examples by their measured transferability without incurring additional computational cost. Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet show that SAAD consistently improves AutoAttack robustness over prior methods. Our code is available at https://github.com/HongsinLee/saad.
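The abstract defines adversarial transferability as the fraction of student-crafted adversarial examples that remain effective against the teacher, and describes reweighting training examples by it. The following is a minimal sketch of that idea in plain Python, not the authors' implementation: the function names, the soft per-sample score (one minus the teacher's probability for the true class), the direction of the weighting, and the mean-one normalization are all assumptions made for illustration. The actual SAAD weighting rule is not stated in the abstract; see the linked repository for it.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def transfer_flags(teacher_logits_adv, labels):
    """Return 1 where the student-crafted adversarial example also fools the teacher.

    teacher_logits_adv: teacher logits evaluated on adversarial inputs that
    were crafted against the student (one list of floats per sample).
    """
    flags = []
    for logits, y in zip(teacher_logits_adv, labels):
        pred = max(range(len(logits)), key=logits.__getitem__)
        flags.append(1 if pred != y else 0)
    return flags


def transferability(flags):
    """Fraction of adversarial examples that remain effective against the teacher."""
    return sum(flags) / len(flags)


def sample_weights(teacher_logits_adv, labels):
    """Hypothetical per-sample weights (an assumption, not the SAAD formula).

    Score = 1 - teacher probability of the true class on the adversarial
    input, normalized so that the weights average to 1 over the batch.
    """
    scores = [1.0 - softmax(l)[y] for l, y in zip(teacher_logits_adv, labels)]
    mean = sum(scores) / len(scores)
    if mean == 0.0:
        return [1.0] * len(scores)
    return [s / mean for s in scores]


def weighted_loss(per_sample_losses, weights):
    """Batch loss as the weighted mean of per-sample distillation losses."""
    return sum(w * l for w, l in zip(weights, per_sample_losses)) / len(weights)


# Toy batch: two classes, four samples (values are made up for illustration).
toy_logits = [[2.0, 0.0], [0.0, 2.0], [0.0, 3.0], [1.0, 0.0]]
toy_labels = [0, 0, 1, 1]
toy_flags = transfer_flags(toy_logits, toy_labels)
toy_weights = sample_weights(toy_logits, toy_labels)
```

In a distillation loop that already evaluates the teacher on the adversarial input to produce soft labels, these logits could plausibly be reused, which is one way such a weighting would add no extra forward passes; whether SAAD does exactly this is an assumption here.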
Related papers
- CIARD: Cyclic Iterative Adversarial Robustness Distillation [19.685981220232712]
Adversarial robustness distillation (ARD) aims to transfer performance and robustness from a teacher model to a student model. Existing ARD approaches enhance the student model's robustness, but as an inevitable by-product they degrade performance on clean examples. We propose a novel Cyclic Iterative ARD (CIARD) method with two key innovations.
arXiv Detail & Related papers (2025-09-16T03:51:43Z) - DARD: Dice Adversarial Robustness Distillation against Adversarial Attacks [12.90150211072263]
We introduce Dice Adversarial Robustness Distillation (DARD), a novel method designed to transfer robustness through a tailored knowledge distillation paradigm. Our experiments demonstrate that the DARD approach consistently outperforms adversarially trained networks with the same architecture.
arXiv Detail & Related papers (2025-09-15T02:31:30Z) - Distilling Adversarial Robustness Using Heterogeneous Teachers [9.404102810698202]
Robustness can be transferred from an adversarially trained teacher to a student model using knowledge distillation.
We develop a defense framework against adversarial attacks by distilling adversarial robustness using heterogeneous teachers (DARHT).
Experiments on classification tasks in both white-box and black-box scenarios demonstrate that DARHT achieves state-of-the-art clean and robust accuracies.
arXiv Detail & Related papers (2024-02-23T19:55:13Z) - Mitigating Accuracy-Robustness Trade-off via Balanced Multi-Teacher Adversarial Distillation [12.39860047886679]
Adversarial Training is a practical approach for improving the robustness of deep neural networks against adversarial attacks.
We introduce Balanced Multi-Teacher Adversarial Robustness Distillation (B-MTARD) to guide the model's Adversarial Training process.
B-MTARD outperforms the state-of-the-art methods against various adversarial attacks.
arXiv Detail & Related papers (2023-06-28T12:47:01Z) - Adversarial Contrastive Distillation with Adaptive Denoising [15.119013995045192]
We propose Contrastive Relationship DeNoise Distillation (CRDND) to boost the robustness of small models.
We show that CRDND can transfer robust knowledge efficiently and achieve state-of-the-art performance.
arXiv Detail & Related papers (2023-02-17T09:00:18Z) - On the benefits of knowledge distillation for adversarial robustness [53.41196727255314]
We show that knowledge distillation can be used directly to boost the performance of state-of-the-art models in adversarial robustness.
We present Adversarial Knowledge Distillation (AKD), a new framework to improve a model's robust performance.
arXiv Detail & Related papers (2022-03-14T15:02:13Z) - When Does Contrastive Learning Preserve Adversarial Robustness from Pretraining to Finetuning? [99.4914671654374]
We propose AdvCL, a novel adversarial contrastive pretraining framework.
We show that AdvCL is able to enhance cross-task robustness transferability without loss of model accuracy and finetuning efficiency.
arXiv Detail & Related papers (2021-11-01T17:59:43Z) - Revisiting Adversarial Robustness Distillation: Robust Soft Labels Make Student Better [66.69777970159558]
We propose a novel adversarial robustness distillation method called Robust Soft Label Adversarial Distillation (RSLAD).
RSLAD fully exploits the robust soft labels produced by a robust (adversarially-trained) large teacher model to guide the student's learning.
We empirically demonstrate the effectiveness of our RSLAD approach over existing adversarial training and distillation methods.
arXiv Detail & Related papers (2021-08-18T04:32:35Z) - Analysis and Applications of Class-wise Robustness in Adversarial Training [92.08430396614273]
Adversarial training is one of the most effective approaches to improve model robustness against adversarial examples.
Previous works mainly focus on the overall robustness of the model, and an in-depth analysis of the role of each class involved in adversarial training is still missing.
We provide a detailed diagnosis of adversarial training on six benchmark datasets, i.e., MNIST, CIFAR-10, CIFAR-100, SVHN, STL-10 and ImageNet.
We observe that the stronger attack methods in adversarial learning achieve performance improvement mainly from a more successful attack on the vulnerable classes.
arXiv Detail & Related papers (2021-05-29T07:28:35Z) - Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
arXiv Detail & Related papers (2020-10-26T04:44:43Z) - Feature Distillation With Guided Adversarial Contrastive Learning [41.28710294669751]
We propose Guided Adversarial Contrastive Distillation (GACD) to transfer adversarial robustness from teacher to student with features.
With a well-trained teacher model as an anchor, students are expected to extract features similar to the teacher.
With GACD, the student not only learns to extract robust features, but also captures structural knowledge from the teacher.
arXiv Detail & Related papers (2020-09-21T14:46:17Z) - Adversarial Robustness on In- and Out-Distribution Improves Explainability [109.68938066821246]
RATIO is a training procedure for robustness via Adversarial Training on In- and Out-distribution.
RATIO achieves state-of-the-art $l_2$-adversarial robustness on CIFAR10 and maintains better clean accuracy.
arXiv Detail & Related papers (2020-03-20T18:57:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.