Teach Me to Trick: Exploring Adversarial Transferability via Knowledge Distillation
- URL: http://arxiv.org/abs/2507.21992v1
- Date: Tue, 29 Jul 2025 16:43:54 GMT
- Title: Teach Me to Trick: Exploring Adversarial Transferability via Knowledge Distillation
- Authors: Siddhartha Pradhan, Shikshya Shiwakoti, Neha Bathuri
- Abstract summary: Knowledge distillation can enhance the generation of transferable adversarial examples. A lightweight student model is trained using two KD strategies: curriculum-based switching and joint optimization. Student models distilled from multiple teachers achieve attack success rates comparable to ensemble-based baselines.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate whether knowledge distillation (KD) from multiple heterogeneous teacher models can enhance the generation of transferable adversarial examples. A lightweight student model is trained using two KD strategies: curriculum-based switching and joint optimization, with ResNet50 and DenseNet-161 as teachers. The trained student is then used to generate adversarial examples using FG, FGS, and PGD attacks, which are evaluated against a black-box target model (GoogLeNet). Our results show that student models distilled from multiple teachers achieve attack success rates comparable to ensemble-based baselines, while reducing adversarial example generation time by up to a factor of six. An ablation study further reveals that lower temperature settings and the inclusion of hard-label supervision significantly enhance transferability. These findings suggest that KD can serve not only as a model compression technique but also as a powerful tool for improving the efficiency and effectiveness of black-box adversarial attacks.
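The pipeline described in the abstract can be summarized in a short sketch. The following PyTorch snippet is a minimal illustration, not the authors' implementation: it shows only the joint-optimization KD strategy (an averaged, temperature-scaled soft loss over the two teachers plus a hard-label cross-entropy term; curriculum-based switching between teachers is omitted) and a standard L-infinity PGD attack crafted on the distilled student, then evaluated against the black-box GoogLeNet target. The student architecture, hyperparameters, and helper names such as `kd_joint_loss` and `pgd_from_student` are assumptions made for illustration.

```python
# Minimal sketch (not the paper's code): multi-teacher KD with temperature-scaled
# soft targets plus hard-label cross-entropy, then a PGD attack generated on the
# distilled student and transferred to a black-box target model.
import torch
import torch.nn.functional as F
from torchvision import models

def kd_joint_loss(student_logits, teacher_logits_list, labels, T=4.0, alpha=0.7):
    """Joint-optimization KD loss: averaged KL to all teachers + hard-label CE.

    T is the distillation temperature; per the abstract's ablation, lower T and
    keeping the hard-label term appear to help transferability.
    """
    soft_student = F.log_softmax(student_logits / T, dim=1)
    kd_terms = []
    for t_logits in teacher_logits_list:
        soft_teacher = F.softmax(t_logits / T, dim=1)
        # KL between temperature-softened distributions, scaled by T^2 (the usual
        # correction so gradient magnitudes stay comparable across temperatures).
        kd_terms.append(F.kl_div(soft_student, soft_teacher, reduction="batchmean") * T * T)
    soft_loss = torch.stack(kd_terms).mean()
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

def pgd_from_student(student, x, y, eps=8 / 255, step=2 / 255, iters=10):
    """Standard L-infinity PGD computed on the student surrogate only."""
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(student(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

# Illustrative wiring: teachers and target follow the abstract; the student
# architecture and the random stand-in batch are assumptions.
teachers = [models.resnet50(weights="IMAGENET1K_V1").eval(),
            models.densenet161(weights="IMAGENET1K_V1").eval()]
student = models.resnet18()                                  # lightweight student (assumed)
target = models.googlenet(weights="IMAGENET1K_V1").eval()    # black-box target

x = torch.rand(4, 3, 224, 224)
y = torch.randint(0, 1000, (4,))
opt = torch.optim.SGD(student.parameters(), lr=0.01)

with torch.no_grad():
    teacher_logits = [t(x) for t in teachers]
loss = kd_joint_loss(student(x), teacher_logits, y)          # one distillation step
opt.zero_grad()
loss.backward()
opt.step()

x_adv = pgd_from_student(student.eval(), x, y)               # craft on the surrogate
with torch.no_grad():
    transfer_success = (target(x_adv).argmax(dim=1) != y).float().mean()
```

In the curriculum-based variant, the teachers passed to `kd_joint_loss` would be switched between training phases rather than averaged jointly; either way, the distilled student serves as a cheap white-box surrogate for crafting black-box attacks.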
Related papers
- Improving Adversarial Robustness Through Adaptive Learning-Driven Multi-Teacher Knowledge Distillation [1.8609604872307923]
Convolutional neural networks (CNNs) excel in computer vision but are susceptible to adversarial attacks. Despite advances in adversarial training, a gap persists between model accuracy and robustness. We present a multi-teacher adversarial robustness distillation using an adaptive learning strategy.
arXiv Detail & Related papers (2025-07-28T17:08:40Z)
- Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling [81.00825302340984]
We introduce Speculative Knowledge Distillation (SKD) to generate high-quality training data on-the-fly. In SKD, the student proposes tokens, and the teacher replaces poorly ranked ones based on its own distribution. We evaluate SKD on various text generation tasks, including translation, summarization, math, and instruction following.
arXiv Detail & Related papers (2024-10-15T06:51:25Z)
- Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models [62.5501109475725]
Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them.
This paper introduces Online Knowledge Distillation (OKD), where the teacher network integrates small online modules to concurrently train with the student model.
OKD achieves or exceeds the performance of leading methods in various model architectures and sizes, reducing training time by up to fourfold.
arXiv Detail & Related papers (2024-09-19T07:05:26Z)
- Relational Representation Distillation [6.24302896438145]
Knowledge distillation involves transferring knowledge from large, cumbersome teacher models to more compact student models. Standard approaches fail to capture important structural relationships in the teacher's internal representations. Recent advances have turned to contrastive learning objectives, but these methods impose overly strict constraints through instance discrimination. Our method employs separate temperature parameters for teacher and student distributions, with sharper student outputs, enabling precise learning of primary relationships while preserving secondary similarities.
arXiv Detail & Related papers (2024-07-16T14:56:13Z)
- Relative Difficulty Distillation for Semantic Segmentation [54.76143187709987]
We propose a pixel-level KD paradigm for semantic segmentation named Relative Difficulty Distillation (RDD).
RDD allows the teacher network to provide effective guidance on learning focus without additional optimization goals.
Our research showcases that RDD can integrate with existing KD methods to improve their upper performance bound.
arXiv Detail & Related papers (2024-07-04T08:08:25Z)
- Adversarial Sparse Teacher: Defense Against Distillation-Based Model Stealing Attacks Using Adversarial Examples [2.0257616108612373]
Adversarial Sparse Teacher (AST) is a robust defense method against distillation-based model stealing attacks.
Our approach trains a teacher model using adversarial examples to produce sparse logit responses and increase the entropy of the output distribution.
arXiv Detail & Related papers (2024-03-08T09:43:27Z)
- Distilling Adversarial Robustness Using Heterogeneous Teachers [9.404102810698202]
Robustness can be transferred from an adversarially trained teacher to a student model using knowledge distillation.
We develop DARHT, a defense framework against adversarial attacks that distills robustness from heterogeneous teachers.
Experiments on classification tasks in both white-box and black-box scenarios demonstrate that DARHT achieves state-of-the-art clean and robust accuracies.
arXiv Detail & Related papers (2024-02-23T19:55:13Z)
- Robustness-Reinforced Knowledge Distillation with Correlation Distance and Network Pruning [3.1423836318272773]
Knowledge distillation (KD) improves the performance of efficient and lightweight models. Most existing KD techniques rely on Kullback-Leibler (KL) divergence. We propose Robustness-Reinforced Knowledge Distillation (R2KD), which leverages correlation distance and network pruning.
arXiv Detail & Related papers (2023-11-23T11:34:48Z)
- On the benefits of knowledge distillation for adversarial robustness [53.41196727255314]
We show that knowledge distillation can be used directly to boost the performance of state-of-the-art models in adversarial robustness.
We present Adversarial Knowledge Distillation (AKD), a new framework to improve a model's robust performance.
arXiv Detail & Related papers (2022-03-14T15:02:13Z)
- How and When Adversarial Robustness Transfers in Knowledge Distillation? [137.11016173468457]
This paper studies how and when adversarial robustness can be transferred from a teacher model to a student model in knowledge distillation (KD).
We show that standard KD training fails to preserve adversarial robustness, and we propose KD with input gradient alignment (KDIGA) for remedy.
Under certain assumptions, we prove that the student model using our proposed KDIGA can achieve at least the same certified robustness as the teacher model.
arXiv Detail & Related papers (2021-10-22T21:30:53Z)
- MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that, under reasonable conditions, MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
arXiv Detail & Related papers (2020-11-01T18:47:51Z)
- Feature Distillation With Guided Adversarial Contrastive Learning [41.28710294669751]
We propose Guided Adversarial Contrastive Distillation (GACD) to transfer adversarial robustness from teacher to student with features.
With a well-trained teacher model as an anchor, students are expected to extract features similar to the teacher.
With GACD, the student not only learns to extract robust features, but also captures structural knowledge from the teacher.
arXiv Detail & Related papers (2020-09-21T14:46:17Z)