Distilling Adversarial Robustness Using Heterogeneous Teachers
- URL: http://arxiv.org/abs/2402.15586v1
- Date: Fri, 23 Feb 2024 19:55:13 GMT
- Title: Distilling Adversarial Robustness Using Heterogeneous Teachers
- Authors: Jieren Deng, Aaron Palmer, Rigel Mahmood, Ethan Rathbun, Jinbo Bi,
Kaleel Mahmood and Derek Aguiar
- Abstract summary: Robustness can be transferred from an adversarially trained teacher to a student model using knowledge distillation.
We develop a defense framework against adversarial attacks by distilling robustness using heterogeneous teachers.
Experiments on classification tasks in both white-box and black-box scenarios demonstrate that DARHT achieves state-of-the-art clean and robust accuracies.
- Score: 9.404102810698202
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Achieving resiliency against adversarial attacks is necessary prior to
deploying neural network classifiers in domains where misclassification incurs
substantial costs, e.g., self-driving cars or medical imaging. Recent work has
demonstrated that robustness can be transferred from an adversarially trained
teacher to a student model using knowledge distillation. However, current
methods perform distillation using a single adversarial and vanilla teacher and
consider homogeneous architectures (i.e., residual networks) that are
susceptible to misclassifying examples from similar adversarial subspaces. In this
work, we develop a defense framework against adversarial attacks by distilling
adversarial robustness using heterogeneous teachers (DARHT). In DARHT, the
student model explicitly represents teacher logits in a student-teacher feature
map and leverages multiple teachers that exhibit low adversarial example
transferability (i.e., exhibit high performance on dissimilar adversarial
examples). Experiments on classification tasks in both white-box and black-box
scenarios demonstrate that DARHT achieves state-of-the-art clean and robust
accuracies when compared to competing adversarial training and distillation
methods on the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets. Comparisons
with homogeneous and heterogeneous teacher sets suggest that leveraging
teachers with low adversarial example transferability increases student model
robustness.
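The abstract does not spell out the student-teacher feature map or the loss weighting, so the following is only a minimal sketch of multi-teacher logit distillation in the spirit of DARHT; the function name, temperature, and weighting are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def multi_teacher_distillation_loss(student_logits, teacher_logits_list,
                                    labels, temperature=4.0, alpha=0.5):
    """Cross-entropy on labels plus an averaged KL term toward several
    heterogeneous teachers (a sketch, not the DARHT objective)."""
    ce = F.cross_entropy(student_logits, labels)
    kd = 0.0
    for t_logits in teacher_logits_list:
        p_teacher = F.softmax(t_logits / temperature, dim=1)
        log_p_student = F.log_softmax(student_logits / temperature, dim=1)
        kd = kd + F.kl_div(log_p_student, p_teacher, reduction="batchmean")
    kd = kd * (temperature ** 2) / len(teacher_logits_list)
    return alpha * ce + (1.0 - alpha) * kd

# Toy usage: a batch of 8 examples, 10 classes, three teachers.
student_logits = torch.randn(8, 10, requires_grad=True)
teachers = [torch.randn(8, 10) for _ in range(3)]
labels = torch.randint(0, 10, (8,))
multi_teacher_distillation_loss(student_logits, teachers, labels).backward()
```

In practice the teachers would be adversarially trained models with dissimilar architectures, selected so that adversarial examples transfer poorly among them.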
Related papers
- Common Knowledge Learning for Generating Transferable Adversarial Examples [60.1287733223249]
This paper focuses on an important type of black-box attack, in which the adversary generates adversarial examples using a substitute (source) model.
Existing methods tend to yield unsatisfactory adversarial transferability when the source and target models are from different types of DNN architectures.
We propose a common knowledge learning (CKL) framework to learn better network weights to generate adversarial examples.
arXiv Detail & Related papers (2023-07-01T09:07:12Z)
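For readers unfamiliar with the substitute-model setting CKL addresses, the hypothetical sketch below crafts FGSM examples on a source model and measures how often they fool an architecturally different target; it illustrates the transfer scenario only, not the CKL method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Untrained stand-ins for the substitute (source) and target models;
# a real attack would use trained networks.
source = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
target = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64),
                       nn.ReLU(), nn.Linear(64, 10))

def fgsm_transfer(x, y, epsilon=8 / 255):
    """Craft adversarial examples on the source, evaluate on the target."""
    x_adv = x.clone().requires_grad_(True)
    F.cross_entropy(source(x_adv), y).backward()
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()
    fooled = (target(x_adv).argmax(1) != y).float().mean()
    return x_adv, fooled  # fraction of examples misclassified by the target

x = torch.rand(16, 3, 32, 32)
y = torch.randint(0, 10, (16,))
_, fooled = fgsm_transfer(x, y)
print(f"transfer fooling rate: {fooled:.2f}")
```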
- Hybrid Distillation: Connecting Masked Autoencoders with Contrastive Learners [102.20090188997301]
We explore how to obtain a model that combines the strengths of Contrastive Learning (CL) and Masked Image Modeling (MIM).
In order to better obtain both discrimination and diversity, we propose a simple but effective Hybrid Distillation strategy.
Experimental results show that Hybrid Distill achieves superior performance on different benchmarks.
arXiv Detail & Related papers (2023-06-28T02:19:35Z)
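A minimal sketch of the dual-teacher idea, assuming untrained stand-in encoders and a simple cosine feature-matching loss; the paper's actual Hybrid Distillation strategy for balancing discrimination and diversity is richer than this.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins for a contrastive (CL) teacher, a masked-image-modeling (MIM)
# teacher, and the student; real teachers would be pretrained encoders.
cl_teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
mim_teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))

def hybrid_distill_loss(x, w_cl=1.0, w_mim=1.0):
    """Pull the student's features toward both teachers at once."""
    f_s = student(x)
    with torch.no_grad():
        f_cl, f_mim = cl_teacher(x), mim_teacher(x)
    loss_cl = 1 - F.cosine_similarity(f_s, f_cl, dim=1).mean()
    loss_mim = 1 - F.cosine_similarity(f_s, f_mim, dim=1).mean()
    return w_cl * loss_cl + w_mim * loss_mim

hybrid_distill_loss(torch.rand(8, 3, 32, 32)).backward()
```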
- Knowledge Distillation from A Stronger Teacher [44.11781464210916]
This paper presents a method dubbed DIST to distill better from a stronger teacher.
We empirically find that the discrepancy between the predictions of the student and those of a stronger teacher tends to be more severe.
Our method is simple yet practical, and extensive experiments demonstrate that it adapts well to various architectures.
arXiv Detail & Related papers (2022-05-21T08:30:58Z)
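The relaxation DIST proposes is to match prediction relations rather than exact probabilities. The sketch below uses Pearson correlation across classes (per sample) and across the batch (per class); the temperature and equal weighting are assumptions.

```python
import torch
import torch.nn.functional as F

def pearson_corr(a, b, eps=1e-8):
    """Row-wise Pearson correlation between two matrices."""
    a = a - a.mean(dim=1, keepdim=True)
    b = b - b.mean(dim=1, keepdim=True)
    return (F.normalize(a, dim=1, eps=eps) * F.normalize(b, dim=1, eps=eps)).sum(1)

def relation_distill_loss(student_logits, teacher_logits, temperature=4.0):
    """DIST-style loss: correlations are invariant to shift and scale,
    so the student need not match a much stronger teacher exactly."""
    p_s = F.softmax(student_logits / temperature, dim=1)
    p_t = F.softmax(teacher_logits / temperature, dim=1)
    inter = 1 - pearson_corr(p_s, p_t).mean()          # rows are samples
    intra = 1 - pearson_corr(p_s.t(), p_t.t()).mean()  # rows are classes
    return inter + intra

relation_distill_loss(torch.randn(8, 10, requires_grad=True),
                      torch.randn(8, 10)).backward()
```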
- Generalized Knowledge Distillation via Relationship Matching [53.69235109551099]
Knowledge of a well-trained deep neural network (a.k.a. the "teacher") is valuable for learning similar tasks.
Knowledge distillation extracts knowledge from the teacher and integrates it with the target model.
Instead of requiring the teacher to work on the same task as the student, we borrow knowledge from a teacher trained on a general label space.
arXiv Detail & Related papers (2022-05-04T06:49:47Z)
- Robustness through Cognitive Dissociation Mitigation in Contrastive Adversarial Training [2.538209532048867]
We introduce a novel neural network training framework that increases a model's robustness to adversarial attacks.
We propose to improve model robustness to adversarial attacks by learning feature representations consistent under both data augmentations and adversarial perturbations.
We validate our method on the CIFAR-10 dataset, on which it outperforms alternative supervised and self-supervised adversarial learning methods in both robust and clean accuracy.
arXiv Detail & Related papers (2022-03-16T21:41:27Z)
- Teacher's pet: understanding and mitigating biases in distillation [61.44867470297283]
Several works have shown that distillation significantly boosts the student's overall performance.
However, are these gains uniform across all data subgroups?
We show that distillation can harm performance on certain subgroups.
We present techniques which soften the teacher influence for subgroups where it is less reliable.
arXiv Detail & Related papers (2021-06-19T13:06:25Z)
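One plausible reading of "softening the teacher influence" is a per-subgroup weight on the distillation term; the sketch below follows that assumption, with hypothetical names and weights (real weights would be estimated per subgroup from held-out data).

```python
import torch
import torch.nn.functional as F

def subgroup_weighted_kd(student_logits, teacher_logits, labels,
                         group_ids, group_weights, temperature=4.0):
    """Down-weight the per-example KD term for subgroups where the
    teacher is less reliable (a sketch of the high-level idea)."""
    log_p_s = F.log_softmax(student_logits / temperature, dim=1)
    p_t = F.softmax(teacher_logits / temperature, dim=1)
    kd_per_example = F.kl_div(log_p_s, p_t, reduction="none").sum(1)
    w = group_weights[group_ids]  # one trust weight per example
    kd = (w * kd_per_example).mean() * temperature ** 2
    return F.cross_entropy(student_logits, labels) + kd

# Toy usage: trust the teacher less on subgroup 1.
loss = subgroup_weighted_kd(torch.randn(8, 10, requires_grad=True),
                            torch.randn(8, 10),
                            torch.randint(0, 10, (8,)),
                            torch.randint(0, 2, (8,)),
                            group_weights=torch.tensor([1.0, 0.3]))
loss.backward()
```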
- Understanding Robustness in Teacher-Student Setting: A New Perspective [42.746182547068265]
Adversarial examples are inputs to machine learning models for which a bounded perturbation can mislead the model into making arbitrarily incorrect predictions.
Extensive studies try to explain the existence of adversarial examples and provide ways to improve model robustness.
Our studies could shed light on future explorations of adversarial examples and on enhancing model robustness via principled data augmentation.
arXiv Detail & Related papers (2021-02-25T20:54:24Z)
- Robustness May Be at Odds with Fairness: An Empirical Study on Class-wise Accuracy [85.20742045853738]
CNNs are widely known to be vulnerable to adversarial attacks.
We conduct an empirical study on the class-wise accuracy and robustness of adversarially trained models.
We find that there exists inter-class discrepancy for accuracy and robustness even when the training dataset has an equal number of samples for each class.
arXiv Detail & Related papers (2020-10-26T06:32:32Z)
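The class-wise evaluation itself is simple to reproduce; a minimal sketch, to be run once on clean and once on adversarial predictions to expose the inter-class discrepancy:

```python
import torch

def classwise_accuracy(preds, labels, num_classes):
    """Accuracy per class rather than averaged over the whole test set."""
    acc = torch.zeros(num_classes)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            acc[c] = (preds[mask] == c).float().mean()
    return acc

# Toy usage with random predictions over 10 classes.
labels = torch.randint(0, 10, (1000,))
preds = torch.randint(0, 10, (1000,))
print(classwise_accuracy(preds, labels, 10))
```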
- Feature Distillation With Guided Adversarial Contrastive Learning [41.28710294669751]
We propose Guided Adversarial Contrastive Distillation (GACD) to transfer adversarial robustness from teacher to student via features.
With a well-trained teacher model as an anchor, students are expected to extract features similar to the teacher.
With GACD, the student not only learns to extract robust features, but also captures structural knowledge from the teacher.
arXiv Detail & Related papers (2020-09-21T14:46:17Z)
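A minimal sketch of contrastive distillation with the teacher as anchor: each student feature is pulled toward the teacher feature of the same example and pushed away from teacher features of other examples. GACD's guidance mechanism and adversarial component are not reproduced here.

```python
import torch
import torch.nn.functional as F

def contrastive_feature_distill(student_feats, teacher_feats, tau=0.1):
    """InfoNCE-style loss with teacher features as anchors."""
    s = F.normalize(student_feats, dim=1)
    t = F.normalize(teacher_feats, dim=1)
    logits = s @ t.t() / tau           # [batch, batch] similarity matrix
    targets = torch.arange(s.size(0))  # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

contrastive_feature_distill(torch.randn(8, 128, requires_grad=True),
                            torch.randn(8, 128)).backward()
```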
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
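A label-free attack of this kind can be sketched as PGD on a contrastive objective between two views of the same instance; the stand-in encoder, step sizes, and temperature below are assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))

def instancewise_attack(x1, x2, epsilon=8 / 255, step=2 / 255, iters=5):
    """Maximize the contrastive loss between two views of the same
    instance, confusing the instance-level identity without labels."""
    delta = torch.zeros_like(x1, requires_grad=True)
    for _ in range(iters):
        z1 = F.normalize(encoder(x1 + delta), dim=1)
        z2 = F.normalize(encoder(x2), dim=1)
        logits = z1 @ z2.t() / 0.1
        F.cross_entropy(logits, torch.arange(x1.size(0))).backward()
        with torch.no_grad():
            delta += step * delta.grad.sign()
            delta.clamp_(-epsilon, epsilon)
        delta.grad.zero_()
    return (x1 + delta).clamp(0, 1).detach()

x = torch.rand(8, 3, 32, 32)
x_adv = instancewise_attack(x, x.flip(-1))  # flip stands in for a second view
```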
- Class-Aware Domain Adaptation for Improving Adversarial Robustness [27.24720754239852]
Adversarial training has been proposed to train robust networks by injecting adversarial examples into the training data.
We propose a novel Class-Aware Domain Adaptation (CADA) method for adversarial defense without directly applying adversarial training.
arXiv Detail & Related papers (2020-05-10T03:45:19Z)
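For context, the adversarial training baseline that this entry refers to (and that CADA avoids applying directly) injects PGD examples into every training step; a minimal sketch with a toy model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def pgd(x, y, epsilon=8 / 255, step=2 / 255, iters=7):
    """Standard PGD attack used to generate training-time adversaries."""
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon).requires_grad_(True)
    for _ in range(iters):
        F.cross_entropy(model(x + delta), y).backward()
        with torch.no_grad():
            delta += step * delta.grad.sign()
            delta.clamp_(-epsilon, epsilon)
        delta.grad.zero_()
    return (x + delta).clamp(0, 1).detach()

# One adversarial training step: craft adversaries, then fit them.
x, y = torch.rand(16, 3, 32, 32), torch.randint(0, 10, (16,))
x_adv = pgd(x, y)
opt.zero_grad()
F.cross_entropy(model(x_adv), y).backward()
opt.step()
```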