Understanding Robustness in Teacher-Student Setting: A New Perspective
- URL: http://arxiv.org/abs/2102.13170v2
- Date: Mon, 1 Mar 2021 03:49:19 GMT
- Title: Understanding Robustness in Teacher-Student Setting: A New Perspective
- Authors: Zhuolin Yang, Zhaoxi Chen, Tiffany Cai, Xinyun Chen, Bo Li, Yuandong
Tian
- Abstract summary: Adversarial examples are a ubiquitous property of machine learning models in which a bounded adversarial perturbation can mislead the model into making arbitrarily incorrect predictions.
Extensive studies try to explain the existence of adversarial examples and provide ways to improve model robustness.
Our studies could shed light on future exploration of adversarial examples and on enhancing model robustness via principled data augmentation.
- Score: 42.746182547068265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial examples have appeared as a ubiquitous property of machine
learning models where bounded adversarial perturbation could mislead the models
to make arbitrarily incorrect predictions. Such examples provide a way to
assess the robustness of machine learning models as well as a proxy for
understanding the model training process. Extensive studies try to explain the
existence of adversarial examples and provide ways to improve model robustness
(e.g. adversarial training). While they mostly focus on models trained on
datasets with predefined labels, we leverage the teacher-student framework and
assume a teacher model, or oracle, to provide the labels for given instances.
We extend Tian (2019) in the case of low-rank input data and show that student
specialization (trained student neuron is highly correlated with certain
teacher neuron at the same layer) still happens within the input subspace, but
the teacher and student nodes could differ wildly out of the data subspace,
which we conjecture leads to adversarial examples. Extensive experiments show
that student specialization correlates strongly with model robustness in
different scenarios, including student trained via standard training,
adversarial training, confidence-calibrated adversarial training, and training
with robust feature dataset. Our studies could shed light on future
exploration of adversarial examples and on enhancing model robustness via
principled data augmentation.
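
As a rough illustration of the specialization measurement described above (not the paper's code), the sketch below trains a student network on labels produced by a fixed teacher over synthetic low-rank inputs, then compares teacher and student first-layer weights after projecting them into and out of the data subspace. The architecture, dimensions, and hyperparameters are assumptions chosen only for illustration.

```python
# Minimal sketch (not the paper's implementation): train a student on teacher
# labels over low-rank inputs, then compare teacher/student first-layer weights
# inside vs. outside the data subspace. Sizes and hyperparameters are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
d, k, hidden = 64, 8, 32            # ambient dim, data-subspace rank, layer width

# Orthonormal basis U for the k-dimensional data subspace; inputs live in span(U).
U, _ = torch.linalg.qr(torch.randn(d, k))          # d x k
P_in = U @ U.T                                      # projector onto the subspace
P_out = torch.eye(d) - P_in                         # projector onto its complement

def make_net():
    return nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, 1))

teacher = make_net()
for p in teacher.parameters():
    p.requires_grad_(False)

student = make_net()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

# Standard training on teacher-provided labels, with inputs confined to the subspace.
for step in range(2000):
    x = torch.randn(256, k) @ U.T                   # low-rank inputs
    loss = ((student(x) - teacher(x)) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Specialization proxy: for each teacher neuron, the best cosine similarity with
# any student neuron, measured after projecting weights into / out of the subspace.
W_t = teacher[0].weight.detach()                    # hidden x d
W_s = student[0].weight.detach()

def best_cosine(Wt, Ws, proj):
    a = nn.functional.normalize(Wt @ proj, dim=1)
    b = nn.functional.normalize(Ws @ proj, dim=1)
    return (a @ b.T).max(dim=1).values.mean().item()

print("alignment inside data subspace :", best_cosine(W_t, W_s, P_in))
print("alignment outside data subspace:", best_cosine(W_t, W_s, P_out))
```

Under these assumptions, one would expect markedly higher best-match similarity inside the data subspace than on its orthogonal complement, mirroring the conjectured source of adversarial directions: perturbations along the unconstrained off-subspace components.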
Related papers
- UnLearning from Experience to Avoid Spurious Correlations [3.283369870504872]
We propose a new approach that addresses the issue of spurious correlations: UnLearning from Experience (ULE).
Our method is based on using two classification models trained in parallel: student and teacher models.
We show that our method is effective on the Waterbirds, CelebA, Spawrious and UrbanCars datasets.
arXiv Detail & Related papers (2024-09-04T15:06:44Z)
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Robust Transferable Feature Extractors: Learning to Defend Pre-Trained Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors.
We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE).
arXiv Detail & Related papers (2022-09-14T21:09:34Z)
- Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
- Adversarial Examples for Unsupervised Machine Learning Models [71.81480647638529]
Adversarial examples causing evasive predictions are widely used to evaluate and improve the robustness of machine learning models.
We propose a framework of generating adversarial examples for unsupervised models and demonstrate novel applications to data augmentation.
arXiv Detail & Related papers (2021-03-02T17:47:58Z)
- Quantifying and Mitigating Privacy Risks of Contrastive Learning [4.909548818641602]
We perform the first privacy analysis of contrastive learning through the lens of membership inference and attribute inference.
Our results show that contrastive models are less vulnerable to membership inference attacks but more vulnerable to attribute inference attacks compared to supervised models.
To remedy this situation, we propose the first privacy-preserving contrastive learning mechanism, namely Talos.
arXiv Detail & Related papers (2021-02-08T11:38:11Z)
- FaceLeaks: Inference Attacks against Transfer Learning Models via Black-box Queries [2.7564955518050693]
We investigate if one can leak or infer private information without interacting with the teacher model directly.
We propose novel strategies to infer from aggregate-level information.
Our study indicates that information leakage is a real privacy threat to the transfer learning framework widely used in real-life situations.
arXiv Detail & Related papers (2020-10-27T03:02:40Z)
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)