Understanding Robustness in Teacher-Student Setting: A New Perspective
- URL: http://arxiv.org/abs/2102.13170v2
- Date: Mon, 1 Mar 2021 03:49:19 GMT
- Title: Understanding Robustness in Teacher-Student Setting: A New Perspective
- Authors: Zhuolin Yang, Zhaoxi Chen, Tiffany Cai, Xinyun Chen, Bo Li, Yuandong
Tian
- Abstract summary: Adversarial examples are a ubiquitous property of machine learning models in which a bounded adversarial perturbation can mislead the model into making arbitrarily incorrect predictions.
Extensive studies try to explain the existence of adversarial examples and provide ways to improve model robustness.
Our studies could shed light on future exploration of adversarial examples and on enhancing model robustness via principled data augmentation.
- Score: 42.746182547068265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial examples have appeared as a ubiquitous property of machine
learning models where bounded adversarial perturbation could mislead the models
to make arbitrarily incorrect predictions. Such examples provide a way to
assess the robustness of machine learning models as well as a proxy for
understanding the model training process. Extensive studies try to explain the
existence of adversarial examples and provide ways to improve model robustness
(e.g. adversarial training). While they mostly focus on models trained on
datasets with predefined labels, we leverage the teacher-student framework and
assume a teacher model, or oracle, to provide the labels for given instances.
We extend Tian (2019) in the case of low-rank input data and show that student
specialization (trained student neuron is highly correlated with certain
teacher neuron at the same layer) still happens within the input subspace, but
the teacher and student nodes could differ wildly out of the data subspace,
which we conjecture leads to adversarial examples. Extensive experiments show
that student specialization correlates strongly with model robustness in
different scenarios, including student trained via standard training,
adversarial training, confidence-calibrated adversarial training, and training
with robust feature dataset. Our studies could shed light on future
exploration of adversarial examples and on enhancing model robustness via
principled data augmentation.
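
As a rough illustration of the specialization measurement described above (not the paper's code), the sketch below trains a student network on labels produced by a fixed teacher over synthetic low-rank inputs, then compares teacher and student first-layer weights after projecting them into and out of the data subspace. The architecture, dimensions, and hyperparameters are assumptions chosen only for illustration.

```python
# Minimal sketch (not the paper's implementation): train a student on teacher
# labels over low-rank inputs, then compare teacher/student first-layer weights
# inside vs. outside the data subspace. Sizes and hyperparameters are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
d, k, hidden = 64, 8, 32            # ambient dim, data-subspace rank, layer width

# Orthonormal basis U for the k-dimensional data subspace; inputs live in span(U).
U, _ = torch.linalg.qr(torch.randn(d, k))          # d x k
P_in = U @ U.T                                      # projector onto the subspace
P_out = torch.eye(d) - P_in                         # projector onto its complement

def make_net():
    return nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, 1))

teacher = make_net()
for p in teacher.parameters():
    p.requires_grad_(False)

student = make_net()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

# Standard training on teacher-provided labels, with inputs confined to the subspace.
for step in range(2000):
    x = torch.randn(256, k) @ U.T                   # low-rank inputs
    loss = ((student(x) - teacher(x)) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Specialization proxy: for each teacher neuron, the best cosine similarity with
# any student neuron, measured after projecting weights into / out of the subspace.
W_t = teacher[0].weight.detach()                    # hidden x d
W_s = student[0].weight.detach()

def best_cosine(Wt, Ws, proj):
    a = nn.functional.normalize(Wt @ proj, dim=1)
    b = nn.functional.normalize(Ws @ proj, dim=1)
    return (a @ b.T).max(dim=1).values.mean().item()

print("alignment inside data subspace :", best_cosine(W_t, W_s, P_in))
print("alignment outside data subspace:", best_cosine(W_t, W_s, P_out))
```

Under these assumptions, one would expect markedly higher best-match similarity inside the data subspace than on its orthogonal complement, mirroring the conjectured source of adversarial directions: perturbations along the unconstrained off-subspace components.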
Related papers
- UnLearning from Experience to Avoid Spurious Correlations [3.283369870504872]
We propose a new approach that addresses the issue of spurious correlations: UnLearning from Experience (ULE).
Our method is based on using two classification models trained in parallel: student and teacher models.
We show that our method is effective on the Waterbirds, CelebA, Spawrious and UrbanCars datasets.
arXiv Detail & Related papers (2024-09-04T15:06:44Z)
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Robust Transferable Feature Extractors: Learning to Defend Pre-Trained Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors.
We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE).
arXiv Detail & Related papers (2022-09-14T21:09:34Z)
- Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
- Adversarial Examples for Unsupervised Machine Learning Models [71.81480647638529]
Adversarial examples causing evasive predictions are widely used to evaluate and improve the robustness of machine learning models.
We propose a framework of generating adversarial examples for unsupervised models and demonstrate novel applications to data augmentation.
arXiv Detail & Related papers (2021-03-02T17:47:58Z)
- Quantifying and Mitigating Privacy Risks of Contrastive Learning [4.909548818641602]
We perform the first privacy analysis of contrastive learning through the lens of membership inference and attribute inference.
Our results show that contrastive models are less vulnerable to membership inference attacks but more vulnerable to attribute inference attacks compared to supervised models.
To remedy this situation, we propose the first privacy-preserving contrastive learning mechanism, namely Talos.
arXiv Detail & Related papers (2021-02-08T11:38:11Z)
- FaceLeaks: Inference Attacks against Transfer Learning Models via Black-box Queries [2.7564955518050693]
We investigate if one can leak or infer private information without interacting with the teacher model directly.
We propose novel strategies to infer from aggregate-level information.
Our study indicates that information leakage is a real privacy threat to the transfer learning framework widely used in real-life situations.
arXiv Detail & Related papers (2020-10-27T03:02:40Z)
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)