Stealing the Invisible: Unveiling Pre-Trained CNN Models through
Adversarial Examples and Timing Side-Channels
- URL: http://arxiv.org/abs/2402.11953v1
- Date: Mon, 19 Feb 2024 08:47:20 GMT
- Title: Stealing the Invisible: Unveiling Pre-Trained CNN Models through
Adversarial Examples and Timing Side-Channels
- Authors: Shubhi Shukla, Manaar Alam, Pabitra Mitra, Debdeep Mukhopadhyay
- Abstract summary: We present an approach based on the observation that the classification patterns of adversarial images can be used as a means to steal the models.
Our approach exploits varying misclassifications of adversarial images across different models to fingerprint several renowned Convolutional Neural Network (CNN) and Vision Transformer (ViT) architectures.
- Score: 14.222432788661914
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning, with its myriad applications, has become an integral
component of numerous technological systems. A common practice in this domain
is the use of transfer learning, where a pre-trained model's architecture,
readily available to the public, is fine-tuned to suit specific tasks. As
Machine Learning as a Service (MLaaS) platforms increasingly use pre-trained
models in their backends, it's crucial to safeguard these architectures and
understand their vulnerabilities. In this work, we present an approach based on
the observation that the classification patterns of adversarial images can be
used as a means to steal the models. Furthermore, the adversarial image
classifications in conjunction with timing side channels can lead to a model
stealing method. Our approach, designed for typical user-level access in remote
MLaaS environments exploits varying misclassifications of adversarial images
across different models to fingerprint several renowned Convolutional Neural
Network (CNN) and Vision Transformer (ViT) architectures. We utilize the
profiling of remote model inference times to reduce the necessary adversarial
images, subsequently decreasing the number of queries required. We have
presented our results over 27 pre-trained models of different CNN and ViT
architectures using CIFAR-10 dataset and demonstrate a high accuracy of 88.8%
while keeping the query budget under 20.
Related papers
- Memory Backdoor Attacks on Neural Networks [3.2720947374803777]
We propose the memory backdoor attack, where a model is covertly trained to specific training samples and later selectively output them.
We demonstrate the attack on image classifiers, segmentation models, and a large language model (LLM)
arXiv Detail & Related papers (2024-11-21T16:09:16Z) - Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adrial robustness has been conventionally believed as a challenging property to encode for neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z) - Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z) - Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess remarkable ability to accurately classify new, unseen images after being exposed to only a few examples.
For artificial neural network models, determining the most relevant features for distinguishing between two images with limited samples presents a challenge.
We propose an intra-task mutual attention method for few-shot learning, that involves splitting the support and query samples into patches.
arXiv Detail & Related papers (2024-05-06T02:02:57Z) - Identifying and Mitigating Model Failures through Few-shot CLIP-aided
Diffusion Generation [65.268245109828]
We propose an end-to-end framework to generate text descriptions of failure modes associated with spurious correlations.
These descriptions can be used to generate synthetic data using generative models, such as diffusion models.
Our experiments have shown remarkable textbfimprovements in accuracy ($sim textbf21%$) on hard sub-populations.
arXiv Detail & Related papers (2023-12-09T04:43:49Z) - CONVIQT: Contrastive Video Quality Estimator [63.749184706461826]
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms.
Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner.
Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
arXiv Detail & Related papers (2022-06-29T15:22:01Z) - Meta Internal Learning [88.68276505511922]
Internal learning for single-image generation is a framework, where a generator is trained to produce novel images based on a single image.
We propose a meta-learning approach that enables training over a collection of images, in order to model the internal statistics of the sample image more effectively.
Our results show that the models obtained are as suitable as single-image GANs for many common image applications.
arXiv Detail & Related papers (2021-10-06T16:27:38Z) - Explainable Adversarial Attacks in Deep Neural Networks Using Activation
Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z) - Practical No-box Adversarial Attacks against DNNs [31.808770437120536]
We investigate no-box adversarial examples, where the attacker can neither access the model information or the training set nor query the model.
We propose three mechanisms for training with a very small dataset and find that prototypical reconstruction is the most effective.
Our approach significantly diminishes the average prediction accuracy of the system to only 15.40%, which is on par with the attack that transfers adversarial examples from a pre-trained Arcface model.
arXiv Detail & Related papers (2020-12-04T11:10:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.