Do You Trust Your Model? Emerging Malware Threats in the Deep Learning
Ecosystem
- URL: http://arxiv.org/abs/2403.03593v1
- Date: Wed, 6 Mar 2024 10:27:08 GMT
- Title: Do You Trust Your Model? Emerging Malware Threats in the Deep Learning
Ecosystem
- Authors: Dorjan Hitaj, Giulio Pagnotta, Fabio De Gaspari, Sediola Ruko, Briland
Hitaj, Luigi V. Mancini, Fernando Perez-Cruz
- Abstract summary: We introduce MaleficNet 2.0, a technique to embed self-extracting, self-executing malware in neural networks.
The MaleficNet 2.0 injection technique is stealthy, does not degrade the performance of the model, and is robust against removal techniques.
We implement a proof-of-concept self-extracting neural network malware using MaleficNet 2.0, demonstrating the practicality of the attack against a widely adopted machine learning framework.
- Score: 37.650342256199096
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Training high-quality deep learning models is a challenging task due to
computational and technical requirements. A growing number of individuals,
institutions, and companies increasingly rely on pre-trained, third-party
models made available in public repositories. These models are often used
directly or integrated in product pipelines with no particular precautions,
since they are effectively just data in tensor form and considered safe. In
this paper, we raise awareness of a new machine learning supply chain threat
targeting neural networks. We introduce MaleficNet 2.0, a novel technique to
embed self-extracting, self-executing malware in neural networks. MaleficNet
2.0 uses spread-spectrum channel coding combined with error correction
techniques to inject malicious payloads in the parameters of deep neural
networks. The MaleficNet 2.0 injection technique is stealthy, does not degrade the
performance of the model, and is robust against removal techniques. We design
our approach to work both in traditional and distributed learning settings such
as Federated Learning, and demonstrate that it is effective even when a reduced
number of bits is used for the model parameters. Finally, we implement a
proof-of-concept self-extracting neural network malware using MaleficNet 2.0,
demonstrating the practicality of the attack against a widely adopted machine
learning framework. Our aim with this work is to raise awareness against these
new, dangerous attacks both in the research community and industry, and we hope
to encourage further research in mitigation techniques against such threats.
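To make the embedding idea concrete, here is a minimal, illustrative sketch of spread-spectrum payload embedding in a weight vector: each payload bit is spread over a pseudo-random ±1 code and later recovered by correlation. The chip length `CHIPS_PER_BIT`, the gain `GAIN`, and the use of NumPy are assumptions for this toy example; MaleficNet 2.0 additionally applies error-correction coding and targets real framework weights, which this sketch does not reproduce.

```python
# Toy CDMA-style payload embedding in model weights (illustrative sketch only; the real
# technique additionally layers error-correcting codes on top of the spreading step).
import numpy as np

CHIPS_PER_BIT = 4096   # length of each spreading sequence (assumed)
GAIN = 0.01            # embedding strength (assumed; trades stealth for robustness)

def embed(weights, payload_bits, seed=0):
    """Spread each payload bit over CHIPS_PER_BIT weights and add it with a small gain."""
    rng = np.random.default_rng(seed)
    w = weights.copy()
    for i, bit in enumerate(payload_bits):
        code = rng.choice([-1.0, 1.0], size=CHIPS_PER_BIT)       # pseudo-random spreading code
        segment = slice(i * CHIPS_PER_BIT, (i + 1) * CHIPS_PER_BIT)
        w[segment] += GAIN * (1.0 if bit else -1.0) * code       # BPSK-style modulation
    return w

def extract(weights, n_bits, seed=0):
    """Recover each bit by correlating its weight segment with the same spreading code."""
    rng = np.random.default_rng(seed)
    bits = []
    for i in range(n_bits):
        code = rng.choice([-1.0, 1.0], size=CHIPS_PER_BIT)
        segment = slice(i * CHIPS_PER_BIT, (i + 1) * CHIPS_PER_BIT)
        bits.append(1 if float(np.dot(weights[segment], code)) > 0 else 0)
    return np.array(bits, dtype=np.uint8)

# Usage: 64 payload bits survive embedding into one million Gaussian "weights".
rng = np.random.default_rng(1)
w = rng.normal(0.0, 0.05, size=1_000_000).astype(np.float32)
payload = rng.integers(0, 2, size=64).astype(np.uint8)
assert np.array_equal(extract(embed(w, payload), 64), payload)
```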
Related papers
- Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of
Foundation Models [103.71308117592963]
We present MLAC, an algorithm for training self-destructing models that leverages techniques from meta-learning and adversarial learning.
In a small-scale experiment, we show that MLAC can largely prevent a BERT-style model from being re-purposed to perform gender identification.
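As a rough, hedged sketch of the general adversarial-censoring idea (not the paper's actual MLAC implementation, and simplified to a single adversary step rather than full meta-learning), the toy loop below trains an encoder to serve a benign task while keeping a simulated "harmful-task" probe head ineffective; all module names, sizes, and the 0.5 penalty weight are assumptions.

```python
# Toy adversarial-censoring training loop (illustrative; not the paper's MLAC code).
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(32, 64), nn.ReLU())        # shared feature encoder
benign_head = nn.Linear(64, 2)                           # desired task head
adv_head = nn.Linear(64, 2)                              # simulated harmful-task probe
opt = torch.optim.Adam(list(enc.parameters()) + list(benign_head.parameters()), lr=1e-3)
adv_opt = torch.optim.Adam(adv_head.parameters(), lr=1e-2)
ce = nn.CrossEntropyLoss()

for step in range(100):
    x = torch.randn(64, 32)
    y_task = torch.randint(0, 2, (64,))                  # benign labels (synthetic)
    y_harm = torch.randint(0, 2, (64,))                  # protected labels (synthetic)

    # Inner step: the adversary adapts its probe on frozen features.
    feats = enc(x).detach()
    adv_opt.zero_grad()
    ce(adv_head(feats), y_harm).backward()
    adv_opt.step()

    # Outer step: keep the benign loss low while pushing the adapted probe's loss up.
    opt.zero_grad()
    feats = enc(x)
    loss = ce(benign_head(feats), y_task) - 0.5 * ce(adv_head(feats), y_harm)
    loss.backward()
    opt.step()
```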
arXiv Detail & Related papers (2022-11-27T21:43:45Z) - Zero Day Threat Detection Using Metric Learning Autoencoders [3.1965908200266173]
The proliferation of zero-day threats (ZDTs) to companies' networks has been immensely costly.
Deep learning methods are an attractive option for their ability to capture highly nonlinear behavior patterns.
The models presented here are also trained and evaluated with two more datasets, and continue to show promising results even when generalizing to new network topologies.
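A minimal sketch of the general metric-learning-autoencoder idea (not the paper's architecture): the latent space is shaped with a triplet loss alongside the reconstruction loss, so that unfamiliar (zero-day) traffic can be flagged by reconstruction error or latent distance. All dimensions, the margin, and the synthetic data are assumptions.

```python
# Autoencoder whose latent space is shaped with a triplet (metric) loss; anomalies are
# scored by reconstruction error or distance to known-benign embeddings. Illustrative only.
import torch
import torch.nn as nn

class MetricAE(nn.Module):
    def __init__(self, in_dim=40, latent_dim=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, in_dim))

    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z)

model = MetricAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
mse = nn.MSELoss()
triplet = nn.TripletMarginLoss(margin=1.0)

for step in range(200):
    # anchor/positive share a class, negative comes from another class (synthetic here)
    anchor, positive, negative = (torch.randn(64, 40) for _ in range(3))
    za, ra = model(anchor)
    zp, _ = model(positive)
    zn, _ = model(negative)
    loss = mse(ra, anchor) + triplet(za, zp, zn)   # reconstruction + metric shaping
    opt.zero_grad(); loss.backward(); opt.step()

# At test time, a flow is suspicious if it reconstructs poorly or sits far from the
# centroid of known-benign embeddings (threshold chosen on validation data).
```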
arXiv Detail & Related papers (2022-11-01T13:12:20Z) - Adversarial Robustness Assessment of NeuroEvolution Approaches [1.237556184089774]
We evaluate the robustness of models found by two NeuroEvolution approaches on the CIFAR-10 image classification task.
Our results show that when the evolved models are attacked with iterative methods, their accuracy usually drops to, or close to, zero.
Some of these techniques can exacerbate the perturbations added to the original inputs, potentially harming robustness.
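For reference, a minimal L-infinity PGD loop of the kind typically used for such iterative-attack evaluations is sketched below; the epsilon, step size, iteration count, and the toy classifier are assumptions, not the paper's exact setup.

```python
# Minimal PGD-style iterative attack used to probe a model's robust accuracy.
import torch
import torch.nn as nn

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """L-infinity PGD: repeatedly step along the sign of the loss gradient."""
    x_adv = (x.clone().detach() + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = nn.functional.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv + alpha * grad.sign()).detach()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv

# Usage with a toy CIFAR-10-shaped classifier: robust accuracy = accuracy on x_adv.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(16, 3, 32, 32), torch.randint(0, 10, (16,))
acc = (model(pgd_attack(model, x, y)).argmax(1) == y).float().mean()
print(acc)
```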
arXiv Detail & Related papers (2022-07-12T10:40:19Z) - EvilModel 2.0: Hiding Malware Inside of Neural Network Models [7.060465882091837]
Turning neural network models into stegomalware is a malicious use of AI.
Existing methods have a low malware embedding rate and a high impact on the model performance.
This paper proposes new methods to embed malware in models with high capacity and no service quality degradation.
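As a simplified illustration of the byte-level embedding idea (not EvilModel's exact scheme), the sketch below hides payload bytes in the low-order mantissa bytes of little-endian float32 weights and recovers them later; `BYTES_PER_WEIGHT` and the toy weight tensor are assumptions.

```python
# Illustrative "bytes in weights" sketch: stash payload bytes in the two low-order
# mantissa bytes of little-endian float32 parameters, which changes each touched
# weight by well under 1%. Simplification, not the paper's exact embedding scheme.
import numpy as np

BYTES_PER_WEIGHT = 2   # keep the sign/exponent bytes intact (assumption)

def hide(weights: np.ndarray, payload: bytes) -> np.ndarray:
    raw = weights.astype(np.float32).view(np.uint8).reshape(-1, 4).copy()
    data = np.frombuffer(payload, dtype=np.uint8)
    need = -(-len(data) // BYTES_PER_WEIGHT)                 # ceiling division
    assert need <= raw.shape[0], "payload too large for this tensor"
    padded = np.zeros(need * BYTES_PER_WEIGHT, dtype=np.uint8)
    padded[: len(data)] = data
    raw[:need, :BYTES_PER_WEIGHT] = padded.reshape(need, BYTES_PER_WEIGHT)
    return raw.reshape(-1).view(np.float32)

def reveal(weights: np.ndarray, n_bytes: int) -> bytes:
    raw = weights.view(np.uint8).reshape(-1, 4)
    return raw[:, :BYTES_PER_WEIGHT].reshape(-1)[:n_bytes].tobytes()

w = np.random.default_rng(0).normal(0.0, 0.05, size=10_000).astype(np.float32)
secret = b"proof-of-concept payload"
stego = hide(w, secret)
assert reveal(stego, len(secret)) == secret
print(np.abs(stego - w).max())   # per-weight change stays tiny
```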
arXiv Detail & Related papers (2021-09-09T15:31:33Z) - Mal2GCN: A Robust Malware Detection Approach Using Deep Graph
Convolutional Networks With Non-Negative Weights [1.3190581566723918]
We present a black-box source code-based adversarial malware generation approach that can be used to evaluate the robustness of malware detection models against real-world adversaries.
We then propose Mal2GCN, a robust malware detection model. Mal2GCN combines the representation power of graph convolutional networks with a non-negative weight training method to create a malware detection model with high detection accuracy.
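A minimal sketch of the non-negative-weight idea, assuming a simple dense GCN layer and a toy graph (not Mal2GCN's actual architecture): weights are re-clamped to be non-negative after every optimizer step, so adding extra (benign) content cannot lower the maliciousness score through that layer.

```python
# Graph-convolution layer with weights clamped to be non-negative after each update.
# Sizes and the toy graph are assumptions; illustrative only.
import torch
import torch.nn as nn

class NonNegGCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Parameter(0.1 * torch.rand(in_dim, out_dim))  # start non-negative

    def forward(self, adj, x):
        # adj: (N, N) normalized adjacency with self-loops, x: (N, in_dim) node features
        return torch.relu(adj @ x @ self.weight)

    def clamp_nonneg_(self):
        with torch.no_grad():
            self.weight.clamp_(min=0.0)

layer = NonNegGCNLayer(16, 8)
readout = nn.Linear(8, 1)          # the full approach would constrain later layers too
opt = torch.optim.Adam(list(layer.parameters()) + list(readout.parameters()), lr=1e-3)

adj, x, label = torch.eye(5), torch.rand(5, 16), torch.ones(1, 1)
for _ in range(20):
    score = readout(layer(adj, x).mean(dim=0, keepdim=True))   # graph-level readout
    loss = nn.functional.binary_cross_entropy_with_logits(score, label)
    opt.zero_grad(); loss.backward(); opt.step()
    layer.clamp_nonneg_()          # re-project weights onto the non-negative orthant
```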
arXiv Detail & Related papers (2021-08-27T19:42:13Z) - A Generative Model based Adversarial Security of Deep Learning and
Linear Classifier Models [0.0]
We have proposed a mitigation method for adversarial attacks against machine learning models, based on an autoencoder model.
The main idea behind adversarial attacks against machine learning models is to produce erroneous results by manipulating trained models.
We have also evaluated the performance of the autoencoder defense against various attack methods, on models ranging from deep neural networks to traditional algorithms.
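A minimal sketch of the autoencoder-as-filter idea, assuming flattened 784-dimensional inputs and Gaussian noise during training (not the paper's exact setup): inputs are passed through the autoencoder before the classifier sees them, projecting small perturbations back toward the clean data manifold.

```python
# Denoising autoencoder used as a pre-filter in front of a classifier. Illustrative only.
import torch
import torch.nn as nn

autoencoder = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 784), nn.Sigmoid(),
)
classifier = nn.Linear(784, 10)
opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

for _ in range(200):                            # train the AE to undo small perturbations
    clean = torch.rand(64, 784)
    noisy = (clean + 0.1 * torch.randn_like(clean)).clamp(0, 1)
    loss = nn.functional.mse_loss(autoencoder(noisy), clean)
    opt.zero_grad(); loss.backward(); opt.step()

def robust_predict(x):
    """Filter the (possibly adversarial) input through the autoencoder, then classify."""
    return classifier(autoencoder(x)).argmax(dim=1)

print(robust_predict(torch.rand(4, 784)))
```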
arXiv Detail & Related papers (2020-10-17T17:18:17Z) - Cassandra: Detecting Trojaned Networks from Adversarial Perturbations [92.43879594465422]
In many cases, pre-trained models are sourced from vendors who may have tampered with the training pipeline to insert Trojan behaviors into the models.
We propose a method to verify if a pre-trained model is Trojaned or benign.
Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients.
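As a very rough, hedged illustration of perturbation-based fingerprinting (not Cassandra's actual pipeline), the sketch below optimizes a universal perturbation for a model and reduces it to simple statistics that a separate meta-classifier, trained on known Trojaned and benign reference models, could consume; every detail here (epsilon, steps, chosen features) is an assumption.

```python
# Craft a universal adversarial perturbation from a model's gradients and summarize it
# into fingerprint features. Illustrative sketch only.
import torch
import torch.nn as nn

def perturbation_fingerprint(model, batches, eps=0.1, steps=20):
    delta = torch.zeros(1, 3, 32, 32, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=0.01)
    for _ in range(steps):
        for x, y in batches:
            loss = -nn.functional.cross_entropy(model(x + delta), y)   # maximize error
            opt.zero_grad(); loss.backward(); opt.step()
            with torch.no_grad():
                delta.clamp_(-eps, eps)
    with torch.no_grad():
        fooled, total = 0.0, 0
        for x, y in batches:
            fooled += (model(x + delta).argmax(1) != y).float().sum().item()
            total += len(y)
    # A meta-classifier would consume features like these from many reference models.
    return torch.tensor([delta.norm().item(), fooled / total])

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
batches = [(torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,)))]
print(perturbation_fingerprint(model, batches))
```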
arXiv Detail & Related papers (2020-07-28T19:00:40Z) - Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
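For context, the sketch below shows the generic trigger reverse-engineering building block (Neural-Cleanse-style optimization of a mask and patch for one candidate target label); the cited paper's contribution is precisely avoiding this per-label scan, so treat the loop as background, with all hyperparameters assumed.

```python
# Generic trigger reverse-engineering sketch: optimize a small mask and patch that push
# inputs toward a target label; an anomalously small recovered trigger suggests a backdoor.
import torch
import torch.nn as nn

def reverse_engineer_trigger(model, x, target, steps=100, lam=0.01):
    mask = torch.zeros(1, 1, 32, 32, requires_grad=True)     # where the trigger goes
    patch = torch.zeros(1, 3, 32, 32, requires_grad=True)    # what the trigger looks like
    opt = torch.optim.Adam([mask, patch], lr=0.1)
    y = torch.full((x.size(0),), target, dtype=torch.long)
    for _ in range(steps):
        m = torch.sigmoid(mask)
        stamped = (1 - m) * x + m * torch.sigmoid(patch)      # apply candidate trigger
        loss = nn.functional.cross_entropy(model(stamped), y) + lam * m.abs().sum()
        opt.zero_grad(); loss.backward(); opt.step()
    return torch.sigmoid(mask).detach(), torch.sigmoid(patch).detach()

# Usage: scan candidate target labels and flag the one with an unusually small mask.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(16, 3, 32, 32)
mask, patch = reverse_engineer_trigger(model, x, target=0)
print(mask.sum())   # an outlier-small mask norm for one label is a backdoor indicator
```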
arXiv Detail & Related papers (2020-06-10T04:12:53Z) - Feature Purification: How Adversarial Training Performs Robust Deep
Learning [66.05472746340142]
We present a principle that we call Feature Purification: one of the causes of the existence of adversarial examples is the accumulation of certain small, dense mixtures in the hidden weights during the training process of a neural network.
We present experiments on the CIFAR-10 dataset to illustrate this principle, and a theoretical result proving that, for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z) - HYDRA: Pruning Adversarially Robust Neural Networks [58.061681100058316]
Deep learning faces two key challenges: lack of robustness against adversarial attacks and large neural network size.
We propose to make pruning techniques aware of the robust training objective and let the training objective guide the search for which connections to prune.
We demonstrate that our approach, titled HYDRA, achieves compressed networks with state-of-the-art benign and robust accuracy, simultaneously.
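A minimal sketch of robustness-aware score-based pruning, assuming a single linear layer, a one-step FGSM "robust" loss, and 50% sparsity (not HYDRA's actual implementation): each weight gets a learnable importance score, the forward pass keeps only the top-scored weights via a straight-through mask, and the scores are trained against the robust objective so the surviving connections are the ones that matter for robustness.

```python
# Robustness-aware pruning via learnable importance scores. Illustrative sketch only.
import torch
import torch.nn as nn

class ScoredLinear(nn.Module):
    def __init__(self, in_dim, out_dim, sparsity=0.5):
        super().__init__()
        self.weight = nn.Parameter(0.1 * torch.randn(out_dim, in_dim))
        self.scores = nn.Parameter(torch.rand(out_dim, in_dim))   # importance scores
        self.k = int((1 - sparsity) * in_dim * out_dim)

    def forward(self, x):
        threshold = self.scores.flatten().topk(self.k).values.min()
        mask = (self.scores >= threshold).float()
        # straight-through estimator: binary mask forward, gradient flows to the scores
        mask = mask + self.scores - self.scores.detach()
        return nn.functional.linear(x, self.weight * mask)

layer = ScoredLinear(3 * 32 * 32, 10)
opt = torch.optim.Adam([layer.scores], lr=1e-2)        # score-search phase: weights frozen
for _ in range(20):
    x = torch.rand(16, 3, 32, 32).flatten(1).requires_grad_(True)
    y = torch.randint(0, 10, (16,))
    # one-step adversarial example so the pruning objective reflects robustness
    loss = nn.functional.cross_entropy(layer(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    x_adv = (x + (8 / 255) * grad.sign()).detach().clamp(0, 1)
    robust_loss = nn.functional.cross_entropy(layer(x_adv), y)
    opt.zero_grad(); robust_loss.backward(); opt.step()
```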
arXiv Detail & Related papers (2020-02-24T19:54:53Z)