Investigating White-Box Attacks for On-Device Models
- URL: http://arxiv.org/abs/2402.05493v4
- Date: Fri, 1 Mar 2024 05:22:38 GMT
- Title: Investigating White-Box Attacks for On-Device Models
- Authors: Mingyi Zhou, Xiang Gao, Jing Wu, Kui Liu, Hailong Sun, Li Li
- Abstract summary: On-device models are vulnerable to attacks as they can be easily extracted from their corresponding mobile apps.
We propose a Reverse Engineering framework for On-device Models (REOM), which automatically reverses the compiled on-device TFLite model to the debuggable model.
Our results show that REOM enables attackers to achieve higher attack success rates with a hundred times smaller attack perturbations.
- Score: 21.329209501209665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Numerous mobile apps have leveraged deep learning capabilities. However,
on-device models are vulnerable to attacks as they can be easily extracted from
their corresponding mobile apps. Existing on-device attacking approaches only
generate black-box attacks, which are far less effective and efficient than
white-box strategies. This is because mobile deep learning frameworks like
TFLite do not support gradient computing, which is necessary for white-box
attacking algorithms. Thus, we argue that existing findings may underestimate
the harmfulness of on-device attacks. To this end, we conduct a study to answer
this research question: Can on-device models be directly attacked via white-box
strategies? We first systematically analyze the difficulties of transforming
the on-device model to its debuggable version, and propose a Reverse
Engineering framework for On-device Models (REOM), which automatically reverses
the compiled on-device TFLite model to the debuggable model. Specifically, REOM
first transforms compiled on-device models into the Open Neural Network Exchange
(ONNX) format, then removes the non-debuggable parts, and converts them into a
debuggable DL model format that attackers can exploit in a white-box
setting. Our experimental results show that our approach is effective in
achieving automated transformation among 244 TFLite models. Compared with
previous attacks using surrogate models, REOM enables attackers to achieve
higher attack success rates with a hundred times smaller attack perturbations.
In addition, because the ONNX platform has plenty of tools for model format
exchanging, the proposed method based on the ONNX platform can be adapted to
other model formats. Our findings emphasize the need for developers to
carefully consider their model deployment strategies, and use white-box methods
to evaluate the vulnerability of on-device models.
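The conversion route sketched in the abstract (compiled TFLite model, then ONNX, then a debuggable model) can be approximated with off-the-shelf converters. The snippet below is a minimal sketch assuming tf2onnx and onnx2pytorch are installed and using hypothetical file names; it is not the authors' REOM tool, and generic converters can fail on exactly the compiled or custom TFLite operators that REOM is designed to handle.

```python
# Minimal sketch of the TFLite -> ONNX -> debuggable-model route, using
# off-the-shelf converters rather than the authors' REOM framework.
# Assumptions: tf2onnx and onnx2pytorch are installed, and the file names
# below are hypothetical placeholders for a model extracted from an app.
import onnx
import tf2onnx
from onnx2pytorch import ConvertModel

# Step 1: compiled on-device TFLite model -> ONNX graph.
tf2onnx.convert.from_tflite(
    "extracted_model.tflite",           # model pulled out of the mobile app
    output_path="extracted_model.onnx",
)

# Step 2: ONNX graph -> a PyTorch module that supports autograd, i.e. a
# "debuggable" copy on which white-box (gradient-based) attacks can run.
onnx_model = onnx.load("extracted_model.onnx")
torch_model = ConvertModel(onnx_model).eval()
```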
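Once a gradient-enabled copy of the model exists, any standard white-box attack applies to it directly, which is what makes the reversed model far more dangerous than a black-box surrogate. Below is a minimal projected gradient descent (PGD) sketch against the converted module; the perturbation budget and step size are illustrative values, not settings reported in the paper.

```python
# Minimal PGD (L-infinity) sketch against the converted, gradient-enabled
# model. Hyperparameters here are illustrative, not the paper's settings.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Projected gradient descent inside an L-infinity ball of radius eps."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()    # ascend the loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)        # project to the eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)                   # keep a valid pixel range
    return x_adv.detach()

# Usage with hypothetical tensors:
# adv_images = pgd_attack(torch_model, images, labels)
```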
Related papers
- A Realistic Threat Model for Large Language Model Jailbreaks [87.64278063236847]
In this work, we propose a unified threat model for the principled comparison of jailbreak attacks.
Our threat model combines constraints on perplexity, measuring how far a jailbreak deviates from natural text.
We adapt popular attacks to this new, realistic threat model, with which we, for the first time, benchmark these attacks on equal footing.
arXiv Detail & Related papers (2024-10-21T17:27:01Z)
- DynaMO: Protecting Mobile DL Models through Coupling Obfuscated DL Operators [29.82616462226066]
Attackers can easily reverse-engineer mobile DL models in Apps to steal intellectual property or generate effective attacks.
Model Obfuscation has been proposed to defend against such reverse engineering.
We propose DynaMO, a Dynamic Model Obfuscation strategy similar to Homomorphic Encryption.
arXiv Detail & Related papers (2024-10-19T08:30:08Z)
- Defense Against Model Extraction Attacks on Recommender Systems [53.127820987326295]
We introduce Gradient-based Ranking Optimization (GRO) to defend against model extraction attacks on recommender systems.
GRO aims to minimize the loss of the protected target model while maximizing the loss of the attacker's surrogate model.
Results show GRO's superior effectiveness in defending against model extraction attacks.
arXiv Detail & Related papers (2023-10-25T03:30:42Z)
- ModelObfuscator: Obfuscating Model Information to Protect Deployed ML-based Systems [31.988501084337678]
We develop a prototype tool ModelObfuscator to automatically obfuscate on-device TFLite models.
Our experiments show that this proposed approach can dramatically improve model security.
arXiv Detail & Related papers (2023-06-01T05:24:00Z)
- Query Efficient Cross-Dataset Transferable Black-Box Attack on Action Recognition [99.29804193431823]
Black-box adversarial attacks present a realistic threat to action recognition systems.
We propose a new attack on action recognition that addresses these shortcomings by generating perturbations.
Our method achieves 8% and 12% higher deception rates than state-of-the-art query-based and transfer-based attacks, respectively.
arXiv Detail & Related papers (2022-11-23T17:47:49Z)
- Scoring Black-Box Models for Adversarial Robustness [4.416484585765028]
The robustness of models to adversarial attacks has been analyzed.
We propose a simple scoring method for black-box models which indicates their robustness to adversarial input.
arXiv Detail & Related papers (2022-10-31T08:41:44Z)
- Smart App Attack: Hacking Deep Learning Models in Android Apps [16.663345577900813]
We introduce a grey-box adversarial attack framework to hack on-device models.
We evaluate the attack effectiveness and generality in terms of four different settings.
Among 53 apps adopting transfer learning, we find that 71.7% of them can be successfully attacked.
arXiv Detail & Related papers (2022-04-23T14:01:59Z)
- How to Robustify Black-Box ML Models? A Zeroth-Order Optimization Perspective [74.47093382436823]
We address the problem of black-box defense: How to robustify a black-box model using just input queries and output feedback?
We propose a general notion of defensive operation that can be applied to black-box models, and design it through the lens of denoised smoothing (DS).
We empirically show that ZO-AE-DS can achieve improved accuracy, certified robustness, and query complexity over existing baselines.
arXiv Detail & Related papers (2022-03-27T03:23:32Z)
- Query Efficient Decision Based Sparse Attacks Against Black-Box Deep Learning Models [9.93052896330371]
We develop an evolution-based algorithm, SparseEvo, for the problem and evaluate it against both convolutional deep neural networks and vision transformers.
SparseEvo requires significantly fewer model queries than the state-of-the-art sparse attack Pointwise for both untargeted and targeted attacks.
Importantly, the query efficient SparseEvo, along with decision-based attacks, in general raise new questions regarding the safety of deployed systems.
arXiv Detail & Related papers (2022-01-31T21:10:47Z)
- Meta Gradient Adversarial Attack [64.5070788261061]
This paper proposes a novel architecture called Meta Gradient Adversarial Attack (MGAA), which is plug-and-play and can be integrated with any existing gradient-based attack method.
Specifically, we randomly sample multiple models from a model zoo to compose different tasks and iteratively simulate a white-box attack and a black-box attack in each task.
By narrowing the gap between the gradient directions in white-box and black-box attacks, the transferability of adversarial examples on the black-box setting can be improved.
arXiv Detail & Related papers (2021-08-09T17:44:19Z)
- Boosting Black-Box Attack with Partially Transferred Conditional Adversarial Distribution [83.02632136860976]
We study black-box adversarial attacks against deep neural networks (DNNs).
We develop a novel mechanism of adversarial transferability, which is robust to the surrogate biases.
Experiments on benchmark datasets and attacking against real-world API demonstrate the superior attack performance of the proposed method.
arXiv Detail & Related papers (2020-06-15T16:45:27Z)