Model-less Is the Best Model: Generating Pure Code Implementations to Replace On-Device DL Models
- URL: http://arxiv.org/abs/2403.16479v2
- Date: Sun, 31 Mar 2024 12:36:04 GMT
- Title: Model-less Is the Best Model: Generating Pure Code Implementations to Replace On-Device DL Models
- Authors: Mingyi Zhou, Xiang Gao, Pei Liu, John Grundy, Chunyang Chen, Xiao Chen, Li Li
- Abstract summary: Deployed deep learning (DL) models can be easily extracted from real-world applications and devices by attackers.
Traditional software protection techniques have been widely explored; if on-device models could be implemented in pure code, such as C++, this would open the possibility of reusing existing software protection techniques.
We propose a novel method, CustomDLCoder, to automatically extract the on-device model information and synthesize a customized executable program.
- Score: 29.635329143403368
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies show that deployed deep learning (DL) models, such as those of TensorFlow Lite (TFLite), can be easily extracted from real-world applications and devices by attackers and used to mount attacks such as adversarial attacks. Although securing deployed on-device DL models has gained increasing attention, no existing methods can fully prevent these threats. Traditional software protection techniques have been widely explored; if on-device models can be implemented in pure code, such as C++, it becomes possible to reuse existing software protection techniques. However, due to the complexity of DL models, there is no automatic method that can translate DL models to pure code. To fill this gap, we propose a novel method, CustomDLCoder, to automatically extract the on-device model information and synthesize a customized executable program for a wide range of DL models. CustomDLCoder first parses the DL model, extracts its backend computing units, configures the computing units into a graph, and then generates customized code to implement and deploy the ML solution without explicit model representation. The synthesized program hides model information in DL deployment environments because it does not need to retain an explicit model representation, preventing many attacks on the DL model. In addition, it improves ML performance because the customized code removes model parsing and preprocessing steps and retains only the data computing process. Our experimental results show that CustomDLCoder improves model security by disabling on-device model sniffing. Compared with the original on-device platform (i.e., TFLite), our method accelerates model inference by 21.8% and 24.3% on x86-64 and ARM64 platforms, respectively. Most importantly, it significantly reduces memory consumption by 68.8% and 36.0% on x86-64 and ARM64 platforms, respectively.
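The pipeline described in the abstract (parse the model, extract its computing units, wire them into a graph, emit standalone code) can be pictured with a minimal sketch. Everything below, including the toy graph format, the `TOY_GRAPH` and `emit_cpp` names, and the emitted C++, is an illustrative assumption rather than the paper's actual implementation:

```python
# Minimal sketch of the parse -> graph -> codegen idea behind CustomDLCoder.
# The toy graph format and the emitted C++ are illustrative assumptions,
# not the paper's pipeline.

# A "parsed model": each op names its kind and carries its parameters.
TOY_GRAPH = [
    {"op": "dense", "weights": [[0.5, -0.2], [0.1, 0.3]], "bias": [0.0, 0.1]},
    {"op": "relu"},
]

def emit_cpp(graph):
    """Emit a self-contained C++ function that hard-codes the weights,
    so no model file or model parser is needed at run time."""
    lines = ["#include <vector>",
             "std::vector<float> forward(std::vector<float> x) {"]
    for i, node in enumerate(graph):
        if node["op"] == "dense":
            w, b = node["weights"], node["bias"]
            rows, cols = len(w), len(w[0])
            lines.append(f"  std::vector<float> y{i}({cols}, 0.0f);")
            for c in range(cols):
                terms = " + ".join(f"{w[r][c]}f * x[{r}]" for r in range(rows))
                lines.append(f"  y{i}[{c}] = {terms} + {b[c]}f;")
            lines.append(f"  x = y{i};")
        elif node["op"] == "relu":
            lines.append("  for (auto &v : x) v = v > 0 ? v : 0;")
    lines.append("  return x;")
    lines.append("}")
    return "\n".join(lines)

print(emit_cpp(TOY_GRAPH))
```

Because the weights are baked into the generated source, the shipped binary carries no model file to sniff, which is the property the abstract exploits.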
Related papers
- DynaMO: Protecting Mobile DL Models through Coupling Obfuscated DL Operators [29.82616462226066]
Attackers can easily reverse-engineer mobile DL models in Apps to steal intellectual property or generate effective attacks.
Model Obfuscation has been proposed to defend against such reverse engineering.
We propose DynaMO, a Dynamic Model Obfuscation strategy similar to Homomorphic Encryption; a toy illustration of the coupling idea follows below.
arXiv Detail & Related papers (2024-10-19T08:30:08Z)
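A toy NumPy illustration of the coupling idea in the DynaMO summary above: one layer's stored weights are scaled by a secret factor and a coupled layer compensates, so dumping either layer alone does not reveal the true parameters. The linear-scaling scheme here is an assumption for illustration, not DynaMO's actual transformation:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 4))    # true weights of layer 1
W2 = rng.normal(size=(4, 4))    # true weights of layer 2
x = rng.normal(size=4)

# Obfuscate: scale layer 1 by a secret factor and compensate in the
# coupled layer 2, so neither stored matrix equals its true value.
s = rng.uniform(1.5, 3.0)
W1_obf = s * W1
W2_obf = W2 / s

clean = W2 @ (W1 @ x)           # original two-layer linear model
obf = W2_obf @ (W1_obf @ x)     # coupled obfuscated model
assert np.allclose(clean, obf)  # outputs match; stored weights do not
print("max weight change:", np.abs(W1_obf - W1).max())
```

Positive scales also commute with ReLU, so the same trick can survive a nonlinearity between the two coupled layers.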
- Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging [25.327483618051378]
We conduct the first study on the robustness of IP protection methods under model merging scenarios.
Experimental results indicate that current Large Language Model (LLM) watermarking techniques cannot survive model merging.
Our research aims to highlight that model merging should be an indispensable consideration in the robustness assessment of model IP protection techniques.
arXiv Detail & Related papers (2024-04-08T04:30:33Z)
- Scalable Extraction of Training Data from (Production) Language Models [93.7746567808049]
This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset.
We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT.
arXiv Detail & Related papers (2023-11-28T18:47:03Z)
- Watermarking LLMs with Weight Quantization [61.63899115699713]
This paper proposes a novel watermarking strategy that plants watermarks in the quantization process of large language models.
We successfully plant the watermark into open-source large language model weights including GPT-Neo and LLaMA.
arXiv Detail & Related papers (2023-10-17T13:06:59Z)
- MatFormer: Nested Transformer for Elastic Inference [94.1789252941718]
MatFormer is a nested Transformer architecture designed to offer elasticity in a variety of deployment constraints.
We show that a 2.6B decoder-only MatFormer language model (MatLM) allows us to extract smaller models spanning from 1.5B to 2.6B.
We also observe that smaller encoders extracted from a universal MatFormer-based ViT (MatViT) encoder preserve the metric-space structure for adaptive large-scale retrieval; a toy slicing illustration follows below.
arXiv Detail & Related papers (2023-10-11T17:57:14Z)
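A toy NumPy sketch of the nesting idea in the MatFormer summary above: a submodel is "extracted" by slicing a prefix of the FFN hidden units, so every model size shares one set of weights. The exact slicing rule here is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_ff = 8, 32
W_in = rng.normal(size=(d_ff, d_model))   # shared up-projection
W_out = rng.normal(size=(d_model, d_ff))  # shared down-projection
x = rng.normal(size=d_model)

def ffn(x, k):
    """Run the FFN with only the first k hidden units: a smaller model
    carved out of the same weights, with no retraining."""
    h = np.maximum(W_in[:k] @ x, 0.0)  # sliced projection + ReLU
    return W_out[:, :k] @ h            # matching slice on the way down

full = ffn(x, d_ff)             # the full model
small = ffn(x, d_ff // 4)       # a nested submodel, one quarter the width
print(full.shape, small.shape)  # both (8,): same interface, less compute
```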
- ModelObfuscator: Obfuscating Model Information to Protect Deployed ML-based Systems [31.988501084337678]
We develop a prototype tool, ModelObfuscator, to automatically obfuscate on-device TFLite models.
Our experiments show that this proposed approach can dramatically improve model security.
arXiv Detail & Related papers (2023-06-01T05:24:00Z)
- Speculative Decoding with Big Little Decoder [108.95187338417541]
Big Little Decoder (BiLD) is a framework that can improve inference efficiency and latency for a wide range of text generation applications.
On an NVIDIA T4 GPU, our framework achieves a speedup of up to 2.12x with minimal degradation in generation quality.
Our framework is fully plug-and-play and can be applied without any modifications to the training process or model architecture; a toy draft-and-verify sketch follows below.
arXiv Detail & Related papers (2023-02-15T18:55:29Z)
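The BiLD summary above describes a big-little generation loop; a minimal sketch follows. The stand-in models and the single confidence threshold are assumptions that simplify BiLD's actual fallback and rollback policies:

```python
import random

def little_model(ctx):
    """Stand-in for a small, cheap LM: returns (token, confidence)."""
    return random.choice("abc"), random.random()

def big_model(ctx):
    """Stand-in for the large, expensive LM."""
    return random.choice("abc")

def generate(n_tokens, threshold=0.6):
    out = []
    for _ in range(n_tokens):
        tok, conf = little_model(out)
        if conf < threshold:  # fallback: low confidence, ask the big model
            tok = big_model(out)
        out.append(tok)
    return "".join(out)

print(generate(20))
```

The big model runs only on the low-confidence steps, which is where the latency savings come from.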
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space; a minimal averaging sketch follows below.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
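The simplest instance of merging models in their parameter space, as referenced in the summary above, is elementwise averaging of same-shaped parameters. The paper proposes a more careful weighted scheme; the sketch below is only the baseline idea:

```python
import numpy as np

def merge_state_dicts(dicts):
    """Elementwise-average parameter tensors that share a name."""
    return {k: np.mean([d[k] for d in dicts], axis=0) for k in dicts[0]}

m1 = {"linear.weight": np.ones((2, 2)), "linear.bias": np.zeros(2)}
m2 = {"linear.weight": 3 * np.ones((2, 2)), "linear.bias": np.ones(2)}
merged = merge_state_dicts([m1, m2])
print(merged["linear.weight"])  # [[2. 2.] [2. 2.]]
```

No training data is touched at any point, which is what makes the fusion "dataless".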
- An Empirical Study of Challenges in Converting Deep Learning Models [15.521925194920893]
We conduct the first empirical study to assess ONNX and CoreML for converting trained Deep Learning models.
Our results reveal that the prediction accuracy of converted models is at the same level as that of the originals.
Converted models are generally assessed to be as robust as the originals.
arXiv Detail & Related papers (2022-06-28T23:18:37Z)
- Towards Training Reproducible Deep Learning Models [26.547756923322126]
Deep Learning (DL) models are challenging to reproduce due to issues like randomness in the software and non-determinism in the hardware.
This paper proposes a systematic approach to training reproducible DL models.
Case study results show our approach can successfully reproduce six open-source DL models and one commercial DL model; a common seed-pinning recipe is sketched below.
arXiv Detail & Related papers (2022-02-04T18:14:39Z)
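A common recipe for pinning down the software randomness mentioned in the reproducibility summary above is to seed every RNG and force deterministic kernels. This PyTorch-flavored sketch is a general illustration, not the paper's systematic approach, and hardware non-determinism may still need extra care:

```python
import os
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy RNG
    torch.manual_seed(seed)           # PyTorch CPU and CUDA RNGs
    os.environ["PYTHONHASHSEED"] = str(seed)
    # Raise an error instead of silently using nondeterministic kernels;
    # some CUDA ops additionally need CUBLAS_WORKSPACE_CONFIG set.
    torch.use_deterministic_algorithms(True)

set_seed(42)
```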
- Ensemble Distillation for Robust Model Fusion in Federated Learning [72.61259487233214]
Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model.
In most current training schemes, the central model is refined by averaging the parameters of the server model with the updated parameters from the client side.
We propose ensemble distillation for model fusion, i.e., training the central classifier on unlabeled data using the outputs of the client models; a toy sketch follows below.
arXiv Detail & Related papers (2020-06-12T14:49:47Z)
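A toy sketch of the ensemble-distillation idea from the summary above: the server fits its model to the averaged outputs of the client models on unlabeled data instead of averaging their weights. Linear client models and an MSE objective are simplifying assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
clients = [rng.normal(size=(3, 5)) for _ in range(4)]  # linear client models
server = np.zeros((3, 5))                              # central model
X = rng.normal(size=(100, 5))                          # unlabeled data

lr = 0.1
for _ in range(200):
    teacher = np.mean([X @ W.T for W in clients], axis=0)  # ensemble output
    student = X @ server.T
    grad = (student - teacher).T @ X / len(X)  # MSE gradient w.r.t. server
    server -= lr * grad                        # distillation step

print("gap to weight average:", np.abs(server - np.mean(clients, axis=0)).max())
```

With linear clients the distilled server converges toward the weight average, but unlike parameter averaging, the same recipe also applies when clients have different architectures, which is the robustness the paper targets.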