Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices
- URL: http://arxiv.org/abs/2403.12568v1
- Date: Tue, 19 Mar 2024 09:22:50 GMT
- Title: Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices
- Authors: Xueshuo Xie, Haoxu Wang, Zhaolong Jian, Tao Li, Wei Wang, Zhiwei Xu, Guiling Wang
- Abstract summary: Edge intelligence enables resource-demanding Deep Neural Network (DNN) inference without transferring original data.
For privacy-sensitive applications, deploying models in hardware-isolated trusted execution environments (TEEs) becomes essential.
We present a novel approach for advanced model deployment in TrustZone that ensures comprehensive privacy preservation during model inference.
- Score: 9.928745904761358
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Edge intelligence enables resource-demanding Deep Neural Network (DNN) inference without transferring original data, addressing concerns about data privacy in consumer Internet of Things (IoT) devices. For privacy-sensitive applications, deploying models in hardware-isolated trusted execution environments (TEEs) becomes essential. However, the limited secure memory in TEEs poses challenges for deploying DNN inference, and alternative techniques like model partitioning and offloading introduce performance degradation and security issues. In this paper, we present a novel approach for advanced model deployment in TrustZone that ensures comprehensive privacy preservation during model inference. We design a memory-efficient management method to support memory-demanding inference in TEEs. By adjusting the memory priority, we effectively mitigate memory leakage risks and memory overlap conflicts, requiring only 32 lines of code changes in the trusted operating system. Additionally, we leverage two tiny libraries: S-Tinylib (2,538 LoCs), a tiny deep learning library, and Tinylibm (827 LoCs), a tiny math library, to support efficient inference in TEEs. We implemented a prototype on a Raspberry Pi 3B+ and evaluated it using three well-known lightweight DNN models. The experimental results demonstrate that our design improves inference speed by 3.13 times and reduces power consumption by over 66.5% compared to the non-memory-optimized approach in TEEs.
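To make the memory-management idea above concrete, here is a minimal, self-contained C sketch of running a network layer by layer inside a fixed secure-memory budget, ping-ponging activations between two halves of one statically reserved pool so that peak activation memory stays constant regardless of depth. The pool size, layer widths, and the toy dense_relu layer are assumptions for illustration only; this is not the paper's S-Tinylib/Tinylibm code or its TrustZone memory-priority patch.

```c
/*
 * Illustrative sketch only (not the paper's implementation): layer-by-layer
 * inference inside a fixed "secure memory" budget by ping-ponging activation
 * buffers between two halves of one statically reserved pool.
 */
#include <stdio.h>

#define SECURE_POOL_BYTES (32 * 1024)               /* pretend TEE budget     */
#define HALF_FLOATS       (SECURE_POOL_BYTES / (2 * sizeof(float)))

static float secure_pool[SECURE_POOL_BYTES / sizeof(float)];

/* Stand-in for one fully connected layer: out = relu(W * in), toy weights.   */
static void dense_relu(const float *in, int in_len, float *out, int out_len)
{
    for (int o = 0; o < out_len; o++) {
        float acc = 0.0f;
        for (int i = 0; i < in_len; i++)
            acc += in[i] * 0.01f * (float)((o + i) % 7);
        out[o] = acc > 0.0f ? acc : 0.0f;
    }
}

int main(void)
{
    int widths[] = { 256, 512, 128, 10 };           /* each fits in one half  */
    int n_layers = 3;

    float *ping = secure_pool;                      /* first half of pool     */
    float *pong = secure_pool + HALF_FLOATS;        /* second half of pool    */

    for (int i = 0; i < widths[0]; i++)             /* fake input activations */
        ping[i] = 1.0f;

    /* Output of layer k becomes input of layer k+1: only two activation
     * buffers are ever resident in secure memory at the same time.           */
    for (int k = 0; k < n_layers; k++) {
        dense_relu(ping, widths[k], pong, widths[k + 1]);
        float *tmp = ping; ping = pong; pong = tmp; /* swap halves            */
    }

    printf("logit[0] = %f\n", (double)ping[0]);
    return 0;
}
```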
Related papers
- Contractive error feedback for gradient compression [60.05809370598166]
We propose a communication-efficient method called contractive error feedback (ConEF).
Unlike SGD with error feedback (EFSGD), which manages memory inefficiently, ConEF strikes a sweet spot between convergence and memory usage.
We empirically validate ConEF on various learning tasks that include image classification, language modeling, and machine translation.
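A minimal C sketch of the error-feedback mechanism this line of work builds on: the gradient is compressed before it would be communicated, and the compression error is stored and added back into the next step so nothing is lost over time. The quadratic toy problem, the scaled-sign compressor, and all constants are illustrative assumptions, not the published ConEF algorithm.

```c
/* Illustrative error-feedback sketch, not ConEF itself. */
#include <stdio.h>
#include <math.h>

#define DIM 4

/* 1-bit compressor: sign of each coordinate, scaled by the mean magnitude.   */
static void sign_compress(const float *v, float *out)
{
    float scale = 0.0f;
    for (int i = 0; i < DIM; i++) scale += fabsf(v[i]);
    scale /= (float)DIM;
    for (int i = 0; i < DIM; i++)
        out[i] = (v[i] >= 0.0f ? scale : -scale);
}

int main(void)
{
    float w[DIM]      = { 5.0f, -3.0f, 2.0f, 8.0f };  /* parameters            */
    float target[DIM] = { 1.0f,  1.0f, 1.0f, 1.0f };  /* optimum of toy loss   */
    float err[DIM]    = { 0 };                        /* error-feedback memory */
    float lr = 0.1f;

    for (int step = 0; step < 200; step++) {
        float grad[DIM], corrected[DIM], compressed[DIM];

        for (int i = 0; i < DIM; i++) {
            grad[i] = w[i] - target[i];            /* grad of 0.5*||w - t||^2 */
            corrected[i] = grad[i] + err[i];       /* add back previous error */
        }
        sign_compress(corrected, compressed);      /* what would be sent      */
        for (int i = 0; i < DIM; i++) {
            err[i] = corrected[i] - compressed[i]; /* store new residual      */
            w[i]  -= lr * compressed[i];           /* SGD step on the update  */
        }
    }
    printf("w = [%f %f %f %f]\n",
           (double)w[0], (double)w[1], (double)w[2], (double)w[3]);
    return 0;
}
```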
arXiv Detail & Related papers (2023-12-13T21:54:21Z)
- MirrorNet: A TEE-Friendly Framework for Secure On-device DNN Inference [14.08010398777227]
Deep neural network (DNN) models have become prevalent in edge devices for real-time inference.
Existing defense approaches fail to fully safeguard model confidentiality or result in significant latency issues.
This paper presents MirrorNet, which generates a TEE-friendly implementation for any given DNN model to protect model confidentiality.
In evaluation, MirrorNet achieves an 18.6% accuracy gap between authenticated and illegal use while introducing only 0.99% hardware overhead.
arXiv Detail & Related papers (2023-11-16T01:21:19Z)
- EcoTTA: Memory-Efficient Continual Test-time Adaptation via Self-distilled Regularization [71.70414291057332]
Test-time adaptation (TTA) is primarily conducted on edge devices with limited memory.
Long-term adaptation often leads to catastrophic forgetting and error accumulation.
We present lightweight meta networks that can adapt the frozen original networks to the target domain.
arXiv Detail & Related papers (2023-03-03T13:05:30Z)
- RL-DistPrivacy: Privacy-Aware Distributed Deep Inference for low latency IoT systems [41.1371349978643]
We present an approach that targets the security of collaborative deep inference via re-thinking the distribution strategy.
We formulate this methodology as an optimization that balances the latency of co-inference against the privacy level of the data.
arXiv Detail & Related papers (2022-08-27T14:50:00Z)
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
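A minimal C sketch of the patch-by-patch idea: the memory-heavy first stage is evaluated one small spatial patch at a time, so peak memory is one patch plus a small output map rather than the full high-resolution activation. The toy 4x4 average-pooling "layer", the on-demand pixel reader, and all sizes are assumptions, not the MCUNetV2 implementation.

```c
/* Illustrative patch-by-patch inference sketch, not MCUNetV2. */
#include <stdio.h>

#define IMG    64              /* full input resolution (never materialized)  */
#define PATCH  16              /* spatial patch held in memory at once        */
#define POOL    4              /* 4x4 average pooling as the toy layer        */
#define OUT   (IMG / POOL)     /* 16x16 output map                            */

/* Pretend sensor/flash read: returns pixel (y, x) on demand.                 */
static float read_pixel(int y, int x)
{
    return (float)((y * 31 + x * 17) % 255) / 255.0f;
}

int main(void)
{
    static float out[OUT][OUT];            /* small output map (16x16)        */
    float patch[PATCH][PATCH];             /* only one patch is resident      */

    for (int py = 0; py < IMG; py += PATCH) {
        for (int px = 0; px < IMG; px += PATCH) {
            /* Fetch the current patch. */
            for (int y = 0; y < PATCH; y++)
                for (int x = 0; x < PATCH; x++)
                    patch[y][x] = read_pixel(py + y, px + x);

            /* Run the toy layer on this patch only. */
            for (int y = 0; y < PATCH; y += POOL)
                for (int x = 0; x < PATCH; x += POOL) {
                    float acc = 0.0f;
                    for (int dy = 0; dy < POOL; dy++)
                        for (int dx = 0; dx < POOL; dx++)
                            acc += patch[y + dy][x + dx];
                    out[(py + y) / POOL][(px + x) / POOL] = acc / (POOL * POOL);
                }
        }
    }
    printf("out[0][0] = %f, peak patch bytes = %zu\n",
           (double)out[0][0], sizeof(patch));
    return 0;
}
```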
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
- Generative Optimization Networks for Memory Efficient Data Generation [11.452816167207937]
We propose a novel framework called generative optimization networks (GON) that is similar to GANs, but does not use a generator.
GONs use a single discriminator network and run optimization in the input space to generate new data samples, achieving an effective compromise between training time and memory consumption.
We show that our framework gives up to 32% higher detection F1 scores and 58% lower memory consumption, with only 5% higher training overheads compared to the state-of-the-art.
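A minimal C sketch of the generator-free idea: a new sample is produced by running gradient ascent in input space on a (here, hand-written) discriminator score. The Gaussian-bump score and the constants are toy assumptions, not the GON architecture.

```c
/* Illustrative "optimize in input space" sketch, not the GON framework.      */
#include <stdio.h>

#define DIM 2

static const float mu[DIM] = { 3.0f, -1.0f };  /* mode the toy score prefers  */

/* Gradient of log D(x) for the toy score D(x) = exp(-||x - mu||^2).          */
static void grad_log_score(const float *x, float *g)
{
    for (int i = 0; i < DIM; i++)
        g[i] = -2.0f * (x[i] - mu[i]);
}

int main(void)
{
    float x[DIM] = { -4.0f, 5.0f };            /* arbitrary starting sample   */
    float g[DIM];
    float lr = 0.05f;

    /* Ascend the discriminator score in input space to "generate" a sample.  */
    for (int step = 0; step < 300; step++) {
        grad_log_score(x, g);
        for (int i = 0; i < DIM; i++)
            x[i] += lr * g[i];
    }
    printf("generated sample: (%f, %f)\n", (double)x[0], (double)x[1]);
    return 0;
}
```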
arXiv Detail & Related papers (2021-10-06T16:54:33Z)
- Enabling Homomorphically Encrypted Inference for Large DNN Models [1.0679692136113117]
Homomorphic encryption (HE) enables inference on encrypted data, but it incurs 100x--10,000x memory and runtime overheads.
Secure deep neural network (DNN) inference using HE is currently limited by computing and memory resources.
We explore the feasibility of leveraging hybrid memory systems comprised of DRAM and persistent memory.
arXiv Detail & Related papers (2021-03-30T07:53:34Z)
- Uncertainty-Aware Deep Calibrated Salient Object Detection [74.58153220370527]
Existing deep neural network based salient object detection (SOD) methods mainly focus on pursuing high network accuracy.
These methods overlook the gap between network accuracy and prediction confidence, known as the confidence uncalibration problem.
We introduce an uncertainty-aware deep SOD network and propose two strategies to prevent deep SOD networks from being overconfident.
arXiv Detail & Related papers (2020-12-10T23:28:36Z)
- GECKO: Reconciling Privacy, Accuracy and Efficiency in Embedded Deep Learning [5.092028049119383]
We analyse the three-dimensional privacy-accuracy-efficiency tradeoff in NNs for IoT devices.
We propose the Gecko training methodology, which explicitly adds resistance to private inference as a design objective.
arXiv Detail & Related papers (2020-10-02T10:36:55Z)
- A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework [56.57225686288006]
Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices.
Previous pruning methods mainly focus on reducing the model size and/or improving performance without considering the privacy of user data.
We propose a privacy-preserving-oriented pruning and mobile acceleration framework that does not require the private training dataset.
arXiv Detail & Related papers (2020-03-13T23:52:03Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the key insight is to use the compiler to regain and guarantee high hardware efficiency.
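A minimal C sketch of pattern-based pruning: each 3x3 kernel keeps only the weights at positions listed in a small fixed pattern, so the inner loop touches 4 positions instead of 9 and the pruned multiplications are never issued. The pattern, weights, and input are toy assumptions, not PatDNN's actual pattern set or compiler pipeline.

```c
/* Illustrative pattern-based pruning sketch, not PatDNN. */
#include <stdio.h>

#define N   8                     /* input feature-map size                   */
#define PAT 4                     /* non-zero entries kept per 3x3 kernel     */

/* One fixed pruning pattern: (dy, dx) offsets of the 4 surviving weights.    */
static const int   pattern[PAT][2] = { {0, 1}, {1, 0}, {1, 1}, {1, 2} };
static const float weight[PAT]     = { 0.5f, 0.25f, 1.0f, 0.25f };

int main(void)
{
    float in[N][N], out[N - 2][N - 2];

    for (int y = 0; y < N; y++)                     /* toy input              */
        for (int x = 0; x < N; x++)
            in[y][x] = (float)(y + x);

    /* "Valid" convolution that only visits the pattern's 4 positions.        */
    for (int y = 0; y < N - 2; y++)
        for (int x = 0; x < N - 2; x++) {
            float acc = 0.0f;
            for (int p = 0; p < PAT; p++)
                acc += weight[p] * in[y + pattern[p][0]][x + pattern[p][1]];
            out[y][x] = acc;
        }

    printf("out[0][0] = %f\n", (double)out[0][0]);
    return 0;
}
```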
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.