Related papers: TensorTEE: Unifying Heterogeneous TEE Granularity for Efficient Secure Collaborative Tensor Computing

TensorTEE: Unifying Heterogeneous TEE Granularity for Efficient Secure Collaborative Tensor Computing

URL: http://arxiv.org/abs/2407.08903v1
Date: Fri, 12 Jul 2024 00:35:18 GMT
Title: TensorTEE: Unifying Heterogeneous TEE Granularity for Efficient Secure Collaborative Tensor Computing
Authors: Husheng Han, Xinyao Zheng, Yuanbo Wen, Yifan Hao, Erhu Feng, Ling Liang, Jianan Mu, Xiaqing Li, Tianyun Ma, Pengwei Jin, Xinkai Song, Zidong Du, Qi Guo, Xing Hu,
Abstract summary: Existing heterogeneous TEE designs are inefficient for collaborative computing due to fine and different memory granularities between CPU and NPU. We propose a unified tensor-granularity heterogeneous TEE for efficient secure collaborative computing. The results show that the TEE improves the performance of Large Language Model (LLM) training workloads by 4.0x compared to existing work.
Score: 13.983627699836376
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Heterogeneous collaborative computing with NPU and CPU has received widespread attention due to its substantial performance benefits. To ensure data confidentiality and integrity during computing, Trusted Execution Environments (TEE) is considered a promising solution because of its comparatively lower overhead. However, existing heterogeneous TEE designs are inefficient for collaborative computing due to fine and different memory granularities between CPU and NPU. 1) The cacheline granularity of CPU TEE intensifies memory pressure due to its extra memory access, and 2) the cacheline granularity MAC of NPU escalates the pressure on the limited memory storage. 3) Data transfer across heterogeneous enclaves relies on the transit of non-secure regions, resulting in cumbersome re-encryption and scheduling. To address these issues, we propose TensorTEE, a unified tensor-granularity heterogeneous TEE for efficient secure collaborative tensor computing. First, we virtually support tensor granularity in CPU TEE to eliminate the off-chip metadata access by detecting and maintaining tensor structures on-chip. Second, we propose tensor-granularity MAC management with predictive execution to avoid computational stalls while eliminating off-chip MAC storage and access. Moreover, based on the unified granularity, we enable direct data transfer without re-encryption and scheduling dilemmas. Our evaluation is built on enhanced Gem5 and a cycle-accurate NPU simulator. The results show that TensorTEE improves the performance of Large Language Model (LLM) training workloads by 4.0x compared to existing work and incurs only 2.1% overhead compared to non-secure training, offering a practical security assurance for LLM training.

Related papers

PRISM: Distributed Inference for Foundation Models at Edge [73.54372283220444]
PRISM is a communication-efficient and compute-aware strategy for distributed Transformer inference on edge devices.<n>We evaluate PRISM on ViT, BERT, and GPT-2 across diverse datasets.
arXiv Detail & Related papers (2025-07-16T11:25:03Z)
Securing Transformer-based AI Execution via Unified TEEs and Crypto-protected Accelerators [19.93096649006403]
Machine learning runs on untrusted cloud infrastructure, exposing data and models to potential breaches.<n>Running model inference entirely within trusted execution environments (TEEs) is subject to non-trivial slowdown.<n>We propose TwinShield, a framework enabling secure Transformer inference in heterogeneous TEE and accelerator systems.
arXiv Detail & Related papers (2025-07-04T03:52:53Z)
Trusted Compute Units: A Framework for Chained Verifiable Computations [41.94295877935867]
This paper introduces the Trusted Compute Unit (TCU), a unifying framework that enables composable and interoperable computations across heterogeneous technologies. By enabling secure off-chain interactions without incurring on-chain confirmation delays or gas fees, TCUs significantly improve system performance and scalability.
arXiv Detail & Related papers (2025-04-22T09:01:55Z)
Enabling Low-Cost Secure Computing on Untrusted In-Memory Architectures [5.565715369147691]
Processing-in-Memory (PIM) promises to substantially improve performance by moving processing closer to the data. Integrating PIM modules within a secure computing system raises an interesting challenge: unencrypted data has to move off-chip to the PIM, exposing the data to attackers and breaking assumptions on Trusted Computing Bases (TCBs) This paperleverages multi-party computation (MPC) techniques, specifically arithmetic secret sharing and Yao's garbled circuits, to outsource bandwidth-intensive computation securely to PIM.
arXiv Detail & Related papers (2025-01-28T20:48:14Z)
TRUST: A Toolkit for TEE-Assisted Secure Outsourced Computation over Integers [30.72930396939045]
We propose a toolkit for TEE-assisted (Trusted Execution Environment) SOC over integers, named TRUST. In terms of system architecture, TRUST falls in a single TEE-equipped cloud server only through seamlessly integrating the computation of REE (Rich Execution Environment) and TEE. We present textttSEAT, secure data trading based on TRUST.
arXiv Detail & Related papers (2024-12-02T03:19:29Z)
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency. We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs) We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
PhD Forum: Efficient Privacy-Preserving Processing via Memory-Centric Computing [0.0]
Homomorphic encryption (HE) and secure multi-party computation (SMPC) enhance data security by enabling processing on encrypted data. Existing approaches focus on improving computational overhead using specialized hardware. We propose a framework that uses recently available PIM hardware to achieve efficient privacy-preserving computation.
arXiv Detail & Related papers (2024-09-25T09:37:50Z)
Privacy preserving layer partitioning for Deep Neural Network models [0.21470800327528838]
Trusted Execution Environments (TEEs) can introduce significant performance overhead due to additional layers of encryption, decryption, security and integrity checks. We introduce layer partitioning technique and offloading computations to GPU. We conduct experiments to demonstrate the effectiveness of our approach in protecting against input reconstruction attacks developed using trained conditional Generative Adversarial Network(c-GAN)
arXiv Detail & Related papers (2024-04-11T02:39:48Z)
SOCI^+: An Enhanced Toolkit for Secure OutsourcedComputation on Integers [50.608828039206365]
We propose SOCI+ which significantly improves the performance of SOCI. SOCI+ employs a novel (2, 2)-threshold Paillier cryptosystem with fast encryption and decryption as its cryptographic primitive. Compared with SOCI, our experimental evaluation shows that SOCI+ is up to 5.4 times more efficient in computation and 40% less in communication overhead.
arXiv Detail & Related papers (2023-09-27T05:19:32Z)
FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system unlocking the potential vast untapped consumer-level GPU. This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, the variability of peer and device heterogeneity.
arXiv Detail & Related papers (2023-09-03T13:27:56Z)
RRNet: Towards ReLU-Reduced Neural Network for Two-party Computation Based Private Inference [17.299835585861747]
We introduce RRNet, a framework that aims to jointly reduce the overhead of MPC comparison protocols and accelerate computation through hardware acceleration. Our approach integrates the hardware latency of cryptographic building blocks into the DNN loss function, resulting in improved energy efficiency, accuracy, and security guarantees.
arXiv Detail & Related papers (2023-02-05T04:02:13Z)
PolyMPCNet: Towards ReLU-free Neural Architecture Search in Two-party Computation Based Private Inference [23.795457990555878]
Secure multi-party computation (MPC) has been discussed, to enable the privacy-preserving deep learning (DL) computation. MPCs often come at very high computation overhead, and potentially prohibit their popularity in large scale systems. In this work, we develop a systematic framework, PolyMPCNet, of joint overhead reduction of MPC comparison protocol and hardware acceleration.
arXiv Detail & Related papers (2022-09-20T02:47:37Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations. Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
Distributed Reinforcement Learning for Privacy-Preserving Dynamic Edge Caching [91.50631418179331]
A privacy-preserving distributed deep policy gradient (P2D3PG) is proposed to maximize the cache hit rates of devices in the MEC networks. We convert the distributed optimizations into model-free Markov decision process problems and then introduce a privacy-preserving federated learning method for popularity prediction.
arXiv Detail & Related papers (2021-10-20T02:48:27Z)
Perun: Secure Multi-Stakeholder Machine Learning Framework with GPU Support [1.5362025549031049]
Perun is a framework for confidential multi-stakeholder machine learning. It executes ML training on hardware accelerators (e.g., GPU) while providing security guarantees. During the ML training on CIFAR-10 and real-world medical datasets, Perun achieved a 161x to 1560x speedup.
arXiv Detail & Related papers (2021-03-31T08:31:07Z)
Faster Secure Data Mining via Distributed Homomorphic Encryption [108.77460689459247]
Homomorphic Encryption (HE) is receiving more and more attention recently for its capability to do computations over the encrypted field. We propose a novel general distributed HE-based data mining framework towards one step of solving the scaling problem. We verify the efficiency and effectiveness of our new framework by testing over various data mining algorithms and benchmark data-sets.
arXiv Detail & Related papers (2020-06-17T18:14:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.