Silentflow: Leveraging Trusted Execution for Resource-Limited MPC via Hardware-Algorithm Co-design
- URL: http://arxiv.org/abs/2508.13357v2
- Date: Sun, 24 Aug 2025 22:07:55 GMT
- Authors: Zhuoran Li, Hanieh Totonchi Asl, Ebrahim Nouri, Yifei Cai, Danella Zhao,
- Abstract summary: We introduce Silentflow, a protocol that eliminates communication in COT generation. We balance end-to-end latency and resource demands, achieving up to 39.51x speedup over state-of-the-art protocols.
- Score: 6.998260344481881
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Secure Multi-Party Computation (MPC) offers a practical foundation for privacy-preserving machine learning at the edge, with MPC commonly employed to support nonlinear operations. These MPC protocols fundamentally rely on Oblivious Transfer (OT), particularly Correlated OT (COT), to generate the correlated randomness essential for secure computation. Although COT generation is efficient in conventional two-party settings with resource-rich participants, it becomes a critical bottleneck in real-world inference on resource-constrained devices (e.g., IoT sensors and wearables), due to both communication latency and limited computational capacity. To enable real-time secure inference, we introduce Silentflow, a highly efficient Trusted Execution Environment (TEE)-assisted protocol that eliminates communication in COT generation. We tackle the core performance bottleneck, low computational intensity, through structured algorithmic decomposition: kernel fusion for parallelism, Blocked On-chip eXpansion (BOX) to improve memory access patterns, and vectorized batch operations to maximize memory bandwidth utilization. Through design space exploration, we balance end-to-end latency and resource demands, achieving up to 39.51x speedup over state-of-the-art protocols. By offloading COT computations to a Zynq-7000 SoC, Silentflow accelerates PPMLaaS inference on the ImageNet dataset under resource constraints, achieving a 4.62x and 3.95x speedup over Cryptflow2 and Cheetah, respectively.
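For readers unfamiliar with the correlated randomness the abstract refers to, the following is a minimal sketch of the standard COT correlation only. It is not the Silentflow protocol: the helper names (`sample_cot`, `check_cot`), the 128-bit string length, and the single-process "dealer" that samples both parties' outputs are assumptions made for illustration. In a COT, the sender holds a global offset `delta` and strings `q_i`, while the receiver holds choice bits `b_i` and strings `t_i = q_i XOR (b_i * delta)`.

```python
import secrets

KAPPA = 128  # assumed string length in bits (illustrative choice)


def sample_cot(n, kappa=KAPPA):
    """Hypothetical dealer that samples n COT correlations in one process.

    Returns (delta, sender_strings, receiver_pairs), where each receiver
    pair is (b, t) with t = q XOR (b * delta).
    """
    delta = secrets.randbits(kappa)      # sender's global correlation offset
    sender_strings = []
    receiver_pairs = []
    for _ in range(n):
        q = secrets.randbits(kappa)      # sender's random string
        b = secrets.randbits(1)          # receiver's choice bit
        t = q ^ (delta if b else 0)      # receiver's correlated string
        sender_strings.append(q)
        receiver_pairs.append((b, t))
    return delta, sender_strings, receiver_pairs


def check_cot(delta, sender_strings, receiver_pairs):
    """Verify that every pair satisfies t == q XOR (b * delta)."""
    return all(
        t == q ^ (delta if b else 0)
        for q, (b, t) in zip(sender_strings, receiver_pairs)
    )
```

In an actual protocol neither party sees the other's values; the sketch samples both sides together only to make the relation between them visible.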
Related papers
- ENSI: Efficient Non-Interactive Secure Inference for Large Language Models [10.82684192498215]
We propose ENSI, a novel secure inference framework for large language models (LLMs). ENSI employs an optimized encoding strategy that seamlessly integrates the CKKS scheme with a lightweight LLM variant, BitNet. We demonstrate that ENSI achieves approximately an 8x acceleration in matrix multiplications and a 2.6x speedup in softmax inference on CPU.
arXiv Detail & Related papers (2025-09-11T13:04:22Z) - Intra-DP: A High Performance Collaborative Inference System for Mobile Edge Computing [67.98609858326951]
Intra-DP is a high-performance collaborative inference system optimized for deep neural networks (DNNs) on mobile devices. The evaluation demonstrates that Intra-DP reduces per-inference latency by up to 50% and energy consumption by up to 75% compared to state-of-the-art baselines.
arXiv Detail & Related papers (2025-07-08T09:50:57Z) - PLS-Assisted Offloading for Edge Computing-Enabled Post-Quantum Security in Resource-Constrained Devices [13.649969611527746]
Post-quantum cryptography (PQC) standards have become imperative for resource-constrained devices (RCDs) in the Internet of Things (IoT). We propose an edge computing-enabled PQC framework that leverages a physical-layer security (PLS)-assisted offloading strategy. Our framework integrates two PLS techniques: offloading RCDs employ wiretap coding to secure data transmission, while non-offloading RCDs serve as friendly jammers by broadcasting artificial noise.
arXiv Detail & Related papers (2025-04-13T05:14:17Z) - Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference [49.77734021302196]
We propose a task-oriented feature compression (TOFC) method for multimodal understanding in a device-edge co-inference framework. To enhance compression efficiency, multiple entropy models are adaptively selected based on the characteristics of the visual features. Results show that TOFC achieves up to 52% reduction in data transmission overhead and 63% reduction in system latency.
arXiv Detail & Related papers (2025-03-17T08:37:22Z) - Digital Twin-Assisted Federated Learning with Blockchain in Multi-tier Computing Systems [67.14406100332671]
In Industry 4.0 systems, resource-constrained edge devices engage in frequent data interactions.
This paper proposes a digital twin (DT) and federated learning (FL) scheme.
The efficacy of our proposed cooperative interference-based FL process has been verified through numerical analysis.
arXiv Detail & Related papers (2024-11-04T17:48:02Z) - TensorTEE: Unifying Heterogeneous TEE Granularity for Efficient Secure Collaborative Tensor Computing [13.983627699836376]
Existing heterogeneous TEE designs are inefficient for collaborative computing due to fine and different memory granularities between CPU and NPU.
We propose a unified tensor-granularity heterogeneous TEE for efficient secure collaborative computing.
The results show that the TEE improves the performance of Large Language Model (LLM) training workloads by 4.0x compared to existing work.
arXiv Detail & Related papers (2024-07-12T00:35:18Z) - RRNet: Towards ReLU-Reduced Neural Network for Two-party Computation Based Private Inference [17.299835585861747]
We introduce RRNet, a framework that aims to jointly reduce the overhead of MPC comparison protocols and accelerate computation through hardware acceleration.
Our approach integrates the hardware latency of cryptographic building blocks into the DNN loss function, resulting in improved energy efficiency, accuracy, and security guarantees.
arXiv Detail & Related papers (2023-02-05T04:02:13Z) - PolyMPCNet: Towards ReLU-free Neural Architecture Search in Two-party Computation Based Private Inference [23.795457990555878]
Secure multi-party computation (MPC) has been discussed as a way to enable privacy-preserving deep learning (DL) computation.
MPC protocols often incur very high computation overhead, which can limit their adoption in large-scale systems.
In this work, we develop a systematic framework, PolyMPCNet, that jointly reduces the overhead of the MPC comparison protocol and applies hardware acceleration.
arXiv Detail & Related papers (2022-09-20T02:47:37Z) - On Jointly Optimizing Partial Offloading and SFC Mapping: A Cooperative Dual-agent Deep Reinforcement Learning Approach [8.168647937560504]
This paper studies the partial offloading and SFC mapping joint optimization (POSMJO) problem in a computation-enabled MEC system.
The objective is to minimize the long-term average cost, which is a combination of execution delay, the MD's energy consumption, and the usage charge for edge computing.
We propose a cooperative dual-agent deep reinforcement learning (CDADRL) algorithm, where we design a framework enabling interaction between two agents.
arXiv Detail & Related papers (2022-05-20T02:00:53Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical computation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have a large number of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z) - EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference [82.1584439276834]
Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks.
We present EdgeBERT, an in-depth algorithm-hardware co-design for latency-aware energy optimization for multi-task NLP.
arXiv Detail & Related papers (2020-11-28T19:21:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.