CryptPEFT: Efficient and Private Neural Network Inference via Parameter-Efficient Fine-Tuning
- URL: http://arxiv.org/abs/2508.12264v2
- Date: Thu, 06 Nov 2025 00:57:09 GMT
- Title: CryptPEFT: Efficient and Private Neural Network Inference via Parameter-Efficient Fine-Tuning
- Authors: Saisai Xia, Wenhao Wang, Zihao Wang, Yuhui Zhang, Yier Jin, Dan Meng, Rui Hou
- Abstract summary: CryptPEFT is a one-way communication (OWC) architecture that confines encrypted computation solely to the adapter. We evaluate CryptPEFT using Vision Transformer backbones across widely used image classification datasets. Our results show that CryptPEFT significantly outperforms existing baselines, delivering speedups ranging from $20.62\times$ to $291.48\times$ in simulated wide-area network (WAN) and local-area network (LAN) settings.
- Score: 40.49435347359895
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Publicly available large pretrained models (i.e., backbones) and lightweight adapters for parameter-efficient fine-tuning (PEFT) have become standard components in modern machine learning pipelines. However, preserving the privacy of both user inputs and fine-tuned adapters -- often trained on sensitive data -- during inference remains a significant challenge. Applying cryptographic techniques, such as multi-party computation (MPC), to PEFT settings still incurs substantial encrypted computation across both the backbone and adapter, mainly due to the inherent two-way communication between them. To address this limitation, we propose CryptPEFT, the first PEFT solution specifically designed for private inference scenarios. CryptPEFT introduces a novel one-way communication (OWC) architecture that confines encrypted computation solely to the adapter, significantly reducing both computational and communication overhead. To maintain strong model utility under this constraint, we explore the design space of OWC-compatible adapters and employ an automated architecture search algorithm to optimize the trade-off between private inference efficiency and model utility. We evaluated CryptPEFT using Vision Transformer backbones across widely used image classification datasets. Our results show that CryptPEFT significantly outperforms existing baselines, delivering speedups ranging from $20.62\times$ to $291.48\times$ in simulated wide-area network (WAN) and local-area network (LAN) settings. On CIFAR-100, CryptPEFT attains 85.47% accuracy with just 2.26 seconds of inference latency. These findings demonstrate that CryptPEFT offers an efficient and privacy-preserving solution for modern PEFT-based inference.
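To make the one-way constraint concrete, here is a minimal sketch (not the authors' code; module names and sizes are illustrative): a frozen backbone is evaluated once in plaintext, and its features flow one way into a small adapter head, which is the only component that would run under MPC.

```python
# Minimal sketch of a one-way-communication (OWC) adapter: the frozen
# backbone runs in plaintext, its intermediate features flow one way into a
# small adapter head, and nothing is fed back into the backbone. Only this
# adapter would need encrypted evaluation in CryptPEFT's setting.
import torch
import torch.nn as nn

class OWCAdapter(nn.Module):
    """Hypothetical adapter that consumes backbone features without
    feeding anything back into the backbone."""
    def __init__(self, dim=768, hidden=128, num_classes=100):
        super().__init__()
        self.down = nn.Linear(dim, hidden)   # bottleneck projection
        self.act = nn.ReLU()
        self.up = nn.Linear(hidden, num_classes)

    def forward(self, feats):                # feats: [batch, dim]
        return self.up(self.act(self.down(feats)))

backbone = nn.Identity()                     # stand-in for a frozen ViT
adapter = OWCAdapter()
x = torch.randn(4, 768)                      # pooled backbone features
with torch.no_grad():
    feats = backbone(x)                      # plaintext, one-way
logits = adapter(feats)                      # only this part is encrypted
```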
Related papers
- Instant Personalized Large Language Model Adaptation via Hypernetwork [56.512539596908745]
Profile-to-PEFT is a scalable framework that employs a hypernetwork, trained end-to-end to map a user's encoded profile directly to a full set of adapter parameters. We show that our method outperforms both prompt-based personalization and OPPU while using substantially fewer computational resources at deployment.
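A minimal sketch of the hypernetwork idea, with assumed shapes and class names (not the paper's code): a small network maps a profile embedding directly to the two LoRA factors of a single layer.

```python
# Sketch: a hypernetwork emits per-user LoRA adapter parameters in one
# forward pass; real systems would emit parameters for many layers.
import torch
import torch.nn as nn

class ProfileToLoRA(nn.Module):
    def __init__(self, profile_dim=64, d_model=768, rank=8):
        super().__init__()
        self.rank, self.d = rank, d_model
        self.to_A = nn.Linear(profile_dim, d_model * rank)
        self.to_B = nn.Linear(profile_dim, rank * d_model)

    def forward(self, profile):
        A = self.to_A(profile).view(self.d, self.rank)
        B = self.to_B(profile).view(self.rank, self.d)
        return A, B                          # adapter weights for this user

hyper = ProfileToLoRA()
A, B = hyper(torch.randn(64))                # instant per-user adapter
delta_W = A @ B                              # low-rank weight update
```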
arXiv Detail & Related papers (2025-10-18T00:41:25Z) - Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference [49.77734021302196]
We propose a task-oriented feature compression (TOFC) method for multimodal understanding in a device-edge co-inference framework. To enhance compression efficiency, multiple entropy models are adaptively selected based on the characteristics of the visual features. Results show that TOFC achieves up to 52% reduction in data transmission overhead and 63% reduction in system latency.
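The adaptive selection could plausibly look like the following toy sketch (the function names, Gaussian models, and selection rule are illustrative assumptions, not the paper's method): each feature is routed to whichever entropy model assigns it the fewest estimated bits.

```python
# Toy sketch of adaptive entropy-model selection over visual features.
import numpy as np

def rate_estimate(feat, sigma):
    """Approximate bits under a zero-mean Gaussian entropy model."""
    p = np.exp(-0.5 * (feat / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return float(-np.log2(p + 1e-9).sum())

def select_entropy_model(feat, sigmas=(0.5, 1.0, 2.0)):
    # choose the model that assigns this feature the fewest bits
    rates = [rate_estimate(feat, s) for s in sigmas]
    return int(np.argmin(rates)), min(rates)

idx, bits = select_entropy_model(np.random.randn(256).astype(np.float32))
print(f"model {idx}, ~{bits:.1f} bits")
```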
arXiv Detail & Related papers (2025-03-17T08:37:22Z) - PrivTuner with Homomorphic Encryption and LoRA: A P3EFT Scheme for Privacy-Preserving Parameter-Efficient Fine-Tuning of AI Foundation Models [11.72933919036027]
Fine-tuning (FT) is a method of customizing a pre-trained AI foundation model by further training it on a smaller, targeted dataset. We present the PrivTuner scheme, which incorporates Fully Homomorphic Encryption (FHE)-enabled privacy protection into LoRA. Experiments demonstrate that our algorithm can significantly reduce energy consumption while adapting to different privacy requirements.
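As a conceptual sketch only (plaintext NumPy stand-ins; `Enc`/`Dec` are placeholders, not a real CKKS implementation), the appeal of LoRA under FHE is that the extra computation is purely linear, which homomorphic schemes handle well:

```python
# Conceptual sketch: in a PrivTuner-style scheme the pretrained weight W is
# public, while the private input and low-rank LoRA update would be
# evaluated under FHE. Enc()/Dec() are placeholders for a real scheme.
import numpy as np

Enc = Dec = lambda v: v                      # placeholder for CKKS

d, r = 768, 8
W = np.random.randn(d, d) * 0.02             # public pretrained weight
A = np.random.randn(d, r) * 0.02             # private LoRA factors
B = np.random.randn(r, d) * 0.02

x = np.random.randn(d)                       # private input
cx = Enc(x)                                  # encrypted on the client
cy = cx @ W + (cx @ A) @ B                   # x @ (W + AB): HE-friendly
print(Dec(cy).shape)                         # (768,)
```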
arXiv Detail & Related papers (2024-10-01T06:30:06Z) - DCT-CryptoNets: Scaling Private Inference in the Frequency Domain [8.084341432899954]
This paper introduces DCT-CryptoNets, a novel approach that operates directly in the frequency domain to reduce the burden of computationally expensive non-linear activations. It does so by utilizing the discrete cosine transform (DCT), commonly employed in JPEG encoding, which has inherent compatibility with remote computing services. It demonstrates inference on the ImageNet dataset within 2.5 hours (down from 12.5 hours on equivalent 96-thread compute resources).
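A minimal sketch of the frequency-domain idea using SciPy (illustrative, not the paper's pipeline): apply a JPEG-style 2D DCT and keep only the low-frequency block, shrinking the tensor a private-inference network must process.

```python
# Keep only low-frequency DCT coefficients, as in JPEG compression.
import numpy as np
from scipy.fft import dctn

img = np.random.rand(32, 32)                 # grayscale image
coeffs = dctn(img, norm="ortho")             # 2D DCT
low = coeffs[:8, :8]                         # retain low frequencies only
print(low.size / img.size)                   # ~6% of the original values
```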
arXiv Detail & Related papers (2024-08-27T17:48:29Z) - IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT [45.95678408227546]
IISAN (Intra- and Inter-modal Side Adapted Network for Multimodal Representation) is a plug-and-play architecture for sequential recommender systems.
IISAN matches the performance of full fine-tuning (FFT) and state-of-the-art PEFT.
It significantly reduces GPU memory usage - from 47GB to just 3GB for multimodal sequential recommendation tasks.
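A rough sketch in the side-network spirit (layer names and sizes are assumptions, not IISAN's architecture): the frozen towers run without gradients, and only a small intra-/inter-modal fusion network is trained, which is what keeps activation memory low.

```python
# Decoupled side network: frozen text/image towers produce hidden states,
# and only the small fusion network below receives gradients.
import torch
import torch.nn as nn

class SideFusion(nn.Module):
    def __init__(self, dim=768, hidden=64):
        super().__init__()
        self.text_side = nn.Linear(dim, hidden)   # intra-modal (text)
        self.img_side = nn.Linear(dim, hidden)    # intra-modal (image)
        self.inter = nn.Linear(2 * hidden, hidden)  # inter-modal mixing

    def forward(self, h_text, h_img):
        z = torch.cat([self.text_side(h_text), self.img_side(h_img)], dim=-1)
        return self.inter(torch.relu(z))

side = SideFusion()
with torch.no_grad():                        # frozen towers, no stored grads
    h_t, h_i = torch.randn(4, 768), torch.randn(4, 768)
out = side(h_t, h_i)                         # only side params get gradients
```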
arXiv Detail & Related papers (2024-04-02T15:58:36Z) - Toward Practical Privacy-Preserving Convolutional Neural Networks Exploiting Fully Homomorphic Encryption [11.706881389387242]
Fully homomorphic encryption (FHE) is a viable approach for achieving private inference (PI).
However, an FHE implementation of a CNN faces significant hurdles, primarily due to FHE's substantial computational and memory overhead.
We propose a set of optimizations, which includes GPU/ASIC acceleration, an efficient activation function, and an optimized packing scheme.
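To illustrate the "efficient activation function" item: FHE evaluates additions and multiplications cheaply but not comparisons, so ReLU is commonly replaced by a low-degree polynomial (CryptoNets used $x^2$). The surrogate coefficients below are illustrative only.

```python
# HE-friendly activation: a low-degree polynomial stands in for ReLU,
# since FHE cannot evaluate the max() comparison directly.
import numpy as np

x = np.linspace(-3, 3, 7)
relu = np.maximum(x, 0)
poly = 0.125 * x**2 + 0.5 * x + 0.25         # illustrative degree-2 surrogate
print(np.round(relu, 2))
print(np.round(poly, 2))
```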
arXiv Detail & Related papers (2023-10-25T10:24:35Z) - Decoder Tuning: Efficient Language Understanding as Decoding [84.68266271483022]
We present Decoder Tuning (DecT), which in contrast optimizes task-specific decoder networks on the output side.
By gradient-based optimization, DecT can be trained within several seconds and requires only one PTM query per sample.
We conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a $200\times$ speed-up.
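A toy sketch of the decoder-tuning recipe (a ridge-regression decoder standing in for DecT's actual decoder): query the frozen PTM once per sample, cache the features, and fit a tiny output-side model in closed form.

```python
# Output-side tuning: cache one PTM query per sample, then fit a small
# decoder on the frozen features -- seconds of training, not hours.
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 768))          # cached PTM outputs
labels = (feats[:, 0] > 0).astype(int)       # toy labels

X = np.hstack([feats, np.ones((200, 1))])    # add bias column
Y = np.eye(2)[labels]                        # one-hot targets
W = np.linalg.solve(X.T @ X + 1e-2 * np.eye(769), X.T @ Y)  # ridge solution
acc = ((X @ W).argmax(1) == labels).mean()
print(f"train acc: {acc:.2f}")
```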
arXiv Detail & Related papers (2022-12-16T11:15:39Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) approach, Soft Actor-Critic for discrete settings (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation scheme can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
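A tiny sketch of what a latency- and accuracy-aware reward can look like (the form and weights are assumptions, not taken from the paper): the agent is rewarded for accuracy and penalized for end-to-end latency, so earlier exits win on easy inputs.

```python
# Illustrative reward balancing accuracy against end-to-end latency.
def reward(accuracy, latency_s, beta=0.5):
    # beta trades accuracy for latency; both values are illustrative
    return accuracy - beta * latency_s

print(reward(accuracy=0.91, latency_s=0.12))   # early exit, cheap
print(reward(accuracy=0.95, latency_s=0.80))   # full model, slower
```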
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
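A minimal sketch of the early-exit mechanism (classification heads stand in for MESS's segmentation heads; the confidence-threshold policy is one common choice): inference stops at the first head whose confidence clears a threshold.

```python
# Early exits along the depth of a network: stop at the first head whose
# prediction confidence exceeds a threshold.
import torch
import torch.nn as nn

blocks = nn.ModuleList([nn.Linear(16, 16) for _ in range(3)])
heads = nn.ModuleList([nn.Linear(16, 10) for _ in range(3)])

def early_exit_forward(x, threshold=0.9):
    for block, head in zip(blocks, heads):
        x = torch.relu(block(x))
        probs = head(x).softmax(dim=-1)
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold:         # easy sample: exit early
            return pred.item(), conf.item()
    return pred.item(), conf.item()          # hardest case: last exit

print(early_exit_forward(torch.randn(1, 16)))
```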
arXiv Detail & Related papers (2021-06-07T11:37:03Z) - A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework [56.57225686288006]
Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices.
Previous pruning methods mainly focus on reducing the model size and/or improving performance without considering the privacy of user data.
We propose a privacy-preserving-oriented pruning and mobile acceleration framework that does not require the private training dataset.
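A minimal sketch of the data-free flavor of pruning such a framework can rely on (criterion and sparsity ratio are illustrative): magnitude-based masks need only the weights themselves, never the private training set.

```python
# Magnitude pruning: zero out the smallest-magnitude weights, computed
# entirely from the weights -- no access to private training data needed.
import numpy as np

def magnitude_prune(W, sparsity=0.8):
    k = int(W.size * sparsity)
    thresh = np.partition(np.abs(W).ravel(), k)[k]
    return W * (np.abs(W) >= thresh)

W = np.random.randn(128, 128)
Wp = magnitude_prune(W)
print(f"nonzero fraction: {(Wp != 0).mean():.2f}")   # ~0.20
```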
arXiv Detail & Related papers (2020-03-13T23:52:03Z)