Smuche: Scalar-Multiplicative Caching in Homomorphic Encryption
- URL: http://arxiv.org/abs/2312.16352v1
- Date: Tue, 26 Dec 2023 23:11:25 GMT
- Title: Smuche: Scalar-Multiplicative Caching in Homomorphic Encryption
- Authors: Dongfang Zhao
- Abstract summary: Homomorphic encryption (HE) is used in machine learning systems in untrusted environments.
We introduce a novel constant-time caching technique that is independent of any parameters.
Smuche stands for Scalar-multiplicative Caching of Homomorphic Encryption.
- Score: 1.3824176915623292
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Addressing the challenge of balancing security and efficiency when deploying
machine learning systems in untrusted environments, such as federated learning,
remains a critical concern. A promising strategy to tackle this issue involves
optimizing the performance of fully homomorphic encryption (HE). Recent
research highlights the efficacy of advanced caching techniques, such as Rache,
in significantly enhancing the performance of HE schemes without compromising
security. However, Rache is constrained by an inherent limitation: its
performance overhead is heavily influenced by the characteristics of plaintext
models, specifically exhibiting a caching time complexity of $\mathcal{O}(N)$,
where $N$ represents the number of cached pivots based on specific radixes.
This caching overhead becomes impractical for handling large-scale data. In
this study, we introduce a novel \textit{constant-time} caching technique that
is independent of any parameters. The core concept involves applying scalar
multiplication to a single cached ciphertext, followed by the introduction of a
completely new and constant-time randomness. Leveraging the inherent
characteristics of constant-time construction, we coin the term ``Smuche'' for
this innovative caching technique, which stands for Scalar-multiplicative
Caching of Homomorphic Encryption. We implemented Smuche from scratch and
conducted comparative evaluations against two baseline schemes, Rache and CKKS.
Our experimental results underscore the effectiveness of Smuche in addressing
the identified limitations and optimizing the performance of homomorphic
encryption in practical scenarios.
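To make the core idea concrete, below is a minimal, hypothetical sketch of scalar-multiplicative caching. This is not the authors' implementation: Smuche targets RLWE-based schemes such as CKKS, and its constant-time re-randomization differs from what is shown here. The sketch instead uses a toy Paillier cryptosystem (an additively homomorphic analogue) purely to illustrate how a single cached ciphertext of 1 can be scalar-multiplied (ciphertext exponentiation in Paillier) and re-randomized to produce an encryption of any plaintext m; all names such as `cached_encrypt` are illustrative, not from the paper.

```python
import math
import random

# Toy Paillier setup with tiny hard-coded primes (illustration only, not secure).
p, q = 1789, 1861
n = p * q
n_sq = n * n
lam = math.lcm(p - 1, q - 1)                     # private exponent lambda

def L(x):
    return (x - 1) // n

mu = pow(L(pow(n + 1, lam, n_sq)), -1, n)        # decryption constant mu

def rand_unit():
    # Sample randomness r with gcd(r, n) = 1.
    while True:
        r = random.randrange(2, n)
        if math.gcd(r, n) == 1:
            return r

def encrypt(m):
    # Standard encryption: Enc(m) = (1 + n)^m * r^n mod n^2.
    return (pow(n + 1, m, n_sq) * pow(rand_unit(), n, n_sq)) % n_sq

def decrypt(c):
    return (L(pow(c, lam, n_sq)) * mu) % n

# Scalar-multiplicative caching: encrypt the constant 1 once, up front.
cached_one = encrypt(1)

def cached_encrypt(m):
    # Homomorphic scalar multiplication of the single cached ciphertext
    # (ciphertext exponentiation in Paillier), followed by one fresh
    # re-randomization factor r^n -- no radix pivots, no per-pivot work.
    return (pow(cached_one, m, n_sq) * pow(rand_unit(), n, n_sq)) % n_sq

m = 42
assert decrypt(encrypt(m)) == m
assert decrypt(cached_encrypt(m)) == m
print("direct:", decrypt(encrypt(m)), "cached:", decrypt(cached_encrypt(m)))
```

The contrast with Rache is that Rache's caching phase precomputes N pivot ciphertexts derived from a chosen radix, giving O(N) caching overhead, whereas the scheme above keeps exactly one cached ciphertext and performs one scalar multiplication plus one re-randomization per plaintext, independent of N. In the paper this idea is realized over CKKS ciphertexts rather than Paillier.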
Related papers
- At Least Factor-of-Two Optimization for RWLE-Based Homomorphic Encryption [0.0]
Homomorphic encryption (HE) supports certain operations on encrypted data without the need for decryption.
HE schemes come with a non-trivial computational overhead that can hamper data-intensive workloads.
We present an encryption method we call Zinc" which forgoes the multiple caching process, replacing it with a single scalar addition.
arXiv Detail & Related papers (2024-08-14T05:42:35Z) - Efficient Inference of Vision Instruction-Following Models with Elastic Cache [76.44955111634545]
We introduce Elastic Cache, a novel strategy for efficient deployment of instruction-following large vision-language models.
We propose an importance-driven cache merging strategy to prune redundancy caches.
For instruction encoding, we utilize the frequency to evaluate the importance of caches.
Results on a range of LVLMs demonstrate that Elastic Cache not only boosts efficiency but also notably outperforms existing pruning methods in language generation.
arXiv Detail & Related papers (2024-07-25T15:29:05Z) - Training-Free Exponential Context Extension via Cascading KV Cache [49.608367376911694]
We introduce a novel mechanism that leverages cascading sub-cache buffers to selectively retain the most relevant tokens.
Our method reduces prefill stage latency by a factor of 6.8 when compared to flash attention on 1M tokens.
arXiv Detail & Related papers (2024-06-24T03:59:17Z) - CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling [52.404072802235234]
We introduce Chunked Instruction-aware State Eviction (CItruS), a novel modeling technique that integrates the attention preferences useful for a downstream task into the eviction process of hidden states.
Our training-free method exhibits superior performance on long sequence comprehension and retrieval tasks over several strong baselines under the same memory budget.
arXiv Detail & Related papers (2024-06-17T18:34:58Z) - Holographic Global Convolutional Networks for Long-Range Prediction Tasks in Malware Detection [50.7263393517558]
We introduce Holographic Global Convolutional Networks (HGConv) that utilize the properties of Holographic Reduced Representations (HRR).
Unlike other global convolutional methods, our method does not require any intricate kernel computation or crafted kernel design.
The proposed method has achieved new SOTA results on Microsoft Malware Classification Challenge, Drebin, and EMBER malware benchmarks.
arXiv Detail & Related papers (2024-03-23T15:49:13Z) - SubGen: Token Generation in Sublinear Time and Memory [48.35076900702408]
Large language models (LLMs) have extensive memory requirements for token generation.
In this work, we focus on developing an efficient compression technique for the KV cache.
We have devised a novel caching method with sublinear complexity, employing online clustering on key tokens and online $\ell_2$ sampling on values.
Not only does this algorithm ensure a sublinear memory footprint and sublinear time complexity, but we also establish a tight error bound for our approach.
arXiv Detail & Related papers (2024-02-08T22:17:40Z) - A Learning-Based Caching Mechanism for Edge Content Delivery [2.412158290827225]
With 5G networks and the rise of the Internet of Things (IoT), content delivery is increasingly extending to the network edge.
This shift introduces unique challenges, particularly due to the limited cache storage and the diverse request patterns at the edge.
We introduce HR-Cache, a learning-based caching framework grounded in the principles of Hazard Rate (HR) ordering.
arXiv Detail & Related papers (2024-02-05T08:06:03Z) - ARCH: Efficient Adversarial Regularized Training with Caching [91.74682538906691]
Adversarial regularization can improve model generalization in many natural language processing tasks.
We propose a new adversarial regularization method ARCH, where perturbations are generated and cached once every several epochs.
We evaluate our proposed method on a set of neural machine translation and natural language understanding tasks.
arXiv Detail & Related papers (2021-09-15T02:05:37Z) - Artificial Intelligence Assisted Collaborative Edge Caching in Small
Cell Networks [19.605382256630538]
This paper considers heterogeneous content preferences of users together with heterogeneous caching models at the edge nodes.
We propose a modified particle swarm optimization (M-PSO) algorithm that efficiently solves the complex constraint problem in a reasonable time.
arXiv Detail & Related papers (2020-05-16T10:39:46Z)