Power-Softmax: Towards Secure LLM Inference over Encrypted Data
- URL: http://arxiv.org/abs/2410.09457v1
- Date: Sat, 12 Oct 2024 09:32:42 GMT
- Title: Power-Softmax: Towards Secure LLM Inference over Encrypted Data
- Authors: Itamar Zimerman, Allon Adir, Ehud Aharoni, Matan Avitan, Moran Baruch, Nir Drucker, Jenny Lerner, Ramy Masalha, Reut Meiri, Omri Soceanu
- Abstract summary: Cryptographic methods for privacy-preserving LLMs, such as Homomorphic Encryption (HE), require the LLMs to have a polynomial form.
Previous approaches have either directly approximated pre-trained models with large-degree polynomials, which are less efficient over HE, or replaced non-polynomial components with easier-to-approximate primitives before training.
We present a new HE-friendly variant of self-attention that offers a stable form for training and is easy to approximate with polynomials, yielding polynomial LLMs with 32 layers and over a billion parameters, more than tenfold larger than previous models.
- Score: 2.4576879793338913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern cryptographic methods for implementing privacy-preserving LLMs such as Homomorphic Encryption (HE) require the LLMs to have a polynomial form. Forming such a representation is challenging because Transformers include non-polynomial components, such as Softmax and layer normalization. Previous approaches have either directly approximated pre-trained models with large-degree polynomials, which are less efficient over HE, or replaced non-polynomial components with easier-to-approximate primitives before training, e.g., Softmax with pointwise attention. The latter approach might introduce scalability challenges. We present a new HE-friendly variant of self-attention that offers a stable form for training and is easy to approximate with polynomials for secure inference. Our work introduces the first polynomial LLMs with 32 layers and over a billion parameters, exceeding the size of previous models by more than tenfold. The resulting models demonstrate reasoning and in-context learning (ICL) capabilities comparable to standard transformers of the same size, representing a breakthrough in the field. Finally, we provide a detailed latency breakdown for each computation over encrypted data, paving the way for further optimization, and explore the differences in inductive bias between transformers relying on our HE-friendly variant and standard transformers. Our code is attached as a supplement.
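The abstract does not spell out the exact polynomial replacement for Softmax, so the snippet below is only a minimal NumPy sketch of the general idea behind an HE-friendly attention layer: normalize scores with an even power instead of an exponential so the whole layer stays polynomial (under HE, the remaining division is itself typically replaced by a low-degree polynomial inverse approximation). The function names, the exponent `p`, and the stabilizer `eps` are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def power_softmax(scores, p=2, eps=1e-6):
    """Illustrative polynomial stand-in for Softmax (assumed form, not the paper's).

    An even power keeps the weights non-negative; everything except the final
    division is a polynomial in the inputs.
    """
    w = scores ** (2 * p)
    return w / (w.sum(axis=-1, keepdims=True) + eps)

def power_attention(Q, K, V, p=2):
    """Single-head attention using the polynomial normalization above."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)          # (n, n) raw attention scores
    return power_softmax(scores, p=p) @ V  # (n, d_v) attended values

# Tiny usage example on random data
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(power_attention(Q, K, V).shape)      # (4, 8)
```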
Related papers
- Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models [92.36510016591782]
We present a method that is able to distill a pretrained Transformer architecture into alternative architectures such as state space models (SSMs).
Our method, called MOHAWK, is able to distill a Mamba-2 variant based on the Phi-1.5 architecture using only 3B tokens and a hybrid version (Hybrid Phi-Mamba) using 5B tokens.
Despite using less than 1% of the training data typically used to train models from scratch, Phi-Mamba boasts substantially stronger performance compared to all past open-source non-Transformer models.
arXiv Detail & Related papers (2024-08-19T17:48:11Z)
- Linearizing Large Language Models [26.94551511277412]
We present a method to uptrain existing large pre-trained transformers into Recurrent Neural Networks (RNNs) with a modest compute budget.
We find that our linearization technique leads to competitive performance on standard benchmarks, but we identify persistent in-context learning and long-context modeling shortfalls for even the largest linear models.
arXiv Detail & Related papers (2024-05-10T17:59:08Z)
- Converting Transformers to Polynomial Form for Secure Inference Over Homomorphic Encryption [45.00129952368691]
Homomorphic Encryption (HE) has emerged as one of the most promising approaches to privacy-preserving deep learning.
We introduce the first polynomial transformer, providing the first demonstration of secure inference over HE with transformers.
Our models yield results comparable to traditional methods, bridging the performance gap with transformers of similar scale and underscoring the viability of HE for state-of-the-art applications.
arXiv Detail & Related papers (2023-11-15T00:23:58Z)
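As a hedged illustration of the shared "replace non-polynomial components with polynomial approximations" direction (not the cited paper's actual procedure), the sketch below fits a low-degree polynomial to GELU over a bounded input range with NumPy; the degree and the interval are arbitrary choices made here for demonstration.

```python
import numpy as np

def gelu(x):
    """Tanh approximation of GELU, a typical non-polynomial transformer component."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# Least-squares fit of a degree-7 polynomial on [-4, 4]; both choices are illustrative.
xs = np.linspace(-4.0, 4.0, 2001)
poly_gelu = np.poly1d(np.polyfit(xs, gelu(xs), deg=7))

max_err = np.max(np.abs(poly_gelu(xs) - gelu(xs)))
print(f"degree-7 max abs error on [-4, 4]: {max_err:.4f}")
```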
- PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels [23.99075223506133]
We show that attention based on a high-degree polynomial kernel can effectively replace softmax without sacrificing model quality.
We present a block-based algorithm to apply causal masking efficiently.
We validate PolySketchFormerAttention empirically by training language models capable of handling long contexts.
arXiv Detail & Related papers (2023-10-02T21:39:04Z)
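Part of what makes a polynomial kernel attractive is that it admits an explicit feature map, so attention can be rearranged to avoid materializing the n-by-n score matrix; the cited work then sketches that feature map for efficiency. The snippet below shows only the exact degree-2 rearrangement, with no sketching and no causal masking, as a hedged illustration rather than the paper's algorithm.

```python
import numpy as np

def feature_map_deg2(X):
    """Exact degree-2 polynomial-kernel feature map: phi(x) @ phi(y) == (x @ y) ** 2."""
    n, d = X.shape
    return np.einsum('ni,nj->nij', X, X).reshape(n, d * d)

def poly2_attention_linear(Q, K, V):
    """Degree-2 kernel attention without forming the n x n score matrix."""
    phi_q, phi_k = feature_map_deg2(Q), feature_map_deg2(K)  # (n, d^2)
    kv = phi_k.T @ V                                         # (d^2, d_v), shared by all queries
    norm = phi_k.sum(axis=0)                                 # (d^2,)
    return (phi_q @ kv) / (phi_q @ norm)[:, None]

def poly2_attention_quadratic(Q, K, V):
    """Reference implementation via the explicit (Q K^T)^2 matrix."""
    scores = (Q @ K.T) ** 2
    return (scores / scores.sum(axis=-1, keepdims=True)) @ V

rng = np.random.default_rng(1)
Q, K, V = (rng.normal(size=(5, 4)) for _ in range(3))
assert np.allclose(poly2_attention_linear(Q, K, V), poly2_attention_quadratic(Q, K, V))
```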
- MB-TaylorFormer: Multi-branch Efficient Transformer Expanded by Taylor Formula for Image Dehazing [88.61523825903998]
Transformer networks are beginning to replace pure convolutional neural networks (CNNs) in the field of computer vision.
We propose a new Transformer variant, which applies the Taylor expansion to approximate the softmax-attention and achieves linear computational complexity.
We introduce a multi-branch architecture with multi-scale patch embedding to the proposed Transformer, which embeds features by overlapping deformable convolution of different scales.
Our model, named Multi-branch Transformer expanded by Taylor formula (MB-TaylorFormer), can embed coarse-to-fine features more flexibly at the patch embedding stage and capture long-distance pixel interactions with limited computational cost.
arXiv Detail & Related papers (2023-08-27T08:10:23Z)
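The key step is that truncating the Taylor series of exp(q·k) after the linear term lets the sums over keys be shared across all queries, which is where the linear complexity comes from. The sketch below implements that rearrangement with a first-order expansion and L2-normalized queries and keys; these specifics are simplifying assumptions for illustration, not MB-TaylorFormer's exact design.

```python
import numpy as np

def taylor_linear_attention(Q, K, V):
    """Attention with exp(q.k) replaced by its first-order Taylor expansion 1 + q.k.

    Queries and keys are L2-normalized so 1 + q.k >= 0 (an assumption made here
    for stability). Rearranging the sums gives cost linear in sequence length.
    """
    Q = Q / np.linalg.norm(Q, axis=-1, keepdims=True)
    K = K / np.linalg.norm(K, axis=-1, keepdims=True)
    n = K.shape[0]
    kv = K.T @ V                              # (d, d_v), computed once for all queries
    num = V.sum(axis=0) + Q @ kv              # (n, d_v)
    den = n + Q @ K.sum(axis=0)               # (n,)
    return num / den[:, None]

rng = np.random.default_rng(2)
Q, K, V = (rng.normal(size=(6, 4)) for _ in range(3))
print(taylor_linear_attention(Q, K, V).shape)  # (6, 4)
```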
- Zero-th Order Algorithm for Softmax Attention Optimization [21.631643446337737]
We present a Zero-th Order algorithm specifically tailored for Softmax optimization.
We demonstrate the convergence of our algorithm, highlighting its effectiveness in efficiently computing gradients for large-scale language models.
arXiv Detail & Related papers (2023-07-17T09:43:50Z)
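For intuition only: a zeroth-order method estimates gradients from function evaluations alone, with no backpropagation. The sketch below applies a generic two-point Gaussian-smoothing estimator to a small softmax cross-entropy objective; it is not the cited paper's algorithm, and every name and hyperparameter here is an illustrative assumption.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def loss(w, x, y):
    """Cross-entropy of a softmax over logits w @ x for true class y."""
    return -np.log(softmax(w @ x)[y])

def zo_gradient(f, w, eps=1e-4, n_samples=32, rng=None):
    """Two-point zeroth-order gradient estimate using random Gaussian directions."""
    rng = rng or np.random.default_rng()
    g = np.zeros_like(w)
    for _ in range(n_samples):
        u = rng.normal(size=w.shape)
        g += (f(w + eps * u) - f(w - eps * u)) / (2 * eps) * u
    return g / n_samples

rng = np.random.default_rng(3)
x, y = rng.normal(size=5), 1
w = rng.normal(size=(3, 5))
for _ in range(200):                          # plain zeroth-order gradient descent
    w -= 0.5 * zo_gradient(lambda v: loss(v, x, y), w, rng=rng)
print(f"final loss: {loss(w, x, y):.3f}")
```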
- Confident Adaptive Language Modeling [95.45272377648773]
CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep.
We demonstrate the efficacy of our framework in reducing compute, with a potential speedup of up to $\times 3$, while provably maintaining high performance.
arXiv Detail & Related papers (2022-07-14T17:00:19Z)
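The mechanism behind this kind of dynamic compute allocation is per-token early exiting: stop propagating through decoder layers once an intermediate prediction looks confident enough. The skeleton below is a hypothetical illustration of that loop with toy stand-in layers, using the top softmax probability as the confidence signal; this is a simplification, since CALM studies several confidence measures and calibrates the exit threshold.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def confident_early_exit(h, layers, output_head, threshold=0.9):
    """Run decoder layers until an intermediate prediction is confident enough.

    `layers` is a list of callables and `output_head` maps a hidden state to
    vocabulary logits; both are hypothetical stand-ins for a real decoder.
    """
    for i, layer in enumerate(layers, start=1):
        h = layer(h)
        probs = softmax(output_head(h))
        if probs.max() >= threshold:          # confident enough: skip remaining layers
            return int(probs.argmax()), i
    return int(probs.argmax()), len(layers)

# Toy example: random "layers" and an output head over a 10-token vocabulary.
rng = np.random.default_rng(4)
layers = [lambda h, W=rng.normal(size=(16, 16)) / 4: np.tanh(W @ h) for _ in range(12)]
head = lambda h, Wo=rng.normal(size=(10, 16)): Wo @ h
token, used = confident_early_exit(rng.normal(size=16), layers, head, threshold=0.8)
print(f"predicted token {token} after {used}/12 layers")
```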
- Softmax-free Linear Transformers [90.83157268265654]
Vision transformers (ViTs) have pushed the state-of-the-art for visual perception tasks.
Existing methods are either theoretically flawed or empirically ineffective for visual recognition.
We propose a family of Softmax-Free Transformers (SOFT).
arXiv Detail & Related papers (2022-07-05T03:08:27Z)
- THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption [112.02441503951297]
Privacy-preserving inference of transformer models is on the demand of cloud service users.
We introduce THE-X, an approximation approach for transformers, which enables privacy-preserving inference of pre-trained models.
arXiv Detail & Related papers (2022-06-01T03:49:18Z)