Fugu-MT Paper Translation (Abstract): Scaling up Privacy-Preserving ML: A CKKS Implementation of Llama-2-7B

Paper overview: Scaling up Privacy-Preserving ML: A CKKS Implementation of Llama-2-7B

arxiv url: http://arxiv.org/abs/2601.18511v1
Date: Mon, 26 Jan 2026 14:17:23 GMT
Status: Translation complete
Last updated in system: 2026-01-27 15:23:08.86043
Title: Scaling up Privacy-Preserving ML: A CKKS Implementation of Llama-2-7B
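Title (reference translation): Scaling Up Privacy-Preserving ML: A CKKS Implementation of Llama-2-7B
Authors: Jaiyoung Park, Sejin Park, Jai Hyun Park, Jung Ho Ahn, Jung Hee Cheon, Guillaume Hanrot, Jung Woo Kim, Minje Park, Damien Stehlé
Abstract summary: Fully homomorphic encryption (FHE) has emerged as a primary solution for providing non-interactive confidential large language model (LLM) inference. This paper proposes an FHE-based private LLM inference solution and describes a CKKS-based end-to-end implementation of Llama-2-7B private inference for up to 4096 input tokens.
Reference score (independently computed attention score): 20.74505614207065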
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: As large language models (LLMs) become ubiquitous, privacy concerns pertaining to inference inputs keep growing. In this context, fully homomorphic encryption (FHE) has emerged as a primary cryptographic solution to provide non-interactive confidential LLM inference. Existing solutions scale poorly with the input token length, and hence focus either on small models or larger models with a small number of input tokens. They also suffer from the existence of large outlier values. These values have a strong impact on the evaluation of non-linear layers, leading to large-degree polynomial approximation and thus heavy evaluation costs. We propose an FHE-based private LLM inference solution that allows thousands of input tokens with only a part of them being encrypted: this fits with a scenario where the context is benign and only part of the input is sensitive. To do so, we suggest an unbalanced chunked prefill framework that processes the private and public parts of the input tokens differently. Our framework contains plaintext-plaintext, plaintext-ciphertext and ciphertext-ciphertext computational components. We adopt different strategies and ingredients for each component. We also devise new homomorphic algorithms for specific matrix multiplication and polynomial evaluation tasks encountered during LLM inference. Furthermore, without retraining, we tailor the LLM inference algorithm to reduce the ranges of outlier values: we leverage machine learning strategies (token prepending and rotations) to mitigate the impact of the outliers on non-linear layers. Based on these ingredients, we describe a CKKS-based end-to-end implementation of Llama-2-7B private inference for up to 4096 input tokens, of which the last 128 are encrypted. On a cluster of 8 NVIDIA RTX-4090 GPUs, inference takes 85s for summarization and 33s for generation per output token.
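Illustrative note: the abstract links large outlier values to large-degree polynomial approximations of non-linear layers. The sketch below is a plaintext illustration of that relationship only, not the paper's algorithm. It assumes SiLU as the non-linear function, Chebyshev interpolation as the approximation method, and an absolute error tolerance of 1e-3; all function names are hypothetical. It searches for the smallest degree that meets the tolerance on progressively wider input ranges.

```python
import math


def silu(x: float) -> float:
    """SiLU activation x * sigmoid(x); chosen here as an assumed example."""
    return x / (1.0 + math.exp(-x))


def chebyshev_coeffs(f, radius: float, degree: int) -> list:
    """Coefficients of the Chebyshev interpolant of f on [-radius, radius]."""
    n = degree + 1
    # Sample f at the Chebyshev nodes, mapped from [-1, 1] to [-radius, radius].
    samples = [f(radius * math.cos(math.pi * (k + 0.5) / n)) for k in range(n)]
    coeffs = []
    for j in range(n):
        s = sum(samples[k] * math.cos(math.pi * j * (k + 0.5) / n) for k in range(n))
        coeffs.append(2.0 * s / n)
    coeffs[0] *= 0.5  # the constant term carries a factor 1/2
    return coeffs


def clenshaw(coeffs, t: float) -> float:
    """Evaluate sum_j coeffs[j] * T_j(t) for t in [-1, 1]."""
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * t * b1 - b2 + c, b1
    return t * b1 - b2 + coeffs[0]


def max_error(f, radius: float, degree: int, grid: int = 401) -> float:
    """Largest absolute error of the interpolant on a uniform test grid."""
    coeffs = chebyshev_coeffs(f, radius, degree)
    worst = 0.0
    for i in range(grid):
        t = -1.0 + 2.0 * i / (grid - 1)
        worst = max(worst, abs(clenshaw(coeffs, t) - f(radius * t)))
    return worst


def min_degree(f, radius: float, tol: float = 1e-3, max_degree: int = 256) -> int:
    """Smallest degree whose interpolant meets tol on [-radius, radius]."""
    for degree in range(1, max_degree + 1):
        if max_error(f, radius, degree) <= tol:
            return degree
    raise ValueError("tolerance not reached within max_degree")


if __name__ == "__main__":
    for radius in (4.0, 8.0, 16.0, 32.0):
        print(f"range [-{radius:g}, {radius:g}]: degree {min_degree(silu, radius)}")
```

Under these assumptions, the required degree grows as the input range widens, which is the cost that the abstract's outlier-reduction strategies aim to limit.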
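Abstract (reference translation): As large language models (LLMs) become ubiquitous, privacy concerns about inference inputs are growing. In this context, fully homomorphic encryption (FHE) has emerged as a leading cryptographic solution for non-interactive confidential LLM inference. Existing solutions scale poorly with input token length, and so focus either on small models or on larger models with few input tokens. They also suffer from large outlier values, which strongly affect the evaluation of non-linear layers and lead to high-degree polynomial approximations and heavy evaluation costs. We propose an FHE-based private LLM inference solution that handles thousands of input tokens of which only a part is encrypted; this fits a scenario where the context is benign and only part of the input is sensitive. To this end, we propose an unbalanced chunked prefill framework that processes the private and public parts of the input tokens differently. The framework contains plaintext-plaintext, plaintext-ciphertext, and ciphertext-ciphertext computational components, and we adopt different strategies and ingredients for each. We also devise new homomorphic algorithms for the specific matrix multiplication and polynomial evaluation tasks that arise during LLM inference. Furthermore, without retraining, we adjust the LLM inference algorithm to reduce the range of outlier values, using machine learning strategies (token prepending and rotations) to mitigate the impact of outliers on non-linear layers. Based on these ingredients, we describe a CKKS-based end-to-end implementation of Llama-2-7B private inference for up to 4096 input tokens, of which the last 128 are encrypted. On a cluster of 8 NVIDIA RTX-4090 GPUs, inference takes 85s for summarization and 33s for generation per output token.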
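Illustrative note: the abstract names plaintext-plaintext, plaintext-ciphertext and ciphertext-ciphertext components but does not say how work is divided among them. The sketch below assumes one plausible reading for causal attention with the private tokens placed last: public queries against public keys are plaintext-plaintext, private queries against public keys are plaintext-ciphertext, and private queries against private keys are ciphertext-ciphertext. The function name and this mapping are assumptions for illustration, not the paper's stated design. It counts attention-score entries per component for the reported setting of 4096 tokens with the last 128 encrypted.

```python
def attention_block_counts(n_tokens: int, n_private: int) -> dict:
    """Count causal attention-score entries per (assumed) component type.

    Assumes the private tokens are the last n_private positions and that
    each token attends to itself and to all earlier tokens.
    """
    if not 0 <= n_private <= n_tokens:
        raise ValueError("n_private must lie between 0 and n_tokens")
    n_public = n_tokens - n_private
    return {
        # Public query, public key: lower triangle of the public block.
        "plaintext-plaintext": n_public * (n_public + 1) // 2,
        # Private query, public key: full rectangular block.
        "plaintext-ciphertext": n_private * n_public,
        # Private query, private key: lower triangle of the private block.
        "ciphertext-ciphertext": n_private * (n_private + 1) // 2,
    }


if __name__ == "__main__":
    counts = attention_block_counts(n_tokens=4096, n_private=128)
    total = sum(counts.values())
    for name, count in counts.items():
        print(f"{name}: {count} entries ({100.0 * count / total:.2f}%)")
```

Under this assumed mapping, only a small share of the score entries involves two ciphertexts, which is consistent with the abstract's motivation for treating the public and private parts differently.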
