Fugu-MT Paper Translation (Abstract): CipherPrune: Efficient and Scalable Private Transformer Inference

Paper summary: CipherPrune: Efficient and Scalable Private Transformer Inference

arxiv url: http://arxiv.org/abs/2502.16782v1
Date: Mon, 24 Feb 2025 02:27:54 GMT
Status: Translation complete
Last updated in system: 2025-02-25 22:36:56.373708
Title: CipherPrune: Efficient and Scalable Private Transformer Inference
Authors: Yancheng Zhang, Jiaqi Xue, Mengxin Zheng, Mimi Xie, Mingzhe Zhang, Lei Jiang, Qian Lou
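Abstract summary: Private Transformer inference using cryptographic protocols offers a promising solution for privacy-preserving machine learning. However, it still faces significant runtime overhead (efficiency issues) and challenges in handling long-token inputs (scalability issues). We propose CipherPrune, an efficient and scalable private inference framework.
Reference score (attention score computed independently by this site): 12.853162687405465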
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Private Transformer inference using cryptographic protocols offers promising solutions for privacy-preserving machine learning; however, it still faces significant runtime overhead (efficiency issues) and challenges in handling long-token inputs (scalability issues). We observe that the Transformer's operational complexity scales quadratically with the number of input tokens, making it essential to reduce the input token length. Notably, each token varies in importance, and many inputs contain redundant tokens. Additionally, prior private inference methods that rely on high-degree polynomial approximations for non-linear activations are computationally expensive. Therefore, reducing the polynomial degree for less important tokens can significantly accelerate private inference. Building on these observations, we propose CipherPrune, an efficient and scalable private inference framework that includes a secure encrypted token pruning protocol, a polynomial reduction protocol, and corresponding Transformer network optimizations. At the protocol level, encrypted token pruning adaptively removes unimportant tokens from encrypted inputs in a progressive, layer-wise manner. Additionally, encrypted polynomial reduction assigns lower-degree polynomials to less important tokens after pruning, enhancing efficiency without decryption. At the network level, we introduce protocol-aware network optimization via a gradient-based search to maximize pruning thresholds and polynomial reduction conditions while maintaining the desired accuracy. Our experiments demonstrate that CipherPrune reduces the execution overhead of private Transformer inference by approximately 6.1× for 128-token inputs and 10.6× for 512-token inputs, compared to previous methods, with only a marginal drop in accuracy. The code is publicly available at https://github.com/UCF-Lou-Lab-PET/cipher-prune-inference.
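The two protocol-level ideas in the abstract (dropping unimportant tokens layer by layer, then giving the less important survivors a lower-degree polynomial) can be illustrated with a small plaintext sketch. This is not the CipherPrune protocol: the real system operates on encrypted inputs and learns its thresholds through a gradient-based search. Everything below is an assumption made for illustration, including the importance score (mean attention a token receives, in the style of attention-based token pruning), the function names `token_importance` and `prune_and_assign`, the threshold values, and the polynomial degrees 8 and 2.

```python
from typing import List, Tuple

Attention = List[List[List[float]]]  # attn[h][i][j]: head h, query i, key j


def token_importance(attn: Attention) -> List[float]:
    """Score token j by the mean attention it receives over all heads and queries.

    Assumed scoring rule for illustration; the abstract does not specify one.
    """
    heads = len(attn)
    n = len(attn[0])
    scores = [0.0] * n
    for head in attn:
        for row in head:
            for j, p in enumerate(row):
                scores[j] += p
    return [s / (heads * n) for s in scores]


def prune_and_assign(
    attn: Attention,
    prune_threshold: float,
    reduce_threshold: float,
    high_degree: int = 8,  # hypothetical degree for important tokens
    low_degree: int = 2,   # hypothetical degree for less important tokens
) -> Tuple[List[int], List[int]]:
    """Return indices of kept tokens and the polynomial degree assigned to each."""
    scores = token_importance(attn)
    # Step 1: pruning. Tokens at or below the threshold are removed.
    kept = [j for j, s in enumerate(scores) if s > prune_threshold]
    # Step 2: polynomial reduction. Surviving but less important tokens
    # get the cheaper, lower-degree approximation.
    degrees = [
        high_degree if scores[j] > reduce_threshold else low_degree for j in kept
    ]
    return kept, degrees


# Toy input: one head, four tokens, every query attends with the same weights.
row = [0.6, 0.3, 0.08, 0.02]
attn = [[row[:] for _ in range(len(row))]]

kept, degrees = prune_and_assign(attn, prune_threshold=0.05, reduce_threshold=0.2)
print("kept tokens:", kept)
print("degrees:", degrees)
# Attention cost grows with the square of the token count.
print("pairwise attention entries:", len(row) ** 2, "->", len(kept) ** 2)
```

In this toy case one of four tokens is pruned, so the next layer computes 9 pairwise attention entries instead of 16, which is the quadratic saving the abstract refers to. Applying the same step at each layer with its own thresholds would give the progressive, layer-wise behavior described above.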

Related papers

Accelerating Private Large Transformers Inference through Fine-grained Collaborative Computation [8.859237832459876]
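FASTLMPI is a new approach that accelerates private Transformer-based model (TBM) inference through fine-grained optimization. Specifically, through a fine-grained co-design of homomorphic encryption and secret sharing, FASTLMPI achieves efficient protocols for matrix multiplication, SoftMax, LayerNorm, and GeLU. FASTLMPI shows a 54% to 64% decrease in runtime and a 72.2% reduction in communication costs.
Paper reference translation (metadata) (2024-12-21T08:33:12Z)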
FIRP: Faster LLM inference via future intermediate representation prediction [54.897493351694195]
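FIRP generates multiple tokens instead of one at each decoding step. Extensive experiments show a speedup ratio of 1.9x-3x across several models and datasets.
Paper reference translation (metadata) (2024-10-27T15:53:49Z)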
Three-Input Ciphertext Multiplication for Homomorphic Encryption [6.390468088226496]
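Homomorphic encryption (HE) allows computations to be carried out directly on ciphertexts. HE is essential to privacy-preserving computing, such as neural network inference, medical diagnosis, and financial data analysis. This paper proposes three-input ciphertext multiplication to reduce computational complexity.
Paper reference translation (metadata) (2024-10-17T13:40:49Z)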
CryptoTrain: Fast Secure Training on Encrypted Dataset [17.23344104239024]
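We develop a hybrid cryptographic protocol that combines homomorphic encryption with Oblivious Transfer (OT) to handle linear and non-linear operations. By integrating CCMul-Precompute and correlated convolution into CryptoTrain-B, we obtain a fast and efficient secure training framework.
Paper reference translation (metadata) (2024-09-25T07:06:14Z)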
Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
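We propose a novel parallel decoding approach, namely hidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass. In terms of acceleration metrics, it outperforms single-model acceleration techniques, including Medusa and Self-Speculative decoding.
Paper reference translation (metadata) (2024-04-18T09:17:06Z)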
Implementation of Entropically Secure Encryption: Securing Personal Health Data [0.704590071265998]
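Entropically Secure Encryption (ESE) offers unconditional security with keys shorter than those of the One-Time Pad. We describe an implementation of ESE for bulk encryption.
Paper reference translation (metadata) (2024-04-04T12:07:33Z)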
Transformer based Pluralistic Image Completion with Reduced Information Loss [72.92754600354199]
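Transformer-based methods have recently achieved great success in image inpainting. However, they regard each pixel as a token and therefore suffer from information loss. We propose a new Transformer-based framework called PUT.
Paper reference translation (metadata) (2024-03-31T01:20:16Z)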
Vcc: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens [65.4435926060951]
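We propose to improve the efficiency of Transformers on ultra-long sequences by compressing the sequence into a much smaller representation at each layer. Our algorithm is not only efficient (achieving more than a 3x efficiency gain over baselines at 4K and 16K lengths) but also offers competitive or better performance on a large number of tasks.
Paper reference translation (metadata) (2023-05-07T10:32:18Z)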
THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption [112.02441503951297]
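Privacy-preserving inference of transformer models is in demand among cloud service users. We introduce THE-X, an approximation approach for transformers that enables privacy-preserving inference of pre-trained models.
Paper reference translation (metadata) (2022-06-01T03:49:18Z)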
Learned Token Pruning for Transformers [39.181816379061374]
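The Learned Token Pruning (LTP) method reduces redundant tokens as the data passes through the different layers of the transformer. We extensively test the performance of the proposed method on multiple GLUE tasks. Preliminary results show throughput improvements of 1.4x on Tesla T4 and 1.9x on Intel Haswell.
Paper reference translation (metadata) (2021-07-02T09:00:13Z)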
FFConv: Fast Factorized Neural Network Inference on Encrypted Data [9.868787266501036]
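This paper proposes a low-rank factorization method called FFConv that unifies convolution and ciphertext packing. Compared to the prior art LoLa and Falcon, the proposed method reduces latency by up to 87% and 12%, respectively.
Paper reference translation (metadata) (2021-02-06T03:10:13Z)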
Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing [112.2208052057002]
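This paper proposes Funnel-Transformer, which compresses the sequence of hidden states into a shorter one. With comparable or fewer FLOPs, Funnel-Transformer outperforms the standard Transformer on a wide variety of sequence-level prediction tasks.
Paper reference translation (metadata) (2020-06-05T05:16:23Z)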

The related papers list is generated automatically from the titles and abstracts of papers on this site.
