Fugu-MT Paper Translation (Summary): SecMoE: Communication-Efficient Secure MoE Inference via Select-Then-Compute

Paper summary: SecMoE: Communication-Efficient Secure MoE Inference via Select-Then-Compute

arxiv url: http://arxiv.org/abs/2601.06790v1
Date: Sun, 11 Jan 2026 06:49:09 GMT
Status: Translation complete
Last updated in system: 2026-01-13 19:08:00.98941
Title: SecMoE: Communication-Efficient Secure MoE Inference via Select-Then-Compute
Authors: Bowen Shen, Yuyue Chen, Peng Yang, Bin Zhang, Xi Zhang, Zoe L. Jiang,
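Abstract summary: Privacy-preserving Transformer inference has gained attention due to the potential leakage of private information. To address the privacy and efficiency limitations, the authors propose SecMoE, a 2-PC privacy-preserving inference framework. Under 5 expert settings, SecMoE lowers end-to-end private inference communication by 1.8$\sim$7.1$\times$ and achieves a 1.3$\sim$3.8$\times$ speedup.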
Reference score (independently computed attention score): 14.230239851387566
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Privacy-preserving Transformer inference has gained attention due to the potential leakage of private information. Despite recent progress, existing frameworks still fall short of practical model scales, with gaps up to a hundredfold. A possible way to close this gap is the Mixture of Experts (MoE) architecture, which has emerged as a promising technique to scale up model capacity with minimal overhead. However, given that the current secure two-party (2-PC) protocols allow the server to homomorphically compute the FFN layer with its plaintext model weight, under the MoE setting, this could reveal which expert is activated to the server, exposing token-level privacy about the client's input. While naively evaluating all the experts before selection could protect privacy, it nullifies MoE sparsity and incurs the heavy computational overhead that sparse MoE seeks to avoid. To address the privacy and efficiency limitations above, we propose a 2-PC privacy-preserving inference framework, SecMoE. Unifying per-entry circuits in both the MoE layer and piecewise polynomial functions, SecMoE obliviously selects the extracted parameters from circuits and only computes one encrypted entry, which we refer to as Select-Then-Compute. This makes the model for private inference scale to 63$\times$ larger while only having a 15.2$\times$ increase in end-to-end runtime. Extensive experiments show that, under 5 expert settings, SecMoE lowers the end-to-end private inference communication by 1.8$\sim$7.1$\times$ and achieves 1.3$\sim$3.8$\times$ speedup compared to the state-of-the-art (SOTA) protocols.
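The contrast the abstract draws between evaluating every expert before selection and Select-Then-Compute can be illustrated with a small plaintext sketch. This is not the SecMoE protocol and offers no security: everything is computed in the clear, the one-hot selector stands in for what would be a secret-shared routing decision, and all names (`ffn`, `blend`, `compute_then_select`, `select_then_compute`) and the toy weights are hypothetical. It only shows that blending the parameters with the selector first, and then running a single expert evaluation, gives the same result as evaluating all experts and masking the outputs.

```python
# Toy plaintext illustration only: NOT the SecMoE protocol, no cryptography.
# All function names and weights below are hypothetical.

def relu(v):
    return [max(0.0, a) for a in v]

def matvec(weight, x):
    return [sum(w * a for w, a in zip(row, x)) for row in weight]

def ffn(params, x):
    """A tiny two-layer expert: W2 * relu(W1 * x)."""
    w1, w2 = params
    return matvec(w2, relu(matvec(w1, x)))

def blend(mats, one_hot):
    """Linear selection: sum_e one_hot[e] * mats[e], entry by entry."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(b * m[r][c] for b, m in zip(one_hot, mats))
             for c in range(cols)] for r in range(rows)]

def compute_then_select(experts, one_hot, x):
    """Baseline: run every expert (E nonlinear evaluations), then mask."""
    outputs = [ffn(p, x) for p in experts]
    return [sum(b * out[i] for b, out in zip(one_hot, outputs))
            for i in range(len(outputs[0]))]

def select_then_compute(experts, one_hot, x):
    """Select the parameters first, then run a single expert evaluation."""
    w1 = blend([p[0] for p in experts], one_hot)
    w2 = blend([p[1] for p in experts], one_hot)
    return ffn((w1, w2), x)

experts = [
    ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]),
    ([[2.0, -1.0], [0.5, 0.5]], [[1.0, -1.0]]),
    ([[-1.0, 1.0], [1.0, 1.0]], [[0.5, 2.0]]),
]
x = [1.0, 2.0]
one_hot = [0, 1, 0]  # selects the second expert

baseline = compute_then_select(experts, one_hot, x)
selected = select_then_compute(experts, one_hot, x)
assert baseline == selected
```

In this sketch the selection step is purely linear in the selector, while the nonlinear part (`relu`) runs once instead of once per expert. Whether that is where the savings reported in the abstract come from is not stated in this summary, so the sketch should be read as intuition only.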


List of related papers