Fugu-MT Paper Translation (Abstract): CloakLM: Obfuscating GPU Memory Layout to Mitigate Model Ex-filtration for Serving

Paper summary: CloakLM: Obfuscating GPU Memory Layout to Mitigate Model Ex-filtration for Serving

arxiv url: http://arxiv.org/abs/2606.18400v1
Date: Tue, 16 Jun 2026 18:47:47 GMT
Status: Translation complete
Last updated in system: 2026-06-18 17:16:50.854166
Title: CloakLM: Obfuscating GPU Memory Layout to Mitigate Model Ex-filtration for Serving
Title (reference translation): CloakLM: Mitigating Model Exfiltration at Runtime by Obfuscating GPU Memory Layout
Authors: Kunal Jain, Seokjin Go, Divya Mahajan
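Abstract summary: Large foundation models deployed on third-party and shared accelerator infrastructure face a practical risk of model exfiltration. Co-tenant workloads can further access memory-mapped interfaces or misconfigured RDMA regions without physical co-location. CloakLM, a software-only memory-obfuscation framework, removes the structural regularity of weight storage without changing the inference stack's logical view of memory.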
Reference score (independently computed attention score): 2.408411453763233
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Large foundation models deployed on third-party and shared accelerator infrastructure face a practical risk of model exfiltration that existing defenses do not fully address. In common serving deployments, model providers control the VM or bare-metal serving stack but not the surrounding hardware substrate. The host-to-GPU interconnect, accelerator fabric, and neighboring infrastructure components remain outside the tenant's trust boundary and have been shown to be exploitable. Hermes demonstrates lossless DNN reconstruction from passive PCIe observation, while TunnelS exfiltrates HBM contents at high throughput via driver-level access without disrupting inference. Co-tenant VMs can further access memory-mapped interfaces or misconfigured RDMA regions without physical co-location. These attacks exploit a common property of ML systems: model weights are stored in large, contiguous, and repeatedly accessed memory regions, making intercepted PCIe transfers and HBM dumps rich enough to reveal model structure and parameters. We present CloakLM, a software-only memory-obfuscation framework that removes this structural regularity without changing the inference stack's logical view of memory. CloakLM combines three mechanisms: PCIe traffic shaping, inter- and intra-layer weight shuffling, and physical HBM page remapping. Authorized execution retains a valid virtual memory layout with negligible overhead, while unauthorized observers see fragmented and semantically incoherent state. CloakLM integrates with vLLM and PyTorch, requires no hardware changes, and complements confidential computing. Evaluation on distributed inference workloads using LLaMA and Qwen models shows near-native performance while significantly increasing resistance to PCIe snooping and HBM dump attacks, making inference-time model exfiltration substantially less practical.
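Abstract (reference translation): Large foundation models deployed on third-party and shared accelerator infrastructure face a practical risk of model exfiltration that existing defenses do not fully address. In typical serving deployments, the model provider controls the VM or bare-metal serving stack but not the surrounding hardware substrate. The host-to-GPU interconnect, the accelerator fabric, and neighboring infrastructure components lie outside the tenant's trust boundary and have been shown to be exploitable. Hermes demonstrates lossless DNN reconstruction from passive PCIe observation, while TunnelS exfiltrates HBM contents at high throughput through driver-level access without disrupting inference. Co-tenant VMs can additionally access memory-mapped interfaces or misconfigured RDMA regions without physical co-location. These attacks exploit a property common to ML systems: model weights are stored in large, contiguous, repeatedly accessed memory regions, so intercepted PCIe transfers and HBM dumps carry enough information to reveal model structure and parameters. CloakLM, a software-only memory-obfuscation framework, removes this structural regularity without changing the inference stack's logical view of memory. CloakLM combines three mechanisms: PCIe traffic shaping, inter- and intra-layer weight shuffling, and physical HBM page remapping. Authorized execution keeps a valid virtual memory layout with negligible overhead, while unauthorized observers see fragmented and semantically incoherent state. CloakLM integrates with vLLM and PyTorch, requires no hardware changes, and complements confidential computing. Evaluation on distributed inference workloads using LLaMA and Qwen models shows near-native performance while significantly increasing resistance to PCIe snooping and HBM dump attacks, making inference-time model exfiltration substantially less practical.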
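
The abstract names intra-layer weight shuffling as one of CloakLM's mechanisms but gives no algorithmic detail. As a rough illustration only, and not the paper's implementation, the sketch below shows the general idea that rows of a weight matrix can be stored in a secret permuted order while an authorized computation still recovers the original result. All function names (`shuffle_rows`, `matvec`, `authorized_matvec`) and the seed-based permutation are hypothetical choices made for this example.

```python
import random


def shuffle_rows(weights, seed):
    """Store the rows of `weights` in a permuted order.

    Returns (stored, perm), where stored[i] is logical row perm[i].
    Assumption: a seeded `random.Random` stands in for a secret key.
    It is NOT cryptographically secure; a real system would need a
    keyed, cryptographically strong permutation.
    """
    rng = random.Random(seed)
    perm = list(range(len(weights)))
    rng.shuffle(perm)
    stored = [weights[p] for p in perm]
    return stored, perm


def matvec(matrix, vec):
    """Plain matrix-vector product over lists of floats."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def authorized_matvec(stored, perm, vec):
    """Compute on the shuffled storage, then restore logical output order."""
    shuffled_out = matvec(stored, vec)
    out = [0.0] * len(perm)
    for i, p in enumerate(perm):
        out[p] = shuffled_out[i]  # stored row i is logical row perm[i]
    return out


# Toy 4x2 "layer" and input vector.
weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
x = [1.0, 2.0]

stored, perm = shuffle_rows(weights, seed=2026)

# The holder of `perm` gets the same output as the unshuffled layer.
assert authorized_matvec(stored, perm, x) == matvec(weights, x)
```

An observer who dumps `stored` without `perm` sees the same rows in an order that no longer matches the logical layer layout. This toy case only reorders rows of a single matrix; the traffic shaping and HBM page remapping described in the abstract are separate mechanisms that it does not attempt to model.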

