Fugu-MT Paper Translation (Abstract): $δ$-STEAL: LLM Stealing Attack with Local Differential Privacy

Paper abstract: $δ$-STEAL: LLM Stealing Attack with Local Differential Privacy

arxiv url: http://arxiv.org/abs/2510.21946v1
Date: Fri, 24 Oct 2025 18:19:38 GMT
Status: Translation complete
Last updated in system: 2025-10-28 15:28:14.708297
Title: $δ$-STEAL: LLM Stealing Attack with Local Differential Privacy
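Title (reference translation): $δ$-STEAL: LLM Stealing Attack with Local Differential Privacy
Authors: Kieu Dang, Phung Lai, NhatHai Phan, Yelong Shen, Ruoming Jin, Abdallah Khreishah
Abstract summary: We introduce $\delta$-STEAL, a model stealing attack that bypasses the service provider's watermark detectors while preserving the adversary's model utility. Experiments show that $\delta$-STEAL achieves attack success rates of up to 96.95% without significantly compromising the adversary's model utility.
Reference score (independently computed attention score): 24.88863537562324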
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) demonstrate remarkable capabilities across various tasks. However, their deployment introduces significant risks related to intellectual property. In this context, we focus on model stealing attacks, where adversaries replicate the behaviors of these models to steal services. These attacks are highly relevant to proprietary LLMs and pose serious threats to revenue and financial stability. To mitigate these risks, the watermarking solution embeds imperceptible patterns in LLM outputs, enabling model traceability and intellectual property verification. In this paper, we study the vulnerability of LLM service providers by introducing $\delta$-STEAL, a novel model stealing attack that bypasses the service provider's watermark detectors while preserving the adversary's model utility. $\delta$-STEAL injects noise into the token embeddings of the adversary's model during fine-tuning in a way that satisfies local differential privacy (LDP) guarantees. The adversary queries the service provider's model to collect outputs and form input-output training pairs. By applying LDP-preserving noise to these pairs, $\delta$-STEAL obfuscates watermark signals, making it difficult for the service provider to determine whether its outputs were used, thereby preventing claims of model theft. Our experiments show that $\delta$-STEAL with lightweight modifications achieves attack success rates of up to $96.95\%$ without significantly compromising the adversary's model utility. The noise scale in LDP controls the trade-off between attack effectiveness and model utility. This poses a significant risk, as even robust watermarks can be bypassed, allowing adversaries to deceive watermark detectors and undermine current intellectual property protection methods.
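Abstract (reference translation): Large language models (LLMs) show remarkable capabilities across a wide range of tasks. Their deployment, however, brings significant risks concerning intellectual property. In this context, we focus on model stealing attacks, in which adversaries replicate the behavior of these models in order to steal services. These attacks are highly relevant to proprietary LLMs and pose serious threats to revenue and financial stability. To mitigate these risks, watermarking embeds imperceptible patterns in LLM outputs, enabling model traceability and intellectual property verification. In this paper, we study the vulnerability of LLM service providers by introducing $\delta$-STEAL, a novel model stealing attack that bypasses the service provider's watermark detectors while preserving the adversary's model utility. $\delta$-STEAL injects noise into the token embeddings of the adversary's model during fine-tuning in a way that satisfies local differential privacy (LDP) guarantees. The adversary queries the service provider's model to collect outputs and forms input-output training pairs. By applying LDP-preserving noise to these pairs, $\delta$-STEAL obfuscates the watermark signals, making it difficult for the service provider to determine whether its outputs were used and thereby preventing claims of model theft. Experiments show that $\delta$-STEAL, with lightweight modifications, achieves attack success rates of up to 96.95% without significantly compromising the adversary's model utility. The noise scale in LDP controls the trade-off between attack effectiveness and model utility. This poses a significant risk: even robust watermarks can be bypassed, allowing adversaries to deceive watermark detectors and undermine current intellectual property protection methods.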
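The abstract does not specify which LDP mechanism $\delta$-STEAL uses, so the following is only an illustrative sketch of the general idea of perturbing token embeddings with calibrated noise. It assumes the classic Gaussian mechanism (valid for 0 < epsilon < 1), with each embedding clipped to an L2 norm bound so that any two inputs differ by at most twice that bound. The function names (`clip_l2`, `gaussian_sigma`, `perturb_embeddings`) and the parameters `epsilon`, `delta`, and `max_norm` are hypothetical and are not taken from the paper.

```python
import math
import random


def clip_l2(vec, max_norm):
    """Scale vec down so its L2 norm is at most max_norm (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    scale = max_norm / norm
    return [x * scale for x in vec]


def gaussian_sigma(epsilon, delta, max_norm):
    """Noise std-dev for the classic Gaussian mechanism (assumed, not the paper's).

    After clipping, any two embeddings differ by at most 2 * max_norm in L2
    norm, which serves as the sensitivity in the local setting.
    """
    if not (0.0 < epsilon < 1.0):
        raise ValueError("classic Gaussian mechanism bound needs 0 < epsilon < 1")
    if not (0.0 < delta < 1.0):
        raise ValueError("delta must be in (0, 1)")
    sensitivity = 2.0 * max_norm
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def perturb_embeddings(embeddings, epsilon, delta, max_norm, rng=None):
    """Clip each token embedding, then add i.i.d. Gaussian noise per coordinate."""
    rng = rng or random.Random()
    sigma = gaussian_sigma(epsilon, delta, max_norm)
    noisy = []
    for vec in embeddings:
        clipped = clip_l2(vec, max_norm)
        noisy.append([x + rng.gauss(0.0, sigma) for x in clipped])
    return noisy


if __name__ == "__main__":
    toy_embeddings = [[3.0, 4.0], [0.1, 0.2]]  # toy 2-d "token embeddings"
    for eps in (0.2, 0.5, 0.9):
        sigma = gaussian_sigma(eps, 1e-5, max_norm=1.0)
        print(f"epsilon={eps}: sigma={sigma:.3f}")
    print(perturb_embeddings(toy_embeddings, 0.5, 1e-5, 1.0, random.Random(0)))
```

Under these assumptions, a smaller `epsilon` yields a larger noise standard deviation, which mirrors the trade-off the abstract describes: more noise hides the watermark signal more effectively but costs more model utility.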

Related papers