Fugu-MT Paper Translation (Summary): Blind PRNG Hijacking: An Undetectable Integrity-Preserving Attack Against LLM Watermarking

Paper summary: Blind PRNG Hijacking: An Undetectable Integrity-Preserving Attack Against LLM Watermarking

arxiv url: http://arxiv.org/abs/2605.28632v1
Date: Wed, 27 May 2026 15:39:32 GMT
Status: Translation complete
Last updated (in system): 2026-05-28 17:38:56.182966
Title: Blind PRNG Hijacking: An Undetectable Integrity-Preserving Attack Against LLM Watermarking
Authors: Ziyang You, Huilong He, Xiaoke Yang, Xuxing Lu,
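Abstract summary: This work introduces SeedHijack, the first supply-chain attack on LLM watermarking that is simultaneously blind, integrity-preserving, and orthogonal to detection. Rather than perturbing generated text, SeedHijack replaces the PRNG at the supply-chain layer, biasing green-list selection without altering output tokens or degrading text quality. A quantum random number generator (QRNG) countermeasure fully neutralizes the attack while preserving benign watermarking utility.
Reference score (notability, computed in-house): 0.6455316503462029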
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Cryptographic watermarking is a leading defense for attributing text generated by large language models (LLMs). Existing schemes, including KGW, Unigram, and DipMark, derive their security guarantees from the assumption that the underlying pseudo-random number generator (PRNG) is trustworthy. This work introduces SeedHijack, the first supply-chain attack on LLM watermarking that is simultaneously (i) blind -- requiring no knowledge of the watermark key, detector, or model logits, (ii) integrity-preserving -- amplifying rather than erasing the watermark signal, and (iii) orthogonal to detection -- the attack-induced bias is statistically independent of all content-side detector statistics, ensuring that amplification and evasion coexist without trade-off. Rather than perturbing generated text, SeedHijack replaces the PRNG at the supply-chain layer, biasing green-list selection without altering output tokens or degrading text quality. Across three watermarking schemes and three open-source LLMs, the attack triggers 0/6 state-of-the-art content-side statistical detectors while inflating the watermark z-score up to 2.42x (system-level defenses such as entropy-source attestation remain orthogonal and complementary). A quantum random number generator (QRNG) countermeasure is shown to fully neutralize the attack while preserving benign watermarking utility. These findings establish PRNG integrity as a first-class security requirement for cryptographic content-provenance systems.
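The abstract's central point is that green-list watermarks such as KGW only mean something if the PRNG that draws the green list is honest. The Python sketch below is a toy illustration of that dependency under stated assumptions, not the authors' SeedHijack. It builds a KGW-style green list by shuffling a toy vocabulary with a PRNG seeded from a secret key and the previous token. The SHA-256 seeding, VOCAB_SIZE, GAMMA and the synthetic token sequence are all assumptions. It scores a fixed token sequence with the one-proportion z-test z = (|s|_G - γT) / sqrt(Tγ(1 - γ)). It then swaps in a hypothetical tampered PRNG, HijackedRandom, which always places a fixed block of token ids on the green list. The token sequence never changes; only the PRNG does.

```python
import hashlib
import math
import random

VOCAB_SIZE = 1000  # toy vocabulary size (assumption)
GAMMA = 0.25       # fraction of the vocabulary on the green list (assumption)


def position_seed(key: int, prev_token: int) -> int:
    """Derive a per-position seed from the secret key and previous token (assumed scheme)."""
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def green_list(key: int, prev_token: int, rng_factory=random.Random) -> set:
    """KGW-style green list: shuffle the vocabulary with a seeded PRNG and keep
    the first GAMMA * VOCAB_SIZE ids. rng_factory is the pluggable PRNG."""
    rng = rng_factory(position_seed(key, prev_token))
    perm = list(range(VOCAB_SIZE))
    rng.shuffle(perm)
    return set(perm[: int(GAMMA * VOCAB_SIZE)])


class HijackedRandom(random.Random):
    """Hypothetical tampered PRNG: after a normal shuffle it moves a fixed block
    of 'favored' token ids to the front, so they are green for every key."""

    FAVORED = range(100)

    def shuffle(self, x):
        super().shuffle(x)
        favored = [t for t in x if t in self.FAVORED]
        rest = [t for t in x if t not in self.FAVORED]
        x[:] = favored + rest


def z_score(tokens, key: int, rng_factory=random.Random) -> float:
    """One-proportion z-test on green-token hits, as in KGW-style detection."""
    t = len(tokens) - 1
    hits = sum(
        tok in green_list(key, prev, rng_factory)
        for prev, tok in zip(tokens, tokens[1:])
    )
    return (hits - GAMMA * t) / math.sqrt(t * GAMMA * (1 - GAMMA))


def sample_tokens(n: int, seed: int = 0) -> list:
    """Toy 'text' (assumption): half the tokens come from the low-id range,
    standing in for frequent tokens; half are uniform over the vocabulary."""
    rng = random.Random(seed)
    return [
        rng.randrange(100) if rng.random() < 0.5 else rng.randrange(VOCAB_SIZE)
        for _ in range(n)
    ]


if __name__ == "__main__":
    text = sample_tokens(400)
    key = 42
    print("honest PRNG z  :", round(z_score(text, key), 2))
    print("hijacked PRNG z:", round(z_score(text, key, HijackedRandom), 2))
```

Under the honest PRNG each scored token is green with probability GAMMA, so the z-score of this unwatermarked toy text stays near zero. Under the tampered PRNG the same tokens score far higher. Because HijackedRandom favors its ids regardless of the key, the inflation needs no key knowledge, which loosely mirrors the "blind" property. The abstract does not say that SeedHijack biases selection in this particular way.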
