Fugu-MT Paper Translation (Abstract): Mental Damage: Caption Poisoning Attacks on Retrieval-Augmented Text-to-Music Generation

Paper abstract: Mental Damage: Caption Poisoning Attacks on Retrieval-Augmented Text-to-Music Generation

arxiv url: http://arxiv.org/abs/2605.30365v1
Date: Mon, 18 May 2026 02:11:57 GMT
Status: Translation complete
Last updated in system: 2026-06-07 20:42:22.499136
Title: Mental Damage: Caption Poisoning Attacks on Retrieval-Augmented Text-to-Music Generation
Authors: Yizhu Wen, Shuhao Zhang, Nan Zhang, Long Cheng, Hanqing Guo
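Abstract summary: Retrieval-augmented text-to-music (TTM) systems augment underspecified user prompts using captions retrieved from a music caption dataset. The authors show that an attacker can poison the database by injecting a small number of crafted music captions. To carry out this music caption poisoning attack, they propose a dual-layer caption poisoning strategy.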
Reference score (independently computed attention score): 10.724986873079827
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Retrieval-augmented text-to-music (TTM) systems augment underspecified user prompts using captions retrieved from a music caption dataset. This design introduces an integrity dependency on the music knowledge database. We show that an attacker can poison the database by injecting a small number of crafted music captions, causing the system to retrieve malicious captions that bias prompt augmentation and steer generation away from the user's intended function, without modifying the user prompt, retriever, or generator. To achieve the music caption poisoning attack, we propose a dual-layer caption poisoning strategy that preserves high-level retrieval anchors while injecting low-level acoustic descriptors to steer prompt augmentation and downstream music generation toward an attacker-chosen target intent. In a MusicCaps knowledge database, CLAP retriever, and MusicGen pipeline, poisoned generations move substantially closer to the attacker's target, while remaining comparably aligned with the original user query. These results expose a practical integrity risk for retrieval-augmented creative AI systems. Our demo can be found at: https://yizhu-wen.github.io/Mental-Damage/
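The retrieval step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: a bag-of-words cosine similarity stands in for the CLAP retriever, a three-caption list stands in for MusicCaps, and no music is generated. All function names, captions, and the query are hypothetical. The poisoned caption repeats the query's high-level words (the retrieval anchor) and appends low-level acoustic descriptors chosen by the attacker.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumption: stand-in for a learned encoder)."""
    return Counter(text.lower().replace(",", " ").split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, captions, k=1):
    """Return the k captions most similar to the query."""
    q = embed(query)
    return sorted(captions, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def augment(query, captions, k=1):
    """Append retrieved captions to the user prompt (the prompt is never edited)."""
    return query + ". " + " ".join(retrieve(query, captions, k))


# Hypothetical clean knowledge database.
clean_db = [
    "soft piano and warm pads, slow tempo, calm music for sleep",
    "upbeat pop song with bright synths and claps",
    "acoustic guitar folk tune with gentle vocals",
]

query = "calm music for sleep"

# High-level anchor (matches the query) plus attacker-chosen acoustic descriptors.
poisoned_caption = "calm music for sleep, distorted guitar, fast aggressive drums"
poisoned_db = clean_db + [poisoned_caption]

clean_prompt = augment(query, clean_db)
poisoned_prompt = augment(query, poisoned_db)

print(clean_prompt)
print(poisoned_prompt)
```

In this toy setting a single injected caption outranks the benign one because it shares every query token while being shorter, so the augmented prompt carries the attacker's descriptors even though the query, retriever, and generator are untouched.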

Related papers