Fugu-MT Paper Translation (Summary): PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models

Paper summary: PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models

arxiv url: http://arxiv.org/abs/2503.07697v1
Date: Mon, 10 Mar 2025 17:13:30 GMT
Status: Translation complete
Last updated in system: 2025-03-12 22:35:51.374314
Title: PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models
Title (reference translation): PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models
Authors: Michael-Andrei Panaitescu-Liess, Pankayaraj Pathmanathan, Yigitcan Kaya, Zora Che, Bang An, Sicheng Zhu, Aakriti Agrawal, Furong Huang,
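Abstract summary: PoisonedParrot is the first stealthy data poisoning attack that induces an LLM to generate copyrighted content. Despite its simplicity, PoisonedParrot is surprisingly effective at priming the model to generate copyrighted content with no discernible side effects. The authors also propose ParrotTrap, a first attempt at mitigating copyright-infringement poisoning attacks.
Reference score (attention score computed independently by this site): 31.384367168115503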
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As the capabilities of large language models (LLMs) continue to expand, their usage has become increasingly prevalent. However, as reflected in numerous ongoing lawsuits regarding LLM-generated content, addressing copyright infringement remains a significant challenge. In this paper, we introduce PoisonedParrot: the first stealthy data poisoning attack that induces an LLM to generate copyrighted content even when the model has not been directly trained on the specific copyrighted material. PoisonedParrot integrates small fragments of copyrighted text into the poison samples using an off-the-shelf LLM. Despite its simplicity, evaluated in a wide range of experiments, PoisonedParrot is surprisingly effective at priming the model to generate copyrighted content with no discernible side effects. Moreover, we discover that existing defenses are largely ineffective against our attack. Finally, we make the first attempt at mitigating copyright-infringement poisoning attacks by proposing a defense: ParrotTrap. We encourage the community to explore this emerging threat model further.
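Abstract (reference translation): As the capabilities of large language models (LLMs) continue to expand, their use is becoming increasingly widespread. However, as reflected in the many lawsuits over LLM-generated content, addressing copyright infringement remains a major challenge. This paper introduces PoisonedParrot, the first stealthy data poisoning attack that induces an LLM to generate copyrighted content even when the model has not been trained directly on the specific copyrighted material. PoisonedParrot uses an off-the-shelf LLM to integrate small fragments of copyrighted text into the poison samples. Despite its simplicity, and across a wide range of experiments, PoisonedParrot is surprisingly effective at priming the model to generate copyrighted content with no discernible side effects. Existing defenses are also found to be largely ineffective against the attack. Finally, the authors propose ParrotTrap, a first attempt at a defense against copyright-infringement poisoning attacks, and encourage the community to explore this emerging threat model further.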

Related papers

Certified Mitigation of Worst-Case LLM Copyright Infringement [46.571805194176825]
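Copyright takedown refers to techniques that prevent a model from generating content close to copyrighted material. We propose BloomScrub, a remarkably simple yet highly effective inference-time approach that provides certified copyright takedown. Our results suggest that lightweight inference-time methods can be surprisingly effective for copyright protection.
Paper reference translation (metadata) (2025-04-22T17:16:53Z)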
CopyrightShield: Spatial Similarity Guided Backdoor Defense against Copyright Infringement in Diffusion Models [61.06621533874629]
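Diffusion models are a target of copyright-infringement attacks. This paper analyzes in detail the spatial similarity of replication in diffusion models and proposes a new defense method aimed at copyright-infringement attacks.
Paper reference translation (metadata) (2024-12-02T14:19:44Z)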
SHIELD: Evaluation and Defense Strategies for Copyright Compliance in LLM Text Generation [24.644101178288476]
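Large language models (LLMs) have transformed machine learning but have also raised significant legal concerns. LLMs may infringe copyright or may overly restrict non-copyrighted text. This paper proposes a lightweight, real-time defense method that prevents the generation of copyrighted text.
Paper reference translation (metadata) (2024-06-18T18:00:03Z)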
Defending LLMs against Jailbreaking Attacks via Backtranslation [61.878363293735624]
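We propose a new method for defending LLMs against jailbreaking attacks by backtranslation. The inferred prompt, called the backtranslated prompt, tends to reveal the actual intent of the original prompt. We empirically demonstrate that our defense significantly outperforms the baselines.
Paper reference translation (metadata) (2024-02-26T10:03:33Z)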
Round Trip Translation Defence against Large Language Model Jailbreaking Attacks [11.593052831056841]
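We propose the first algorithm designed to defend against social-engineered attacks on large language models. Our defense mitigated over 70% of Prompt Automatic Iterative Refinement (PAIR) attacks. It is also the first attempt to mitigate MathsAttack, reducing its attack success rate by about 40%.
Paper reference translation (metadata) (2024-02-21T03:59:52Z)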
The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline [30.80691226540351]
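We formalize copyright-infringement attacks on generative AI models and propose a backdoor attack method called SilentBadDiffusion. The method strategically embeds connections between copyrighted information and text references into the poisoning data. Our experiments demonstrate the stealthiness and effectiveness of the poisoning data.
Paper reference translation (metadata) (2024-01-07T08:37:29Z)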
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks [99.23352758320945]
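SmoothLLM is the first algorithm designed to mitigate jailbreaking attacks on large language models (LLMs). Because adversarially generated prompts are brittle to character-level changes, our defense first randomly perturbs multiple copies of a given input prompt, then aggregates the corresponding predictions to detect adversarial inputs.
Paper reference translation (metadata) (2023-10-05T17:01:53Z)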
Breaking the De-Pois Poisoning Defense [0.0]
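We show that the attack-agnostic De-Pois defense is no exception to this rule. We break this poison-protection layer by mounting gradient-sign attacks on the critic model and the target model simultaneously.
Paper reference translation (metadata) (2022-04-03T15:17:47Z)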
MultAV: Multiplicative Adversarial Videos [71.94264837503135]
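We propose MultAV, a new attack method against video recognition models. MultAV imposes perturbations on video data by multiplication. Experimental results show that models adversarially trained against additive attacks are less robust to MultAV.
Paper reference translation (metadata) (2020-09-17T04:34:39Z)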
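
The SmoothLLM entry in the list above describes a perturb-and-aggregate defense: randomly perturb several copies of the input prompt at the character level, then aggregate the results. The Python below is a minimal sketch of that idea only, not the authors' implementation. The names `perturb` and `majority_jailbroken`, the callables `respond` (a stand-in for an LLM) and `is_jailbroken` (a stand-in for a judge), the default values for `n_copies` and `rate`, and the use of a simple majority vote are all assumptions made for illustration.

```python
import random
import string
from typing import Callable

# Characters used for random replacement (an illustrative choice).
ALPHABET = string.ascii_letters + string.digits + string.punctuation + " "


def perturb(prompt: str, rate: float, rng: random.Random) -> str:
    """Replace a fraction `rate` of the characters with random ones."""
    chars = list(prompt)
    n_swap = int(len(chars) * rate)
    for i in rng.sample(range(len(chars)), n_swap):
        chars[i] = rng.choice(ALPHABET)
    return "".join(chars)


def majority_jailbroken(
    prompt: str,
    respond: Callable[[str], str],         # hypothetical LLM call
    is_jailbroken: Callable[[str], bool],  # hypothetical response judge
    n_copies: int = 8,                     # assumed value, not from the source
    rate: float = 0.1,                     # assumed value, not from the source
    seed: int = 0,
) -> bool:
    """Query the model on perturbed copies and take a majority vote."""
    rng = random.Random(seed)
    votes = [
        is_jailbroken(respond(perturb(prompt, rate, rng)))
        for _ in range(n_copies)
    ]
    # True only if more than half of the perturbed copies were judged jailbroken.
    return sum(votes) * 2 > n_copies
```

In practice `respond` would wrap a real model call and `is_jailbroken` a real judge; both are left as plain callables here so the sketch stays self-contained.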

The related-papers list is generated automatically from the titles and abstracts of the papers on this site.

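Information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).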