Fugu-MT paper translation (abstract): Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain

Paper overview: Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain

arxiv url: http://arxiv.org/abs/2510.05159v1
Date: Fri, 03 Oct 2025 12:47:21 GMT
Status: translation complete
Last updated in system: 2025-10-08 17:57:07.868329
Title: Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain
Authors: Léo Boisvert, Abhay Puri, Chandra Kiran Reddy Evuru, Nicolas Chapados, Quentin Cappart, Alexandre Lacoste, Krishnamurthy Dj Dvijotham, Alexandre Drouin
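Abstract summary: Fine-tuning AI agents on data from their own interactions introduces a critical security vulnerability within the AI supply chain. Adversaries can easily poison the data collection pipeline and embed hard-to-detect backdoors.
Reference score (independently computed attention score): 82.98626829232899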
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The practice of fine-tuning AI agents on data from their own interactions (such as web browsing or tool use), while a strong general recipe for improving agentic capabilities, also introduces a critical security vulnerability within the AI supply chain. In this work, we show that adversaries can easily poison the data collection pipeline to embed hard-to-detect backdoors that are triggered by specific target phrases, such that when the agent encounters these triggers, it performs an unsafe or malicious action. We formalize and validate three realistic threat models targeting different layers of the supply chain: 1) direct poisoning of fine-tuning data, where an attacker controls a fraction of the training traces; 2) environmental poisoning, where malicious instructions are injected into webpages scraped or tools called while creating training data; and 3) supply chain poisoning, where a pre-backdoored base model is fine-tuned on clean data to improve its agentic capabilities. Our results are stark: by poisoning as few as 2% of the collected traces, an attacker can embed a backdoor causing an agent to leak confidential user information with over 80% success when a specific trigger is present. This vulnerability holds across all three threat models. Furthermore, we demonstrate that prominent safeguards, including two guardrail models and one weight-based defense, fail to detect or prevent the malicious behavior. These findings highlight an urgent threat to agentic AI development and underscore the critical need for rigorous security vetting of data collection processes and end-to-end model supply chains.
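
The two headline quantities in the abstract (the fraction of poisoned traces and the success rate when a trigger is present) can be made concrete with a small sketch. This is only an illustration under stated assumptions: the abstract does not give the paper's exact metric definitions, so the names `Episode`, `poison_rate`, and `attack_success_rate` are hypothetical, and the numbers in the demo are invented to mirror the "2%" and "over 80%" figures rather than taken from the paper's experiments.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Episode:
    """One evaluation episode of an agent (hypothetical record layout)."""
    trigger_present: bool  # the trigger phrase appeared in what the agent saw
    unsafe_action: bool    # the agent took the attacker's target action


def poison_rate(num_poisoned: int, num_total: int) -> float:
    """Fraction of collected training traces controlled by the attacker."""
    if num_total <= 0:
        raise ValueError("num_total must be positive")
    if not 0 <= num_poisoned <= num_total:
        raise ValueError("num_poisoned must lie in [0, num_total]")
    return num_poisoned / num_total


def attack_success_rate(episodes: Iterable[Episode]) -> float:
    """Share of triggered episodes in which the unsafe action occurred.

    Assumption: "success" is measured only over episodes that contain
    the trigger; the abstract does not state the exact definition.
    """
    triggered = [e for e in episodes if e.trigger_present]
    if not triggered:
        raise ValueError("no episode contains the trigger")
    return sum(e.unsafe_action for e in triggered) / len(triggered)


if __name__ == "__main__":
    # Invented numbers for illustration only, not results from the paper.
    print(poison_rate(num_poisoned=20, num_total=1000))
    demo = (
        [Episode(True, True)] * 42      # trigger seen, backdoor fired
        + [Episode(True, False)] * 8    # trigger seen, agent behaved safely
        + [Episode(False, False)] * 50  # clean episodes, ignored by the metric
    )
    print(attack_success_rate(demo))
```

Restricting the denominator to triggered episodes is a design choice of this sketch: it separates how reliably a backdoor fires from how often the trigger occurs in practice.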

Related papers