Fugu-MT Paper Translation (Abstract): AttackPilot: Autonomous Inference Attacks Against ML Services With LLM-Based Agents

Paper overview: AttackPilot: Autonomous Inference Attacks Against ML Services With LLM-Based Agents

arxiv url: http://arxiv.org/abs/2511.19536v1
Date: Mon, 24 Nov 2025 10:14:14 GMT
Status: Translation complete
System update date: 2025-11-26 17:37:04.080513
Title: AttackPilot: Autonomous Inference Attacks Against ML Services With LLM-Based Agents
Title (reference translation): AttackPilot: Automated Inference Attacks Against ML Services by LLM-Based Agents
Authors: Yixin Wu, Rui Wen, Chi Cui, Michael Backes, Yang Zhang
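Abstract summary: Inference attacks have been widely studied and offer a systematic risk assessment of ML services. The emergence of advanced large language models presents a promising yet largely unexplored opportunity to develop autonomous agents as inference attack experts. This work proposes AttackPilot, an autonomous agent that independently conducts inference attacks without human intervention.
Reference score (independently computed attention score): 20.145414070649007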
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Inference attacks have been widely studied and offer a systematic risk assessment of ML services; however, their implementation and the attack parameters for optimal estimation are challenging for non-experts. The emergence of advanced large language models presents a promising yet largely unexplored opportunity to develop autonomous agents as inference attack experts, helping address this challenge. In this paper, we propose AttackPilot, an autonomous agent capable of independently conducting inference attacks without human intervention. We evaluate it on 20 target services. The evaluation shows that our agent, using GPT-4o, achieves a 100.0% task completion rate and near-expert attack performance, with an average token cost of only $0.627 per run. The agent can also be powered by many other representative LLMs and can adaptively optimize its strategy under service constraints. We further perform trace analysis, demonstrating that design choices, such as a multi-agent framework and task-specific action spaces, effectively mitigate errors such as bad plans, inability to follow instructions, task context loss, and hallucinations. We anticipate that such agents could empower non-expert ML service providers, auditors, or regulators to systematically assess the risks of ML services without requiring deep domain expertise.
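Abstract (reference translation): Inference attacks have been widely studied as a systematic risk assessment of ML services, but implementing them and choosing attack parameters for optimal estimation are difficult for non-experts. The emergence of advanced large language models presents a promising yet largely unexplored opportunity to develop autonomous agents as inference attack experts, which helps address this challenge. This paper proposes AttackPilot, an autonomous agent that independently conducts inference attacks without human intervention. It is evaluated on 20 target services. The evaluation shows that the agent, using GPT-4o, achieves a 100.0% task completion rate and near-expert attack performance at an average token cost of $0.627 per run. The agent can also be powered by many other representative LLMs and can adaptively optimize its strategy under service constraints. A further trace analysis shows that design choices such as a multi-agent framework and task-specific action spaces effectively mitigate errors such as bad plans, failure to follow instructions, loss of task context, and hallucinations. Such agents are expected to empower non-expert ML service providers, auditors, or regulators to systematically assess the risks of ML services without requiring deep domain expertise.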
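
Illustrative note: the abstract does not say which inference attacks AttackPilot runs or how it implements them. For readers unfamiliar with the category, the sketch below shows one commonly studied inference attack, a confidence-threshold membership inference attack, on simulated data. Everything in it is an assumption made for illustration: the `simulate_confidences` helper stands in for querying a target service, and the confidence means, the spread, and the 0.8 threshold are arbitrary values, not figures from the paper.

```python
import random


def simulate_confidences(n, mean, rng):
    """Hypothetical stand-in for querying a target service.

    Returns n simulated top-class confidence scores, clipped to [0, 1].
    """
    return [min(1.0, max(0.0, rng.gauss(mean, 0.1))) for _ in range(n)]


def threshold_attack(confidences, threshold):
    """Predict 'member' (True) when the confidence reaches the threshold."""
    return [c >= threshold for c in confidences]


def balanced_accuracy(member_preds, nonmember_preds):
    """Average of the true-positive rate and the true-negative rate."""
    tpr = sum(member_preds) / len(member_preds)
    tnr = sum(not p for p in nonmember_preds) / len(nonmember_preds)
    return (tpr + tnr) / 2


rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Assumption: the model is more confident on training members than on
# unseen samples. The means 0.9 and 0.7 are illustrative only.
member_conf = simulate_confidences(1000, 0.9, rng)
nonmember_conf = simulate_confidences(1000, 0.7, rng)

threshold = 0.8  # illustrative attack parameter
acc = balanced_accuracy(
    threshold_attack(member_conf, threshold),
    threshold_attack(nonmember_conf, threshold),
)
print(f"balanced attack accuracy: {acc:.3f}")
```

The threshold here is the kind of attack parameter the abstract describes as hard for non-experts to choose: set it too high or too low and the attack understates the privacy risk.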

Related papers