Fugu-MT Paper Translation (Abstract): Defending LLM-based Multi-Agent Systems Against Cooperative Attacks with Sentence-Level Rectification

Paper summary: Defending LLM-based Multi-Agent Systems Against Cooperative Attacks with Sentence-Level Rectification

arxiv url: http://arxiv.org/abs/2605.28104v1
Date: Wed, 27 May 2026 07:56:33 GMT
Status: Translation complete
Last updated in system: 2026-05-28 17:38:55.863687
Title: Defending LLM-based Multi-Agent Systems Against Cooperative Attacks with Sentence-Level Rectification
Title (reference translation): Defending LLM-based Multi-Agent Systems Against Cooperative Attacks with Sentence-Level Rectification
Authors: Yaoyang Luo, Zhi Zheng, Ziwei Zhao, Tong Xu, Zhao Jielun, Wenjun Xue, Yong Chen, Enhong Chen
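Abstract summary: Malicious agents in Large Language Model-based Multi-Agent Systems (MAS) may inject misinformation to mislead other agents and degrade system performance. This paper proposes an adaptive cooperative attack framework in which malicious agents autonomously coordinate and dynamically adjust their attack strategies. It also introduces Sentence-Level Trustworthiness Analysis and Rectification (STAR), a defense framework that identifies and rectifies misleading information at the sentence level within agent communications.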
Reference score (independently computed attention metric): 42.88763759237844
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent years have witnessed the rapid development of Large Language Model-based Multi-Agent Systems (MAS), which excel at collaborative decision-making and complex problem-solving. However, malicious agents in MAS may inject misinformation to mislead other agents and disrupt system performance, giving rise to a new research direction that focuses on attack mechanisms and defense strategies in MAS. Prior studies largely assume malicious agents act independently and investigate the corresponding defense strategies. However, we argue that malicious agents may exhibit collaborative behaviors, enabling more effective attacks through internal information exchange. In this paper, we propose an adaptive cooperative attack framework, where malicious agents autonomously coordinate and dynamically adjust their attack strategies through multi-round interactions. Furthermore, we introduce Sentence-Level Trustworthiness Analysis and Rectification (STAR), a defense framework that identifies and rectifies misleading information at the sentence level within agent communications. Our experiments show that cooperative attacks lead to a significantly larger degradation in task success rate than independent attacks, resulting in a relative drop of 5.34%. Meanwhile, STAR effectively mitigates both cooperative and independent threats and improves task success rate by an average of 36.76%. The code is available at https://github.com/smoooom/STAR.
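Abstract (reference translation): In recent years, Large Language Model-based Multi-Agent Systems (MAS) have developed rapidly. However, malicious agents in MAS may inject misinformation to mislead other agents and degrade system performance, which has opened a new research direction focused on attack mechanisms and defense strategies in MAS. Prior studies largely assume that malicious agents act independently and investigate the corresponding defenses. Malicious agents may, however, behave collaboratively, enabling more effective attacks through internal information exchange. This paper proposes an adaptive cooperative attack framework in which malicious agents autonomously coordinate and dynamically adjust their attack strategies through multi-round interactions. It further introduces Sentence-Level Trustworthiness Analysis and Rectification (STAR), a defense framework that identifies and rectifies misleading information at the sentence level within agent communications. Experiments show that cooperative attacks degrade task success rate more than independent attacks, with a relative drop of 5.34%. STAR, meanwhile, effectively mitigates both cooperative and independent threats and improves task success rate by an average of 36.76%. The code is available at https://github.com/smoooom/STAR.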

Related papers