Fugu-MT Paper Translation (Abstract): Strategy-Augmented Planning for Large Language Models via Opponent Exploitation

Paper summary: Strategy-Augmented Planning for Large Language Models via Opponent Exploitation

arxiv url: http://arxiv.org/abs/2505.08459v2
Date: Sun, 01 Jun 2025 11:53:16 GMT
Status: Translation complete
Last updated in system: 2025-06-03 16:22:43.36889
Title: Strategy-Augmented Planning for Large Language Models via Opponent Exploitation
Title (reference translation): Strategy-Enhanced Planning for Large Language Models via Opponent Exploitation
Authors: Shuai Xu, Sijia Cui, Yanna Wang, Bo Xu, Qi Wang
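Abstract summary: We propose a two-stage Strategy-Augmented Planning (SAP) framework that significantly enhances the opponent exploitation capabilities of LLM-based agents. In the offline stage, we construct an explicit strategy space and then collect strategy-outcome pair data to train the Strategy Evaluation Network (SEN). In the online phase, SAP dynamically recognizes the opponent's strategies and greedily exploits them by searching for the best-response strategy on the well-trained SEN.
Reference score (independently calculated attention level): 11.840105106884543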
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Efficiently modeling and exploiting opponents is a long-standing challenge in adversarial domains. Large Language Models (LLMs) trained on extensive textual data have recently demonstrated outstanding performance in general tasks, introducing new research directions for opponent modeling. Some studies focus on using LLMs directly to generate decisions from elaborate prompt contexts that incorporate opponent descriptions, but these approaches are limited to scenarios where LLMs possess adequate domain expertise. To address this, we introduce a two-stage Strategy-Augmented Planning (SAP) framework that significantly enhances the opponent exploitation capabilities of LLM-based agents by utilizing a critical component, the Strategy Evaluation Network (SEN). Specifically, in the offline stage, we construct an explicit strategy space and subsequently collect strategy-outcome pair data for training the SEN. During the online phase, SAP dynamically recognizes the opponent's strategies and greedily exploits them by searching for the best-response strategy on the well-trained SEN, finally translating the strategy into a course of actions through carefully designed prompts. Experimental results show that SAP exhibits robust generalization capabilities, allowing it to perform effectively not only against previously encountered opponent strategies but also against novel, unseen strategies. In the MicroRTS environment, SAP achieves an $85.35\%$ performance improvement over baseline methods and matches the competitiveness of reinforcement learning approaches against state-of-the-art (SOTA) rule-based AI. Our code is available at https://github.com/hsushuai/SAP.
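Abstract (reference translation): Modeling and exploiting opponents efficiently has long been a challenge in adversarial domains. Large Language Models (LLMs), trained on extensive text data, have recently shown remarkable performance on general tasks and opened new research directions for opponent modeling. Some studies focus on using LLMs directly to generate decisions from elaborate prompt contexts that include descriptions of the opponent, but such approaches are limited to scenarios in which the LLM has adequate domain knowledge. To address this, we propose a two-stage Strategy-Augmented Planning (SAP) framework that significantly improves the opponent exploitation capabilities of LLM-based agents by means of a key component, the Strategy Evaluation Network (SEN). Specifically, in the offline stage we construct an explicit strategy space and then collect strategy-outcome pair data for training the SEN. In the online phase, SAP dynamically recognizes the opponent's strategies and greedily exploits them by searching for the best-response strategy on the well-trained SEN, and finally translates the strategy into a course of actions through carefully designed prompts. Experimental results show that SAP generalizes robustly, performing effectively not only against previously encountered opponent strategies but also against novel, unseen ones. In the MicroRTS environment, SAP achieves an $85.35\%$ performance improvement over baseline methods and matches the competitiveness of reinforcement learning approaches against state-of-the-art (SOTA) rule-based AI. Our code is available at https://github.com/hsushuai/SAP.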
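
The two-stage loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (the linked repository holds that). The strategy names, the `WIN_RATE` table, and the functions `simulate`, `collect_pairs`, `fit_evaluator`, and `best_response` are all hypothetical, and a simple averaged lookup table stands in for the trained SEN, which the abstract describes as a network. Opponent recognition and the prompt-based translation of a strategy into actions are omitted; the opponent's strategy is assumed to be already known.

```python
import random
from collections import defaultdict
from itertools import product

# Hypothetical toy strategy space; the paper's actual space is not specified here.
STRATEGIES = ["rush", "economy", "defense"]

# Hypothetical win probabilities standing in for real game rollouts:
# WIN_RATE[(mine, theirs)] is the chance that `mine` beats `theirs`.
WIN_RATE = {
    ("rush", "economy"): 0.8, ("economy", "defense"): 0.8, ("defense", "rush"): 0.8,
    ("economy", "rush"): 0.2, ("defense", "economy"): 0.2, ("rush", "defense"): 0.2,
    ("rush", "rush"): 0.5, ("economy", "economy"): 0.5, ("defense", "defense"): 0.5,
}


def simulate(mine, theirs, rng):
    """Play one toy match; return 1.0 for a win and 0.0 for a loss."""
    return 1.0 if rng.random() < WIN_RATE[(mine, theirs)] else 0.0


def collect_pairs(n_per_matchup, rng):
    """Offline stage: gather (strategy pair, outcome) samples for every matchup."""
    return [
        ((mine, theirs), simulate(mine, theirs, rng))
        for mine, theirs in product(STRATEGIES, STRATEGIES)
        for _ in range(n_per_matchup)
    ]


def fit_evaluator(pairs):
    """Stand-in for the SEN: mean outcome per matchup (a table, not a network)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for key, outcome in pairs:
        totals[key] += outcome
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


def best_response(evaluator, opponent):
    """Online stage: greedily pick the strategy with the highest predicted value."""
    return max(STRATEGIES, key=lambda mine: evaluator[(mine, opponent)])


rng = random.Random(0)  # fixed seed so the sketch is reproducible
evaluator = fit_evaluator(collect_pairs(500, rng))
for opponent in STRATEGIES:
    print(opponent, "->", best_response(evaluator, opponent))
```

The design point this sketch tries to capture is the separation of concerns: outcome data is gathered and the evaluator is fitted once offline, so the online step reduces to a cheap greedy search over the strategy space for whichever opponent strategy has been recognized.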
