Fugu-MT Paper Translation (Abstract): Instructing LLMs to Negotiate using Reinforcement Learning with Verifiable Rewards

Paper overview: Instructing LLMs to Negotiate using Reinforcement Learning with Verifiable Rewards

arxiv url: http://arxiv.org/abs/2604.09855v1
Date: Fri, 10 Apr 2026 19:35:39 GMT
Status: Translation complete
Last updated in system: 2026-04-14 20:13:15.712505
Title: Instructing LLMs to Negotiate using Reinforcement Learning with Verifiable Rewards
Title (reference translation): Teaching LLMs to Negotiate through Reinforcement Learning with Verifiable Rewards
Authors: Shuze Daniel Liu, Claire Chen, Jiabao Sean Xiao, Lei Lei, Yuheng Zhang, Yisong Yue, David Simchi-Levi
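Abstract summary: We investigate whether Reinforcement Learning from Verifiable Rewards can effectively teach large language models to negotiate. We introduce a framework that trains a mid-sized buyer agent against a regulated LLM seller across a wide distribution of real-world products. Our results show that a 30B agent significantly outperforms frontier models over ten times its size in extracting surplus.
Reference score (independently computed attention score): 45.56436052535799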
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The recent advancement of Large Language Models (LLMs) has established their potential as autonomous interactive agents. However, they often struggle in strategic games of incomplete information, such as bilateral price negotiation. In this paper, we investigate if Reinforcement Learning from Verifiable Rewards (RLVR) can effectively teach LLMs to negotiate. Specifically, we explore the strategic behaviors that emerge during the learning process. We introduce a framework that trains a mid-sized buyer agent against a regulated LLM seller across a wide distribution of real-world products. By grounding reward signals directly in the maximization of economic surplus and strict adherence to private budget constraints, we reveal a novel four-phase strategic evolution. The agent progresses from naive bargaining to using aggressive starting prices, moves through a phase of deadlock, and ultimately develops sophisticated persuasive skills. Our results demonstrate that this verifiable training allows a 30B agent to significantly outperform frontier models over ten times its size in extracting surplus. Furthermore, the trained agent generalizes robustly to stronger counterparties unseen during training and remains effective even when facing hostile, adversarial seller personas.
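Abstract (reference translation): Recent advances in Large Language Models (LLMs) have established their potential as autonomous interactive agents. However, they often struggle in strategic games of incomplete information, such as bilateral price negotiation. This paper investigates whether Reinforcement Learning from Verifiable Rewards (RLVR) can effectively teach LLMs to negotiate, and specifically examines the strategic behaviors that emerge during the learning process. It proposes a framework that trains a mid-sized buyer agent against a regulated LLM seller across a wide distribution of real-world products. By grounding reward signals directly in the maximization of economic surplus and strict adherence to private budget constraints, it reveals a novel four-phase strategic evolution. The agent progresses from naive bargaining to using aggressive starting prices, moves through a phase of deadlock, and ultimately develops sophisticated persuasive skills. The results show that this verifiable training allows a 30B agent to significantly outperform frontier models over ten times its size in extracting surplus. Furthermore, the trained agent generalizes robustly to stronger counterparties unseen during training and remains effective even when facing hostile, adversarial seller personas.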
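
Illustrative sketch (not from the paper): the abstract says the reward is grounded in economic surplus and strict adherence to a private budget, but it does not give the formula. The Python below is one hypothetical way such a verifiable reward could be computed for a buyer. The `Episode` type, the `buyer_reward` function, the normalization by budget, the zero reward for no deal, and the `violation_penalty` value are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Episode:
    """Outcome of one negotiation (hypothetical structure, not from the paper)."""
    budget: float                 # buyer's private budget, must be positive
    deal_price: Optional[float]   # agreed price, or None if no deal was reached


def buyer_reward(ep: Episode, violation_penalty: float = -1.0) -> float:
    """Assumed reward: normalized buyer surplus, with a penalty for overspending."""
    if ep.budget <= 0:
        raise ValueError("budget must be positive")
    if ep.deal_price is None:
        # No agreement: no surplus is captured (assumed to score zero).
        return 0.0
    if ep.deal_price > ep.budget:
        # Deal above the private budget: constraint violated (assumed fixed penalty).
        return violation_penalty
    # Surplus is the gap between budget and price, scaled to [0, 1].
    return (ep.budget - ep.deal_price) / ep.budget
```

Because the reward depends only on the final price and the budget, it can be checked mechanically from the negotiation outcome, which is the property that makes a reward "verifiable" in this sense.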

Related papers