Fugu-MT paper translation (abstract): FutureWorld: A Live Reinforcement Learning Environment for Predictive Agents with Real-World Outcome Rewards

Paper overview: FutureWorld: A Live Reinforcement Learning Environment for Predictive Agents with Real-World Outcome Rewards

arxiv url: http://arxiv.org/abs/2604.26733v2
Date: Thu, 07 May 2026 14:24:24 GMT
Status: Translation complete
Last updated in system: 2026-05-08 22:27:11.266809
Title: FutureWorld: A Live Reinforcement Learning Environment for Predictive Agents with Real-World Outcome Rewards
Authors: Zhixin Han, Yanzhi Zhang, Chuyang Wei, Maohang Gao, Xiawei Yue, Kefei Chen, Yu Zhuang, Haoxiang Guan, Jiyan He, Jian Li, Yitong Duan, Yu Shi, Mengting Hu, Shuxin Zheng
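Abstract summary: This paper introduces FutureWorld, an agentic reinforcement learning environment that closes the training loop between prediction, outcome realization, and parameter updates. Across three open-source agents, successive FutureWorld training rounds consistently improve prediction accuracy, probabilistic scoring, and calibration.
Reference score (attention score computed by this site): 20.541743597851177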
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Live future prediction refers to the task of making predictions about real-world events before they unfold. This task is increasingly studied using large language model-based agent systems, and it is important for building agents that can continually learn from the real world. It can provide a large number of prediction questions grounded in diverse real-world events, while preventing answer leakage. To leverage the advantages of future prediction, we present FutureWorld, a live agentic reinforcement learning environment that closes the training loop between prediction, outcome realization, and parameter updates. Specifically, we modify and extend verl-tool, resulting in a new framework that we call verl-tool-future. Unlike standard RL training frameworks that rely on immediate rewards, verl-tool-future stores prediction-time rollouts, backfills rewards after real-world outcomes become available, and then replays the completed trajectories for policy update. Across three open-source agents, successive FutureWorld training rounds lead to consistent improvements in prediction accuracy, probabilistic scoring, and calibration, demonstrating that delayed real-world outcome feedback can serve as an effective RL signal for predictive agents.
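The store, backfill, and replay cycle described in the abstract can be illustrated with a minimal Python sketch. This is not the verl-tool-future implementation: the `Rollout` and `DelayedRewardBuffer` names, their fields, and the use of a negative Brier score as the reward are all assumptions made for illustration, since the abstract does not specify the data structures or the reward function.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Rollout:
    """One stored prediction-time trajectory (hypothetical structure)."""
    question_id: str
    trajectory: List[str]           # agent steps recorded at prediction time
    predicted_prob: float           # predicted probability that the event occurs
    reward: Optional[float] = None  # unknown until the real-world outcome resolves


@dataclass
class DelayedRewardBuffer:
    """Stores rollouts, backfills rewards later, and releases completed ones."""
    pending: Dict[str, List[Rollout]] = field(default_factory=dict)
    completed: List[Rollout] = field(default_factory=list)

    def store(self, rollout: Rollout) -> None:
        # Keep the rollout until its question resolves in the real world.
        self.pending.setdefault(rollout.question_id, []).append(rollout)

    def backfill(self, question_id: str, outcome: int) -> int:
        """Assign rewards once the outcome (0 or 1) is known; return count filled."""
        rollouts = self.pending.pop(question_id, [])
        for r in rollouts:
            # Assumed reward: negative Brier score, so higher is better.
            r.reward = -((r.predicted_prob - outcome) ** 2)
        self.completed.extend(rollouts)
        return len(rollouts)

    def replay(self) -> List[Rollout]:
        """Hand all completed trajectories to the policy update and clear them."""
        batch, self.completed = self.completed, []
        return batch


buffer = DelayedRewardBuffer()
buffer.store(Rollout("q1", ["search", "answer"], predicted_prob=0.8))
buffer.store(Rollout("q2", ["answer"], predicted_prob=0.3))
filled = buffer.backfill("q1", outcome=1)  # q1 resolves; q2 is still pending
batch = buffer.replay()                    # only q1's rollout is ready for training
```

In this sketch, a rollout whose question has not yet resolved stays in `pending` and never reaches the policy update, which is the property that separates delayed outcome rewards from the immediate rewards of a standard RL loop.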

Related papers