Fugu-MT Paper Translation (Summary): Agentic Time Machine as an Infrastructure for Future-Event Forecasting

Paper summary: Agentic Time Machine as an Infrastructure for Future-Event Forecasting

arxiv url: http://arxiv.org/abs/2606.21013v1
Date: Fri, 19 Jun 2026 00:55:00 GMT
Status: Translation complete
Last updated (in system): 2026-06-26 09:01:51.553332
Title: Agentic Time Machine as an Infrastructure for Future-Event Forecasting
Authors: Jingyi Chai, Bingyang Zheng, Xiangrui Liu, Hao Lu, Zihang Zhou, Tianchen Wang, Kemeng Zhang, Siheng Chen,
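Abstract summary: Agentic Time Machine (TM) approximately reconstructs the web state at a chosen past time by filtering out post-cutoff content. The proposed planner-solver-aggregator framework breaks each question into diverse analytical angles, gathers evidence in parallel, and combines the results into a single forecast. On FutureX-Past and Polymarket evaluated under TM, the framework achieves the highest score among strong closed-book, tool-augmented, and self-consistency baselines.
Reference score (independently computed attention score): 42.3570042854712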
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Forecasting future events is a critical challenge for large language model (LLM) agents, spanning domains from elections and monetary policy to financial markets. However, evaluating progress on this task presents a fundamental trade-off between efficiency and environment fidelity. While live evaluation benchmarks suffer from an inherently slow feedback loop, existing retrospective replays typically restrict agents to static, pre-frozen databases that sacrifice the environmental realism of actual deployments. To tackle this issue, we introduce Agentic Time Machine (TM), an infrastructure that approximately reconstructs the web state at any chosen past time by filtering post-cutoff content. Leveraging this evaluation infrastructure, we further propose a planner-solver-aggregator multi-agent framework that breaks each question into diverse analytical angles, gathers evidence in parallel, and combines the results into a single forecast. Experiments show that offline scores under TM correlate strongly with live FutureX scores, validating that TM offers a fast and reliable sandbox for forecasting-agent evaluation. On FutureX-Past and Polymarket evaluated under TM, our framework achieves the highest score among strong closed-book, tool-augmented, and self-consistency baselines. On the official FutureX live leaderboard, our system achieves the best average rank over four consecutive weeks, including 1st place in May Week 1. As of June 17, it also ranks 1st on FutureX's official eight-week overall leaderboard.
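The abstract says TM approximately reconstructs the web state at a chosen past time by filtering out post-cutoff content. As a rough illustration only, the Python sketch below applies such a cutoff to search results. The `SearchResult` type, its `published` field, and `filter_post_cutoff` are hypothetical names, not the paper's API, and the sketch assumes each hit carries a trustworthy publication timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SearchResult:
    """Hypothetical search hit; real search APIs expose different fields."""
    url: str
    title: str
    published: Optional[datetime]  # None when no publication date is known


def filter_post_cutoff(results: list[SearchResult], cutoff: datetime,
                       keep_undated: bool = False) -> list[SearchResult]:
    """Keep only results published strictly before `cutoff`.

    Undated results are dropped by default, since they could leak
    post-cutoff information into a replayed forecast.
    """
    kept = []
    for r in results:
        if r.published is None:
            if keep_undated:
                kept.append(r)
        elif r.published < cutoff:
            kept.append(r)
    return kept


# Usage with made-up URLs and dates (timezone-aware, so comparisons are valid).
cutoff = datetime(2026, 5, 1, tzinfo=timezone.utc)
hits = [
    SearchResult("https://example.com/a", "Before cutoff",
                 datetime(2026, 4, 30, tzinfo=timezone.utc)),
    SearchResult("https://example.com/b", "After cutoff",
                 datetime(2026, 5, 2, tzinfo=timezone.utc)),
    SearchResult("https://example.com/c", "Undated", None),
]
print([r.title for r in filter_post_cutoff(hits, cutoff)])  # ['Before cutoff']
```

Dropping undated pages by default is the conservative choice. A publication-date filter alone also cannot catch pages that were edited after their stated date, which is one reason such a reconstruction can only be approximate.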
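The planner-solver-aggregator framework is described only at a high level: split the question into analytical angles, gather evidence for each angle in parallel, and combine the results into one forecast. The Python sketch below shows that control flow under explicit assumptions. `plan`, `toy_solver`, and `AngleResult` are hypothetical stand-ins for LLM-backed agents, and the aggregator is a plain mean of probabilities, which is not necessarily how the authors combine results.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from statistics import fmean
from typing import Callable


@dataclass(frozen=True)
class AngleResult:
    angle: str
    probability: float  # solver's estimate that the event occurs, in [0, 1]
    evidence: list[str]


def plan(question: str) -> list[str]:
    """Hypothetical planner; a real system would prompt an LLM here."""
    return [
        f"{question} | base rates",
        f"{question} | recent news",
        f"{question} | market signals",
    ]


def forecast(question: str,
             solve: Callable[[str], AngleResult],
             max_workers: int = 4) -> float:
    """Planner -> parallel solvers -> aggregator."""
    angles = plan(question)
    # Solvers run concurrently; pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(solve, angles))
    # Hypothetical aggregator: unweighted mean of the per-angle probabilities.
    return fmean(r.probability for r in results)


def toy_solver(angle: str) -> AngleResult:
    """Stand-in for an agent that would search (e.g. through TM) and reason."""
    guesses = {"base rates": 0.4, "recent news": 0.6, "market signals": 0.5}
    key = angle.split(" | ")[1]
    return AngleResult(angle, guesses[key], evidence=[])


print(forecast("Will X happen by June 30?", toy_solver))  # prints 0.5
```

Threads suit this shape because each solver would mostly wait on network calls (search, LLM APIs); a weighted or learned aggregator could replace `fmean` without changing the rest of the pipeline.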
