Fugu-MT Paper Translation (Abstract): Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

Paper abstract: Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

arxiv url: http://arxiv.org/abs/2606.11926v1
Date: Wed, 10 Jun 2026 10:57:05 GMT
Status: Translation complete
System update date: 2026-06-11 16:42:38.423933
Title: Toward Generalist Autonomous Research via Hypothesis-Tree Refinement
Title (reference translation): Toward Generalist Autonomous Research via Hypothesis-Tree Refinement
Authors: Jiajie Jin, Yuyang Hu, Kai Qiu, Qi Dai, Chong Luo, Guanting Dong, Xiaoxi Li, Tong Zhao, Xiaolong Ma, Gongrui Zhang, Zhirong Wu, Bei Liu, Zhengyuan Yang, Linjie Li, Lijuan Wang, Hongjin Qian, Yutao Zhu, Zhicheng Dou
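Abstract summary: This paper introduces Arbor, an autonomous research framework that combines a long-lived coordinator, short-lived executors, and Hypothesis Tree Refinement (HTR). As results return, Arbor updates the tree, propagates reusable lessons, refines the search frontier, and admits verified improvements.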
Reference score (independently computed attention score): 150.99641031769633
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Scientific progress depends on a repeated loop of exploration, experimentation, and abstraction. Researchers test candidate directions, interpret the evidence, and carry the resulting lessons into later attempts. We study how an AI agent can run this loop autonomously over long horizons. We introduce Arbor, a general framework for autonomous research that combines a long-lived coordinator, short-lived executors, and Hypothesis Tree Refinement (HTR), a persistent tree that links hypotheses, artifacts, evidence, and distilled insights across time. The coordinator manages global research strategy over the tree, while executors implement and test individual hypotheses in isolated worktrees. As results return, Arbor updates the tree, propagates reusable lessons, refines the search frontier, and admits verified improvements. This design turns autonomous research from a sequence of local attempts into a cumulative process in which strategy, execution, and evidence are carried across time. We evaluate Arbor under Autonomous Optimization (AO), an operational setting where an agent improves an initial research artifact through iterative experimentation without step-level human supervision. Across six real research tasks in model training, harness engineering, and data synthesis, Arbor achieves the best held-out result on all six tasks, attaining more than 2.5x the average relative held-out gain of Codex and Claude Code under the same task interface and resource budget. On MLE-Bench Lite, Arbor reaches 86.36% Any Medal with GPT-5.5, the strongest result in our comparison.
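Abstract (reference translation): Scientific progress depends on a repeated loop of exploration, experimentation, and abstraction. Researchers test candidate directions, interpret the evidence, and carry the resulting lessons into later attempts. We study how an AI agent can run this loop autonomously over long horizons. We introduce Arbor, a general framework for autonomous research that combines a long-lived coordinator, short-lived executors, and Hypothesis Tree Refinement (HTR). The coordinator manages global research strategy over the tree, while executors implement and test individual hypotheses in isolated worktrees. As results return, Arbor updates the tree, propagates reusable lessons, refines the search frontier, and admits verified improvements. This design turns autonomous research from a sequence of local attempts into a cumulative process in which strategy, execution, and evidence are carried across time. We evaluate Arbor under Autonomous Optimization (AO), an operational setting where an agent improves an initial research artifact through iterative experimentation without step-level human supervision. Across six real research tasks in model training, harness engineering, and data synthesis, Arbor achieves the best held-out result on all six tasks, attaining more than 2.5x the average relative held-out gain of Codex and Claude Code under the same task interface and resource budget. On MLE-Bench Lite, Arbor reaches 86.36% Any Medal with GPT-5.5.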
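
The coordinator/executor loop over a persistent hypothesis tree described in the abstract can be illustrated with a small sketch. This is not Arbor's implementation: the abstract gives no API, so every name here (`HypothesisNode`, `HypothesisTree`, `coordinate`, the `executor` callable) is hypothetical, and the sketch assumes a single scalar score where higher is better, a first-in-first-out frontier policy, and lesson propagation to all ancestors of a tested node.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class HypothesisNode:
    """One hypothesis plus the evidence and lessons attached to it (hypothetical schema)."""
    node_id: int
    statement: str
    parent: Optional[int] = None
    children: list[int] = field(default_factory=list)
    status: str = "open"  # "open", "verified", or "rejected"
    score: Optional[float] = None
    insights: list[str] = field(default_factory=list)


class HypothesisTree:
    """Persistent tree; node 0 is the initial artifact with its baseline score."""

    def __init__(self, root_statement: str, baseline_score: float) -> None:
        root = HypothesisNode(0, root_statement, status="verified", score=baseline_score)
        self.nodes: dict[int, HypothesisNode] = {0: root}
        self.best_id = 0

    def add(self, parent_id: int, statement: str) -> int:
        node_id = len(self.nodes)
        self.nodes[node_id] = HypothesisNode(node_id, statement, parent=parent_id)
        self.nodes[parent_id].children.append(node_id)
        return node_id

    def frontier(self) -> list[int]:
        """Untested hypotheses, in insertion order."""
        return [n.node_id for n in self.nodes.values() if n.status == "open"]

    def record(self, node_id: int, score: float, insight: str) -> bool:
        """Store a result, admit it only if it beats the current best, propagate the lesson."""
        node = self.nodes[node_id]
        node.score = score
        admitted = score > self.nodes[self.best_id].score  # assumption: higher is better
        node.status = "verified" if admitted else "rejected"
        if admitted:
            self.best_id = node_id
        # Propagate the lesson to the node and every ancestor up to the root.
        current: Optional[HypothesisNode] = node
        while current is not None:
            current.insights.append(insight)
            current = self.nodes[current.parent] if current.parent is not None else None
        return admitted


def coordinate(
    tree: HypothesisTree,
    executor: Callable[[str], tuple[float, str]],
    budget: int,
) -> HypothesisNode:
    """Long-lived loop: pick a frontier node, hand it to a short-lived executor, update the tree."""
    for _ in range(budget):
        open_ids = tree.frontier()
        if not open_ids:
            break
        node_id = open_ids[0]
        score, insight = executor(tree.nodes[node_id].statement)
        tree.record(node_id, score, insight)
    return tree.nodes[tree.best_id]


# Usage with a stub executor; the scores and lessons below are made-up illustration values.
tree = HypothesisTree("baseline artifact", baseline_score=0.50)
a = tree.add(0, "raise learning rate")
b = tree.add(0, "add data augmentation")
stub_results = {
    "raise learning rate": (0.48, "higher learning rate destabilized training"),
    "add data augmentation": (0.57, "augmentation improved the held-out score"),
}
best = coordinate(tree, lambda statement: stub_results[statement], budget=5)
print(best.statement, best.score)
```

In this sketch the tree is the only state that persists between executor calls, which mirrors the abstract's point that strategy and evidence are carried across attempts instead of being rediscovered each time. A real system would also need isolated workspaces per executor and a verification step on held-out data, both of which are omitted here.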

Related papers