Fugu-MT paper translation (abstract): Towards Execution-Grounded Automated AI Research

Paper summary: Towards Execution-Grounded Automated AI Research

arxiv url: http://arxiv.org/abs/2601.14525v1
Date: Tue, 20 Jan 2026 22:35:44 GMT
Status: translation complete
Last updated in system: 2026-01-22 21:27:50.176023
Title: Towards Execution-Grounded Automated AI Research
Authors: Chenglei Si, Zitong Yang, Yejin Choi, Emmanuel Candès, Diyi Yang, Tatsunori Hashimoto
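Abstract summary: Execution grounding may help, but it is unclear whether automated execution is feasible and whether LLMs can learn from execution feedback. We build an automated executor that implements ideas and launches large-scale parallel GPU experiments to verify their effectiveness. We analyze two methods for learning from execution feedback: evolutionary search and reinforcement learning.
Reference score (independently computed attention level): 106.90422658528819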
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Automated AI research holds great potential to accelerate scientific discovery. However, current LLMs often generate plausible-looking but ineffective ideas. Execution grounding may help, but it is unclear whether automated execution is feasible and whether LLMs can learn from the execution feedback. To investigate these, we first build an automated executor to implement ideas and launch large-scale parallel GPU experiments to verify their effectiveness. We then convert two realistic research problems - LLM pre-training and post-training - into execution environments and demonstrate that our automated executor can implement a large fraction of the ideas sampled from frontier LLMs. We analyze two methods to learn from the execution feedback: evolutionary search and reinforcement learning. Execution-guided evolutionary search is sample-efficient: it finds a method that significantly outperforms the GRPO baseline (69.4% vs 48.0%) on post-training, and finds a pre-training recipe that outperforms the nanoGPT baseline (19.7 minutes vs 35.9 minutes) on pre-training, all within just ten search epochs. Frontier LLMs often generate meaningful algorithmic ideas during search, but they tend to saturate early and only occasionally exhibit scaling trends. Reinforcement learning from execution reward, on the other hand, suffers from mode collapse. It successfully improves the average reward of the ideator model but not the upper-bound, due to models converging on simple ideas. We thoroughly analyze the executed ideas and training dynamics to facilitate future efforts towards execution-grounded automated AI research.
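As a rough illustration of what execution-guided evolutionary search could look like, the sketch below runs a generic evolutionary loop for ten epochs, matching the "ten search epochs" in the abstract. Everything else is an assumption made for illustration: `propose_ideas`, `execute_idea`, the numeric "idea" representation, and the toy reward are hypothetical stand-ins for the LLM ideator and the GPU executor, whose actual interfaces the abstract does not describe.

```python
import random
from typing import List, Tuple


def execute_idea(idea: float) -> float:
    """Hypothetical executor: returns a reward for an idea.

    Toy stand-in for implementing the idea and running a GPU experiment.
    The reward peaks (at 0.0) when idea == 3.0.
    """
    return -(idea - 3.0) ** 2


def propose_ideas(parents: List[float], n: int, rng: random.Random) -> List[float]:
    """Hypothetical ideator: perturbs ideas that scored well earlier.

    Toy stand-in for sampling new ideas from an LLM that has seen feedback.
    """
    return [rng.choice(parents) + rng.gauss(0.0, 0.5) for _ in range(n)]


def evolutionary_search(epochs: int = 10, batch: int = 8,
                        keep: int = 3, seed: int = 0) -> List[float]:
    """Run the toy search and return the best reward seen after each epoch."""
    rng = random.Random(seed)
    # Initial population: random ideas, each scored by the executor.
    population: List[Tuple[float, float]] = []
    for _ in range(batch):
        idea = rng.uniform(-5.0, 5.0)
        population.append((execute_idea(idea), idea))

    best_per_epoch: List[float] = []
    for _ in range(epochs):
        # Selection: keep the top-`keep` ideas by executed reward.
        population.sort(reverse=True)
        elites = population[:keep]
        parents = [idea for _, idea in elites]
        # Variation: propose new ideas from the elites, then execute them.
        children = propose_ideas(parents, batch, rng)
        population = elites + [(execute_idea(c), c) for c in children]
        best_per_epoch.append(max(population)[0])
    return best_per_epoch


if __name__ == "__main__":
    for epoch, reward in enumerate(evolutionary_search(), start=1):
        print(f"epoch {epoch}: best reward {reward:.4f}")
```

Carrying the elites into the next epoch makes the best reward non-decreasing in this toy setup; the selection scheme used in the paper may differ.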

Related papers