Fugu-MT Paper Translation (Abstract): Evaluation-driven Scaling for Scientific Discovery

Paper overview: Evaluation-driven Scaling for Scientific Discovery

arxiv url: http://arxiv.org/abs/2604.19341v1
Date: Tue, 21 Apr 2026 11:24:09 GMT
Status: translation complete
Last updated in system: 2026-04-22 22:41:49.736307
Title: Evaluation-driven Scaling for Scientific Discovery
Title (reference translation): Evaluation-Driven Scaling for Scientific Discovery
Authors: Haotian Ye, Haowei Lin, Jingyi Tang, Yizhen Luo, Caiyin Yang, Chang Su, Rahul Thapa, Rui Yang, Ruihua Liu, Zeyu Li, Chong Gao, Dachao Ding, Guangrong He, Miaolei Zhang, Lina Sun, Wenyang Wang, Yuchen Zhong, Zhuohao Shen, Di He, Jianzhu Ma, Stefano Ermon, Tongyang Li, Xiaowen Chu, James Zou, Yuzhi Xu
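Abstract summary: The paper introduces Simple Test-time Evaluation-driven Scaling (SimpleTES), which combines parallel exploration, feedback-driven refinement, and local selection. Scaling evaluation-driven discovery loops along the right dimensions yields substantial gains. The work establishes effective evaluation-driven loop scaling as a central axis for advancing scientific discovery.
Reference score (independently computed attention score): 77.20820317940581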
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Language models are increasingly used in scientific discovery to generate hypotheses, propose candidate solutions, implement systems, and iteratively refine them. At the core of these trial-and-error loops lies evaluation: the process of obtaining feedback on candidate solutions via verifiers, simulators, or task-specific scoring functions. While prior work has highlighted the importance of evaluation, it has not explicitly formulated the problem of how evaluation-driven discovery loops can be scaled up in a principled and effective manner to push the boundaries of scientific discovery, a problem this paper seeks to address. We introduce Simple Test-time Evaluation-driven Scaling (SimpleTES), a general framework that strategically combines parallel exploration, feedback-driven refinement, and local selection, revealing substantial gains unlocked by scaling evaluation-driven discovery loops along the right dimensions. Across 21 scientific problems spanning six domains, SimpleTES discovers state-of-the-art solutions using gpt-oss models, consistently outperforming both frontier-model baselines and sophisticated optimization pipelines. Particularly, we sped up the widely used LASSO algorithm by over 2x, designed quantum circuit routing policies that reduce gate overhead by 24.5%, and discovered new Erdos minimum overlap constructions that surpass the best-known results. Beyond novel discoveries, SimpleTES produces trajectory-level histories that naturally supervise feedback-driven learning. When post-trained on successful trajectories, models not only improve efficiency on seen problems but also generalize to unseen problems, discovering solutions that base models fail to uncover. Together, our results establish effective evaluation-driven loop scaling as a central axis for advancing LLM-driven scientific discovery, and provide a simple yet practical framework for realizing these gains.
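Abstract (reference translation): Language models are increasingly used in scientific discovery to generate hypotheses, propose candidate solutions, implement systems, and iteratively refine them. At the core of these trial-and-error loops is evaluation: the process of obtaining feedback on candidate solutions through verifiers, simulators, or task-specific scoring functions. Prior work has emphasized the importance of evaluation, but it has not explicitly formulated the problem of how to scale up evaluation-driven discovery loops in a principled and effective way to push the boundaries of scientific discovery. We introduce Simple Test-time Evaluation-driven Scaling (SimpleTES), a general framework that strategically combines parallel exploration, feedback-driven refinement, and local selection, and show that scaling evaluation-driven discovery loops along the right dimensions yields substantial gains. Across 21 scientific problems spanning six domains, SimpleTES discovers state-of-the-art solutions using gpt-oss models, consistently outperforming both frontier-model baselines and sophisticated optimization pipelines. In particular, we sped up the widely used LASSO algorithm by over 2x, designed quantum circuit routing policies that reduce gate overhead by 24.5%, and discovered new Erdos minimum overlap constructions that surpass the best-known results. Beyond new discoveries, SimpleTES produces trajectory-level histories that naturally supervise feedback-driven learning. When post-trained on successful trajectories, models not only become more efficient on seen problems but also generalize to unseen problems, discovering solutions that base models fail to find. Together, these results establish effective evaluation-driven loop scaling as a central axis for advancing LLM-driven scientific discovery, and provide a simple yet practical framework for realizing these gains.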
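
Illustrative sketch (not from the paper): the abstract names three ingredients, parallel exploration, feedback-driven refinement, and local selection, but does not say how SimpleTES implements them. The toy loop below is one possible reading under stated assumptions, not the authors' method. The names `evaluate`, `propose`, and `discovery_loop`, and every parameter, are hypothetical; a random perturbation stands in for the language model, and "local selection" is assumed here to mean that each chain selects only among its own candidates.

```python
import random
from typing import List, Tuple


def evaluate(x: float) -> float:
    """Toy evaluator standing in for a verifier or simulator.

    Higher is better; the maximum score 0.0 is reached at x = 3.
    """
    return -(x - 3.0) ** 2


def propose(parent: float, parent_score: float, rng: random.Random) -> float:
    """Toy proposer standing in for a language-model call (assumption).

    Feedback-driven refinement: the worse the parent's score,
    the larger the perturbation applied to it.
    """
    step = 0.1 + 0.5 * abs(parent_score) ** 0.5
    return parent + rng.gauss(0.0, step)


def discovery_loop(
    num_chains: int = 4,
    rounds: int = 30,
    samples_per_round: int = 8,
    seed: int = 0,
) -> Tuple[float, float]:
    """Run several independent chains and return the best (x, score) found."""
    rng = random.Random(seed)

    # Parallel exploration: several chains start from different random points.
    chains: List[Tuple[float, float]] = []
    for _ in range(num_chains):
        x = rng.uniform(-10.0, 10.0)
        chains.append((x, evaluate(x)))

    for _ in range(rounds):
        for i, (x, score) in enumerate(chains):
            # Refinement: sample candidates conditioned on the chain's
            # current solution and its evaluation feedback.
            candidates = [propose(x, score, rng) for _ in range(samples_per_round)]
            scored = [(c, evaluate(c)) for c in candidates]
            # Keep the incumbent so a chain never gets worse.
            scored.append((x, score))
            # Local selection: pick the best within this chain only.
            chains[i] = max(scored, key=lambda pair: pair[1])

    # Report the best solution across all chains at the end.
    return max(chains, key=lambda pair: pair[1])


if __name__ == "__main__":
    best_x, best_score = discovery_loop()
    print(f"best x = {best_x:.4f}, score = {best_score:.6f}")
```

In this toy setting the three budgets (`num_chains`, `rounds`, `samples_per_round`) are the dimensions along which evaluation calls can be scaled; which of them matters most for real problems is an empirical question that the sketch does not answer.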
