Fugu-MT Paper Translation (Abstract): Helix: Evolutionary Reinforcement Learning for Open-Ended Scientific Problem Solving

Paper overview: Helix: Evolutionary Reinforcement Learning for Open-Ended Scientific Problem Solving

arxiv url: http://arxiv.org/abs/2603.07642v1
Date: Sun, 08 Mar 2026 14:08:52 GMT
Status: Translation complete
Last updated in system: 2026-03-10 15:13:14.996789
Title: Helix: Evolutionary Reinforcement Learning for Open-Ended Scientific Problem Solving
Title (reference translation): Helix: Evolutionary Reinforcement Learning for Open-Ended Scientific Problem Solving
Authors: Chang Su, Zhongkai Hao, Zhizhou Zhang, Zeyu Xia, Youjia Wu, Hang Su, Jun Zhu
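Abstract summary: We propose HELIX, a Hierarchical Evolutionary reinforcement Learning framework with In-context eXperiences. HELIX introduces two key novelties: (i) a diverse yet high-quality pool of candidate solutions that broadens exploration through in-context learning, and (ii) reinforcement learning for iterative policy refinement that progressively elevates solution quality. On the circle packing task, HELIX achieves a state-of-the-art result with a sum of radii of 2.63598308 using only a 14B model.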
Reference score (independently calculated attention score): 33.07964356595686
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) with reasoning abilities have demonstrated growing promise for tackling complex scientific problems. Yet such tasks are inherently domain-specific, unbounded and open-ended, demanding exploration across vast and flexible solution spaces. Existing approaches, whether purely learning-based or reliant on carefully designed workflows, often suffer from limited exploration efficiency and poor generalization. To overcome these challenges, we present HELIX -- a Hierarchical Evolutionary reinforcement Learning framework with In-context eXperiences. HELIX introduces two key novelties: (i) a diverse yet high-quality pool of candidate solutions that broadens exploration through in-context learning, and (ii) reinforcement learning for iterative policy refinement that progressively elevates solution quality. This synergy enables the discovery of more advanced solutions. On the circle packing task, HELIX achieves a state-of-the-art result with a sum of radii of 2.63598308 using only a 14B model. Across standard machine learning benchmarks, HELIX further surpasses GPT-4o with a carefully engineered pipeline, delivering an average F1 improvement of 5.95 points on the Adult and Bank Marketing datasets.
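Abstract (reference translation): Large language models (LLMs) with reasoning abilities have shown growing promise for tackling complex scientific problems. Such tasks, however, are inherently domain-specific, unbounded, and open-ended, and they require exploring vast, flexible solution spaces. Existing approaches, whether purely learning-based or dependent on carefully designed workflows, often suffer from limited exploration efficiency and poor generalization. To overcome these challenges, we propose HELIX, a Hierarchical Evolutionary reinforcement Learning framework with In-context eXperiences. HELIX introduces two key novelties: (i) a diverse yet high-quality pool of candidate solutions that broadens exploration through in-context learning, and (ii) reinforcement learning for iterative policy refinement that progressively raises solution quality. This synergy enables the discovery of more advanced solutions. On the circle packing task, HELIX achieves a state-of-the-art result with a sum of radii of 2.63598308 using only a 14B model. Across standard machine learning benchmarks, HELIX also surpasses GPT-4o with a carefully engineered pipeline, delivering an average F1 improvement of 5.95 points on the Adult and Bank Marketing datasets.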
