Fugu-MT Paper Translation (Abstract): Can LLMs Beat Classical Hyperparameter Optimization Algorithms? A Study on autoresearch

Paper overview: Can LLMs Beat Classical Hyperparameter Optimization Algorithms? A Study on autoresearch

arxiv url: http://arxiv.org/abs/2603.24647v1
Date: Wed, 25 Mar 2026 17:29:40 GMT
Status: Translation complete
Last updated in system: 2026-03-27 20:52:47.911025
Title: Can LLMs Beat Classical Hyperparameter Optimization Algorithms? A Study on autoresearch
Title (reference translation): Can LLMs beat classical hyperparameter optimization algorithms?
Authors: Fabio Ferreira, Lucca Wobbe, Arjun Krishnakumar, Frank Hutter, Arber Zela,
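Abstract summary: Using autoresearch as a testbed, we compare classical HPO algorithms against LLM-based methods. We introduce Centaur, a hybrid that shares CMA-ES's internal state, including the mean vector, step-size, and covariance matrix, with an LLM. Centaur achieves the best result in the experiments, with its 0.8B variant outperforming the 27B variant.
Reference score (independently computed attention score): 42.242102214102566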
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The autoresearch repository enables an LLM agent to search for optimal hyperparameter configurations on an unconstrained search space by editing the training code directly. Given a fixed compute budget and constraints, we use autoresearch as a testbed to compare classical hyperparameter optimization (HPO) algorithms against LLM-based methods on tuning the hyperparameters of a small language model. Within a fixed hyperparameter search space, classical HPO methods such as CMA-ES and TPE consistently outperform LLM-based agents. However, an LLM agent that directly edits training source code in an unconstrained search space narrows the gap to classical methods substantially despite using only a self-hosted open-weight 27B model. Methods that avoid out-of-memory failures outperform those with higher search diversity, suggesting that reliability matters more than exploration breadth. While small and mid-sized LLMs struggle to track optimization state across trials, classical methods lack domain knowledge. To bridge this gap, we introduce Centaur, a hybrid that shares CMA-ES's internal state, including mean vector, step-size, and covariance matrix, with an LLM. Centaur achieves the best result in our experiments, with its 0.8B variant outperforming the 27B variant, suggesting that a cheap LLM suffices when paired with a strong classical optimizer. The 0.8B model is insufficient for unconstrained code editing but sufficient for hybrid optimization, while scaling to 27B provides no advantage for fixed search space methods with the open-weight models tested. Code is available at https://github.com/ferreirafabio/autoresearch-automl.
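Abstract (reference translation): The autoresearch repository lets an LLM agent search for optimal hyperparameter configurations over an unconstrained search space by directly editing the training code. Given a fixed compute budget and constraints, we use autoresearch as a testbed to compare classical hyperparameter optimization (HPO) algorithms against LLM-based methods for tuning the hyperparameters of a small language model. Within a fixed hyperparameter search space, classical HPO methods such as CMA-ES and TPE consistently outperform LLM-based agents. However, an LLM agent that directly edits the training source code in an unconstrained search space substantially narrows the gap to classical methods, even though it uses only a self-hosted open-weight 27B model. Methods that avoid out-of-memory failures outperform those with higher search diversity, suggesting that reliability matters more than exploration breadth. Small and mid-sized LLMs struggle to track optimization state across trials, while classical methods lack domain knowledge. To bridge this gap, we introduce Centaur, a hybrid that shares CMA-ES's internal state, including the mean vector, step-size, and covariance matrix, with an LLM. Centaur's 0.8B variant outperforms the 27B variant, indicating that a cheap LLM suffices when paired with a strong classical optimizer. The 0.8B model is insufficient for unconstrained code editing but sufficient for hybrid optimization. Code is available at https://github.com/ferreirafabio/autoresearch-automl.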
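
The abstract says Centaur shares CMA-ES's mean vector, step-size, and covariance matrix with an LLM, but it does not say how that state is presented to the model. The sketch below shows one plausible way to render such state as prompt text. Everything in it is an assumption made for illustration: the CmaState container, the format_state_for_prompt helper, the hyperparameter names, and the numeric values are hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CmaState:
    """Hypothetical container for the three quantities the abstract names."""
    names: Sequence[str]           # hyperparameter names (illustrative only)
    mean: List[float]              # mean vector of the search distribution
    step_size: float               # global step-size (sigma)
    covariance: List[List[float]]  # covariance matrix, one inner list per row


def format_state_for_prompt(state: CmaState, precision: int = 4) -> str:
    """Render the optimizer state as plain text that a prompt could include."""
    dim = len(state.names)
    if len(state.mean) != dim or len(state.covariance) != dim:
        raise ValueError("mean and covariance must match the number of names")
    if any(len(row) != dim for row in state.covariance):
        raise ValueError("covariance must be square")

    lines = [
        "CMA-ES state:",
        f"step_size = {state.step_size:.{precision}g}",
        "mean:",
    ]
    for name, value in zip(state.names, state.mean):
        lines.append(f"  {name} = {value:.{precision}g}")
    lines.append("covariance (rows follow the order above):")
    for row in state.covariance:
        lines.append("  " + " ".join(f"{v:.{precision}g}" for v in row))
    return "\n".join(lines)


# Made-up two-dimensional example; the values are not results from the paper.
example = CmaState(
    names=["log10_lr", "weight_decay"],
    mean=[-3.0, 0.1],
    step_size=0.5,
    covariance=[[1.0, 0.2], [0.2, 1.0]],
)
prompt_fragment = format_state_for_prompt(example)
print(prompt_fragment)
```

A plain-text rendering like this keeps the state readable for a small model. Whether the actual method uses text, JSON, or some other encoding is not stated in the abstract.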
