Fugu-MT Paper Translation (Abstract): AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration - Learning from Cheap, Optimizing Expensive

Paper overview: AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration - Learning from Cheap, Optimizing Expensive

arxiv url: http://arxiv.org/abs/2605.11518v2
Date: Sun, 17 May 2026 03:02:51 GMT
Status: Translation complete
Last updated in system: 2026-05-19 17:57:45.610726
Title: AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration - Learning from Cheap, Optimizing Expensive
Title (reference translation): AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration - Learning from Cheap, Optimizing Expensive
Authors: Taicheng Guo, Nitesh V. Chawla, Olaf Wiest, Xiangliang Zhang
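Abstract summary: Poor configuration choices can waste substantial computational resources and prevent models from realizing their full potential. Prior automated methods are designed for cheap settings where repeated trial and error is feasible. We propose AutoLLMResearch, an agentic framework that mimics how human researchers learn generalizable principles from low-fidelity experiments.
Reference score (independently calculated attention score): 46.53605767076999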
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Effectively configuring scalable large language model (LLM) experiments, spanning architecture design, hyperparameter tuning, and beyond, is crucial for advancing LLM research, as poor configuration choices can waste substantial computational resources and prevent models from realizing their full potential. Prior automated methods are designed for low-cost settings where repeated trial and error is feasible, but scalable LLM experiments are too expensive for such extensive iteration. To our knowledge, no work has addressed the automation of high-cost LLM experiment configurations, leaving this problem labor-intensive and dependent on expert intuition. Motivated by this gap, we propose AutoLLMResearch, an agentic framework that mimics how human researchers learn generalizable principles from low-fidelity experiments and extrapolate to efficiently identify promising configurations in expensive LLM settings. The core challenge is how to enable an agent to learn, through interaction with a multi-fidelity experimental environment that captures the structure of the LLM configuration landscape. To achieve this, we propose a systematic framework with two key components: 1) LLMConfig-Gym, a multi-fidelity environment encompassing four critical LLM experiment tasks, supported by over one million GPU hours of verifiable experiment outcomes; 2) A structured training pipeline that formulates configuration research as a long-horizon Markov Decision Process and accordingly incentivizes cross-fidelity extrapolation reasoning. Extensive evaluation against diverse strong baselines on held-out experiments demonstrates the effectiveness, generalization, and interpretability of our framework, supporting its potential as a practical and general solution for scalable real-world LLM experiment automation.
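Abstract (reference translation): Effectively configuring scalable large language model (LLM) experiments, including architecture design and hyperparameter tuning, is essential for advancing LLM research. Prior automated methods were designed for low-cost settings where repeated trial and error is feasible, but scalable LLM experiments are too expensive for such extensive iteration. To our knowledge, no work has addressed automating high-cost LLM experiment configuration, so the problem remains labor-intensive and dependent on expert intuition. Motivated by this gap, AutoLLMResearch is an agentic framework that mimics how human researchers learn generalizable principles from low-fidelity experiments and extrapolate to efficiently identify promising configurations in expensive LLM settings. The core challenge is enabling an agent to learn through interaction with a multi-fidelity experimental environment that captures the structure of the LLM configuration landscape. To achieve this, we propose a systematic framework with two key components: 1) LLMConfig-Gym, a multi-fidelity environment covering four critical LLM experiment tasks, backed by over one million GPU hours of verifiable experiment outcomes; 2) a structured training pipeline that formulates configuration research as a long-horizon Markov Decision Process and incentivizes cross-fidelity extrapolation reasoning. Extensive evaluation against diverse strong baselines on held-out experiments demonstrates the effectiveness, generalization, and interpretability of our framework, supporting its potential as a practical and general solution for scalable real-world LLM experiment automation.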

Related papers