Fugu-MT Paper Translation (Abstract): Combining Large Language Models and Gradient-Free Optimization for Automatic Control Policy Synthesis

Paper summary: Combining Large Language Models and Gradient-Free Optimization for Automatic Control Policy Synthesis

arxiv url: http://arxiv.org/abs/2510.00373v1
Date: Wed, 01 Oct 2025 00:42:15 GMT
Status: Translation complete
Last updated in system: 2025-10-03 16:59:20.302605
Title: Combining Large Language Models and Gradient-Free Optimization for Automatic Control Policy Synthesis
Title (reference translation): Combining Large Language Models and Gradient-Free Optimization for Automatic Control Policy Synthesis
Authors: Carlo Bosio, Matteo Guarrera, Alberto Sangiovanni-Vincentelli, Mark W. Mueller
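Abstract summary: Large language models (LLMs) show promise as generators of symbolic control policies. The paper proposes a hybrid approach that decouples structural synthesis from parameter optimization. It shows that combining symbolic program synthesis with numerical optimization yields interpretable yet high-performing policies.
Reference score (independently computed attention score): 2.8593976574111264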
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have shown promise as generators of symbolic control policies, producing interpretable program-like representations through iterative search. However, these models are not capable of separating the functional structure of a policy from the numerical values it is parametrized by, thus making the search process slow and inefficient. We propose a hybrid approach that decouples structural synthesis from parameter optimization by introducing an additional optimization layer for local parameter search. In our method, the numerical parameters of LLM-generated programs are extracted and optimized numerically to maximize task performance. With this integration, an LLM iterates over the functional structure of programs, while a separate optimization loop is used to find a locally optimal set of parameters accompanying candidate programs. We evaluate our method on a set of control tasks, showing that it achieves higher returns and improved sample efficiency compared to purely LLM-guided search. We show that combining symbolic program synthesis with numerical optimization yields interpretable yet high-performing policies, bridging the gap between language-model-guided design and classical control tuning. Our code is available at https://sites.google.com/berkeley.edu/colmo.
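Abstract (reference translation): Large language models (LLMs) have shown promise as generators of symbolic control policies, producing interpretable, program-like representations through iterative search. However, these models cannot separate the functional structure of a policy from the numerical values that parameterize it, which makes the search process slow and inefficient. We propose a hybrid approach that decouples structural synthesis from parameter optimization by introducing an optimization layer for local parameter search. In this method, the numerical parameters of LLM-generated programs are extracted and optimized to maximize task performance. With this integration, the LLM iterates over the functional structure of programs, while a separate optimization loop finds a locally optimal set of parameters for each candidate program. We evaluate the method on a set of control tasks and show higher returns and improved sample efficiency compared with purely LLM-guided search. We show that combining symbolic program synthesis with numerical optimization yields interpretable yet high-performing policies, bridging the gap between language-model-guided design and classical control tuning. Our code is available at https://sites.google.com/berkeley.edu/colmo.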
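Illustrative sketch (not the authors' code): the abstract describes extracting the numerical parameters of an LLM-generated program and tuning them in a separate loop while the program structure stays fixed. The Python below shows one way that inner loop could look. Everything in it is an assumption made for illustration: the hard-coded `POLICY_SRC` stands in for an LLM output, the double-integrator task and its cost are a toy example, and the optimizer is a simple random-perturbation hill climber chosen as a stand-in for whatever gradient-free method the paper actually uses.

```python
import random
import re

# Hypothetical LLM-generated policy: the structure is fixed, the constants are tunable.
POLICY_SRC = "def policy(x, v):\n    return -1.0 * x - 0.1 * v\n"

# Matches float literals such as 1.0 or 0.1 (not identifiers or indices).
FLOAT_LITERAL = re.compile(r"(?<![\w.])\d+\.\d+")


def extract_params(src):
    """Replace each float literal with p[i]; return (template, initial values)."""
    values = []

    def repl(match):
        values.append(float(match.group()))
        return f"p[{len(values) - 1}]"

    return FLOAT_LITERAL.sub(repl, src), values


def build_policy(template, params):
    """Compile the template with a concrete parameter vector bound to p."""
    scope = {"p": list(params)}
    exec(template, scope)
    return scope["policy"]


def rollout(policy, steps=200, dt=0.05):
    """Toy task (assumed): drive a double integrator from x=1 to the origin."""
    x, v, ret = 1.0, 0.0, 0.0
    for _ in range(steps):
        u = max(-10.0, min(10.0, policy(x, v)))  # clip the control input
        v += u * dt
        x += v * dt
        ret -= (x * x + 0.1 * u * u) * dt  # negative quadratic cost as return
    return ret


def optimize(template, p0, iters=300, sigma=0.3, seed=0):
    """Gradient-free local search: keep a Gaussian perturbation only if it helps."""
    rng = random.Random(seed)
    best_p, best_r = list(p0), rollout(build_policy(template, p0))
    for _ in range(iters):
        cand = [value + rng.gauss(0.0, sigma) for value in best_p]
        r = rollout(build_policy(template, cand))
        if r > best_r:
            best_p, best_r = cand, r
    return best_p, best_r


template, p0 = extract_params(POLICY_SRC)
initial_return = rollout(build_policy(template, p0))
best_p, best_return = optimize(template, p0)
print("initial return:", initial_return)
print("tuned params:", best_p, "tuned return:", best_return)
```

In a full pipeline of the kind the abstract outlines, an outer loop would ask the LLM for new program structures and score each candidate by its tuned return rather than by the constants the LLM happened to write; that outer loop is omitted here.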
