Fugu-MT Paper Translation (Summary): SimulCost: A Cost-Aware Benchmark and Toolkit for Automating Physics Simulations with LLMs

Paper summary: SimulCost: A Cost-Aware Benchmark and Toolkit for Automating Physics Simulations with LLMs

arxiv url: http://arxiv.org/abs/2603.20253v1
Date: Wed, 11 Mar 2026 05:00:48 GMT
Status: Translation complete
Last updated in system: 2026-04-06 02:36:12.938041
Title: SimulCost: A Cost-Aware Benchmark and Toolkit for Automating Physics Simulations with LLMs
Title (reference translation): SimulCost: A cost-aware benchmark and toolkit for automating physics simulations with LLMs
Authors: Yadi Cao, Sicheng Lai, Jiahe Huang, Yang Zhang, Zach Lawrence, Rohan Bhakta, Izzy F. Thomas, Mingyun Cao, Chung-Hao Tsai, Zihao Zhou, Yidong Zhao, Hao Liu, Alessandro Marinoni, Alexey Arefiev, Rose Yu
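Abstract summary: We introduce SimulCost, the first benchmark targeting cost-sensitive parameter tuning in physics simulations. SimulCost compares LLM tuning of cost-sensitive parameters against traditional scanning approaches in both accuracy and computational cost. Each simulator's cost is analytically defined and platform-independent.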
Reference score (independently computed attention score): 56.07550353240028
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Evaluating LLM agents for scientific tasks has focused on token costs while ignoring tool-use costs like simulation time and experimental resources. As a result, metrics like pass@k become impractical under realistic budget constraints. To address this gap, we introduce SimulCost, the first benchmark targeting cost-sensitive parameter tuning in physics simulations. SimulCost compares LLMs tuning cost-sensitive parameters against a traditional scanning approach in both accuracy and computational cost, spanning 2,916 single-round (initial guess) and 1,900 multi-round (adjustment by trial-and-error) tasks across 12 simulators from fluid dynamics, solid mechanics, and plasma physics. Each simulator's cost is analytically defined and platform-independent. Frontier LLMs achieve 46–64% success rates in single-round mode, dropping to 35–54% under high-accuracy requirements, rendering their initial guesses unreliable, especially for high-accuracy tasks. Multi-round mode improves rates to 71–80%, but LLMs are 1.5–2.5x slower than traditional scanning, making them uneconomical choices. We also investigate parameter group correlations for knowledge transfer potential, and the impact of in-context examples and reasoning effort, providing practical implications for deployment and fine-tuning. We open-source SimulCost as a static benchmark and extensible toolkit to facilitate research on improving cost-aware agentic designs for physics simulations, and for expanding new simulation environments. Code and data are available at https://github.com/Rose-STL-Lab/SimulCost-Bench.
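Abstract (reference translation): Evaluations of LLM agents for scientific tasks have focused on token costs while ignoring tool-use costs such as simulation time and experimental resources. As a result, metrics such as pass@k are impractical under realistic budget constraints. To address this gap, we introduce SimulCost, the first benchmark targeting cost-sensitive parameter tuning in physics simulations. SimulCost compares LLM tuning of cost-sensitive parameters with traditional scanning approaches in both accuracy and computational cost, across 2,916 single-round (initial guess) and 1,900 multi-round (trial-and-error adjustment) tasks spanning 12 simulators from fluid dynamics, solid mechanics, and plasma physics. Each simulator's cost is analytically defined and platform-independent. Frontier LLMs reach success rates of 46–64% in single-round mode, falling to 35–54% under high accuracy requirements, which makes their initial guesses unreliable, particularly for high-accuracy tasks. Multi-round mode raises success rates to 71–80%, but LLMs are 1.5–2.5x slower than traditional scanning, making them an uneconomical choice. We also examine parameter-group correlations for knowledge-transfer potential and the impact of in-context examples and reasoning effort, and discuss practical implications for deployment and fine-tuning. We release SimulCost as open source, both as a static benchmark and as an extensible toolkit, to support work on improving cost-aware agent designs for physics simulations and on adding new simulation environments. Code and data are available at https://github.com/Rose-STL-Lab/SimulCost-Bench.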
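The abstract contrasts LLM-proposed parameters with a traditional scanning approach and charges every simulation run an analytically defined, platform-independent cost. As a rough illustration of that idea only, the sketch below uses a hypothetical toy problem (a 1D explicit heat-equation solver) whose cost is the count of cell updates, plus a brute-force scan that doubles the grid until two successive results agree. The solver, the cost formula, the names `simulate`, `simulation_cost`, and `scan_resolution`, the tolerance, and the starting grid are all assumptions for illustration, not SimulCost's actual simulators, API, or baseline.

```python
# Hypothetical sketch: NOT taken from the SimulCost code base. The toy solver,
# the cost model, and the scanning loop are illustrative assumptions.
import math

T_END = 0.1  # hypothetical target time for the toy problem


def n_steps_for(n_cells: int, t_end: float = T_END) -> int:
    """Time steps needed so that dt <= 0.4 * dx^2 (FTCS stability needs dt <= dx^2 / 2)."""
    dx = 1.0 / n_cells
    return math.ceil(t_end / (0.4 * dx * dx))


def simulation_cost(n_cells: int, t_end: float = T_END) -> int:
    """Analytic, platform-independent cost: interior cell updates = (n_cells - 1) * n_steps."""
    return (n_cells - 1) * n_steps_for(n_cells, t_end)


def simulate(n_cells: int, t_end: float = T_END) -> float:
    """FTCS solve of u_t = u_xx on [0, 1], u(0)=u(1)=0, u(x, 0)=sin(pi x); returns u(0.5, t_end)."""
    dx = 1.0 / n_cells
    n_steps = n_steps_for(n_cells, t_end)
    r = (t_end / n_steps) / (dx * dx)  # r <= 0.4, so the explicit scheme is stable
    u = [math.sin(math.pi * i * dx) for i in range(n_cells + 1)]
    for _ in range(n_steps):
        interior = [u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1]) for i in range(1, n_cells)]
        u = [0.0] + interior + [0.0]  # re-impose the Dirichlet boundaries
    return u[n_cells // 2]  # n_cells is even, so this node sits at x = 0.5


def scan_resolution(tol: float, start: int = 8, max_cells: int = 256):
    """Brute-force scan: double the grid until two successive runs agree within tol.

    Every run, including the rejected ones, is charged its analytic cost.
    Returns (chosen n_cells, total cost, visited grid sizes).
    """
    n = start
    visited = [n]
    total_cost = simulation_cost(n)
    previous = simulate(n)
    while n < max_cells:
        n *= 2
        current = simulate(n)
        visited.append(n)
        total_cost += simulation_cost(n)
        if abs(current - previous) < tol:
            break
        previous = current
    return n, total_cost, visited


if __name__ == "__main__":
    exact = math.exp(-math.pi ** 2 * T_END)  # analytic u(0.5, t) for this initial condition
    n, cost, visited = scan_resolution(tol=5e-4)
    print(f"visited={visited} chosen n={n} total cost={cost} error={abs(simulate(n) - exact):.2e}")
```

Because the cost counts cell updates rather than wall-clock time, the same number is reported on any machine. A guess that lands on a sufficient grid size at the first attempt pays for a single run, while the scan pays for every grid it visits; this is, loosely, the trade-off the abstract describes between single-round guesses, multi-round adjustment, and scanning.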


Related papers