Fugu-MT Paper Translation (Summary): PDEAgent-Bench: A Multi-Metric, Multi-Library Benchmark for PDE Solver Generation

Paper Summary: PDEAgent-Bench: A Multi-Metric, Multi-Library Benchmark for PDE Solver Generation

arxiv url: http://arxiv.org/abs/2605.09636v1
Date: Sun, 10 May 2026 16:25:43 GMT
Status: Translation complete
Last updated in system: 2026-05-12 23:28:50.34514
Title: PDEAgent-Bench: A Multi-Metric, Multi-Library Benchmark for PDE Solver Generation
Authors: Zhen Hang, Yushan Yashengjiang, Junhui Li, Huanshuo Dong, Yang Wei, Zhezheng Hao, Jiangtao Ma, Songlin Bai, Haozhong Kai, Xihang Yue, Gangzong Si, Dongming Jiang, Chao Yao, Zhanhua Hu, Jiangqing Zhang, Pengwei Liu, Yaomin Shen, Xingyu Ren, Lei Liu, Zikang Xu, Han Li, Qingsong Yao, Hande Dong, Hong Wang,
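Abstract summary: PDEAgent-Bench is, to the authors' knowledge, the first multi-metric, multi-library benchmark for PDE-to-solver code generation. It contains 645 instances across 6 mathematical categories and 11 PDE families, targeting the common FEM libraries DOLFINx, Firedrake, and deal.II. Experiments show that models can often produce runnable code, but their pass rate drops substantially once accuracy and efficiency requirements are enforced.
Reference score (site-computed attention score): 31.813357785544408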
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: PDE-to-solver code generation aims to automatically synthesize executable numerical solvers from partial differential equation (PDE) specifications. This task requires not only understanding the mathematical structure of PDEs, but also selecting appropriate discretization schemes and solver configurations, and correctly implementing the resulting formulations in finite-element method (FEM) libraries. Existing code generation benchmarks mainly evaluate syntactic correctness or success on predefined test cases. To our knowledge, there is currently no publicly available benchmark specifically for PDE-to-solver code generation, and general-purpose code benchmarks do not fully capture the unique challenges of solving PDEs numerically, such as ensuring solver accuracy, efficiency, and compatibility with professional FEM libraries. We introduce PDEAgent-Bench, to the best of our knowledge the first multi-metric, multi-library benchmark for PDE-to-solver code generation. PDEAgent-Bench contains 645 instances across 6 mathematical categories and 11 PDE families, targeting the common FEM libraries DOLFINx, Firedrake, and deal.II. Each instance provides an agent-facing problem specification, a reference solution on a prescribed evaluation grid, and case-specific accuracy and runtime targets. PDEAgent-Bench adopts a staged evaluation framework in which generated solvers must sequentially pass executability, numerical accuracy, and computational efficiency checks. Experiments with representative LLMs and code agents show that models can often produce runnable code, but their pass rate drops substantially once accuracy and efficiency requirements are enforced. These results indicate that current agents remain limited in producing numerically reliable and efficient PDE solvers, and that PDEAgent-Bench provides a reproducible testbed grounded in the practical requirements of numerical PDE solving.
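The staged evaluation in the abstract can be sketched as a simple pass/fail pipeline. The Python below is a minimal illustration under stated assumptions, not the benchmark's actual harness: the `Instance` and `Submission` classes, the stage labels, and the use of a relative L2 error against the reference solution as the accuracy metric are all hypothetical, since the abstract does not specify the exact metric or interface. A solver that fails an earlier stage is never credited with a later one, so cumulative pass rates can only decrease from executability to efficiency.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Hypothetical data model: the real PDEAgent-Bench interface is not described here.
@dataclass
class Instance:
    reference: Sequence[float]  # reference solution sampled on the evaluation grid
    accuracy_target: float      # assumed metric: max allowed relative L2 error
    runtime_target: float       # max allowed wall-clock time in seconds

@dataclass
class Submission:
    ran_ok: bool                         # did the generated solver execute without error?
    solution: Optional[Sequence[float]]  # its output on the same evaluation grid
    runtime: float                       # measured wall-clock time in seconds

STAGES = ["exec", "accuracy", "efficiency"]

def relative_l2_error(u: Sequence[float], u_ref: Sequence[float]) -> float:
    """||u - u_ref||_2 / ||u_ref||_2 (falls back to the absolute error if u_ref is zero)."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, u_ref)))
    den = math.sqrt(sum(b ** 2 for b in u_ref))
    return num / den if den > 0 else num

def staged_eval(inst: Instance, sub: Submission) -> str:
    """Return the last stage passed: 'none', 'exec', 'accuracy', or 'efficiency'."""
    # Stage 1: executability (output must also exist and match the grid size).
    if not sub.ran_ok or sub.solution is None or len(sub.solution) != len(inst.reference):
        return "none"
    # Stage 2: numerical accuracy against the reference solution.
    if relative_l2_error(sub.solution, inst.reference) > inst.accuracy_target:
        return "exec"
    # Stage 3: computational efficiency against the case-specific runtime target.
    if sub.runtime > inst.runtime_target:
        return "accuracy"
    return "efficiency"

def stage_pass_rates(results: List[str]) -> dict:
    """Fraction of submissions that passed each stage (and therefore all earlier ones)."""
    n = len(results)
    return {
        stage: (sum(r in STAGES[i:] for r in results) / n if n else 0.0)
        for i, stage in enumerate(STAGES)
    }

# Usage on toy numbers (not benchmark data).
inst = Instance(reference=[1.0, 2.0, 3.0], accuracy_target=1e-2, runtime_target=10.0)
subs = [
    Submission(True, [1.0, 2.0, 3.0], 2.0),     # accurate and fast
    Submission(True, [1.0, 2.0, 3.001], 20.0),  # accurate but too slow
    Submission(True, [0.0, 0.0, 0.0], 1.0),     # runs but inaccurate
    Submission(False, None, 0.0),               # crashes
]
results = [staged_eval(inst, s) for s in subs]
print(results)                    # ['efficiency', 'accuracy', 'exec', 'none']
print(stage_pass_rates(results))  # {'exec': 0.75, 'accuracy': 0.5, 'efficiency': 0.25}
```

Recording the last stage passed, rather than a single boolean, makes the gap between "runs" and "runs accurately and fast enough" directly visible in the aggregated rates.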


Related Papers