Fugu-MT Paper Translation (Abstract): FormalGrad: Integrating Formal Methods with Gradient-Based LLM Refinement

Paper Summary: FormalGrad: Integrating Formal Methods with Gradient-Based LLM Refinement

arxiv url: http://arxiv.org/abs/2508.10059v1
Date: Tue, 12 Aug 2025 22:03:54 GMT
Status: Translation complete
Last updated in system: 2025-08-15 22:24:48.059671
Title: FormalGrad: Integrating Formal Methods with Gradient-Based LLM Refinement
Title (reference translation): FormalGrad: Integrating Formal Methods with Gradient-Based LLM Refinement
Authors: Yueke Zhang, Yifan Zhang, Kevin Leach, Yu Huang
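Abstract summary: FormalGrad introduces a principled framework that integrates formal methods directly into an iterative generation loop. It treats code as a differentiable variable and converts structured feedback and formal constraints into a textual pseudo-gradient. The authors evaluate FormalGrad on the HumanEval, HumanEval+, and LiveCodeBench benchmarks.
Reference score (independently computed attention measure): 8.574686422653345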
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, they often produce solutions that lack guarantees of correctness, robustness, and efficiency. The limitation is acute in domains requiring strict constraints. FormalGrad introduces a principled framework that integrates formal methods directly into an iterative LLM-based generation loop. It uniquely treats code as a differentiable variable, converting structured feedback and formal constraints into a textual pseudo-gradient. This gradient guides the model to iteratively refine solutions, ensuring they are not only functional but also robust and formally justified. We evaluate FormalGrad on the HumanEval, HumanEval+, and LiveCodeBench benchmarks. Our implementation outperforms strong baselines, achieving an absolute improvement of up to 27% on HumanEval and a 41% relative improvement on the challenging LiveCodeBench V6. FormalGrad generates formally justified code that is robust and efficient, paving the way for reliable AI-assisted software development in high-stakes applications.
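Abstract (reference translation): Large language models (LLMs) have shown remarkable capabilities in code generation, but they often produce solutions that lack guarantees of correctness, robustness, and efficiency. This limitation is acute in domains that require strict constraints. FormalGrad introduces a principled framework that integrates formal methods directly into an iterative LLM-based generation loop. It uniquely treats code as a differentiable variable and converts structured feedback and formal constraints into a textual pseudo-gradient. This gradient guides the model to iteratively refine its solutions, ensuring they are not only functional but also robust and formally justified. We evaluate FormalGrad on the HumanEval, HumanEval+, and LiveCodeBench benchmarks. Our implementation achieves an absolute improvement of up to 27% on HumanEval and a 41% relative improvement on the challenging LiveCodeBench V6. FormalGrad generates formally justified code that is robust and efficient.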
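The abstract describes the refinement loop only at a high level: feedback and constraints become a textual pseudo-gradient, which then drives the next revision of the code. The sketch below is one way such a loop could look and is not the authors' implementation. All names (`Constraint`, `textual_gradient`, `refine`, `handles_zero`, `toy_rewriter`) are hypothetical, the single constraint is a runtime check rather than a formal verifier, and the rewriter is a hard-coded stand-in for an LLM call.

```python
from typing import Callable, List, Optional

# Hypothetical type: a constraint inspects candidate source code and returns
# a feedback message when violated, or None when satisfied.
Constraint = Callable[[str], Optional[str]]


def textual_gradient(code: str, constraints: List[Constraint]) -> str:
    """Collect feedback from all violated constraints into one text block."""
    messages = []
    for check in constraints:
        message = check(code)
        if message is not None:
            messages.append(message)
    return "\n".join(messages)


def refine(code: str,
           constraints: List[Constraint],
           apply_feedback: Callable[[str, str], str],
           max_steps: int = 5) -> str:
    """Repeatedly revise the code until no constraint reports a violation."""
    for _ in range(max_steps):
        gradient = textual_gradient(code, constraints)
        if not gradient:  # empty feedback: every constraint is satisfied
            break
        code = apply_feedback(code, gradient)
    return code


def handles_zero(code: str) -> Optional[str]:
    """Toy constraint (assumption): safe_div must not raise when b == 0."""
    namespace: dict = {}
    exec(code, namespace)  # run the candidate definition in a scratch namespace
    try:
        namespace["safe_div"](1, 0)
    except ZeroDivisionError:
        return "safe_div must not raise when b == 0; return None instead"
    return None


def toy_rewriter(code: str, feedback: str) -> str:
    """Hard-coded stand-in for an LLM that would rewrite code from feedback."""
    if "b == 0" in feedback:
        return "def safe_div(a, b):\n    return None if b == 0 else a / b\n"
    return code


initial = "def safe_div(a, b):\n    return a / b\n"
refined = refine(initial, [handles_zero], toy_rewriter)
```

In this toy setup the loop stops as soon as the feedback text is empty, which plays the role of a vanishing gradient. A real system would replace `toy_rewriter` with a model call and the runtime check with whatever formal tooling is in use.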

