Fugu-MT 論文翻訳(概要): Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory

論文の概要: Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory

arxiv url: http://arxiv.org/abs/2606.06523v1
Date: Tue, 02 Jun 2026 18:46:50 GMT
ステータス: 翻訳完了
システム内更新日: 2026-06-08 14:33:29.340041
Title: Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory
Title（参考訳）: Lean4Agent: エージェントワークフローと軌道の形式的モデリングと検証
Authors: Ruida Wang, Jerry Huang, Pengcheng Wang, Xuanqing Liu, Luyang Kong, Tong Zhang,
Abstract要約: 信頼性の高いマルチステップを実行するためのLLM(Large Language Models)の取得は、人工知能における中心的な課題となっている。この課題は、自然言語の曖昧さ(NL)が形式言語(FL)の発展を動機付けている数学における長年の問題を反映している。エージェントの振る舞いをモデル化し検証するための依存型FLであるLean4を使用した最初のフレームワークである**Lean4Agent*を提案する。
参考スコア（独自算出の注目度）: 17.95201244576703
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Equipping Large Language Models (LLMs) to execute reliable multi-step workflows has become a central challenge in artificial intelligence. Despite recent advances in LLMs' agentic capabilities, most agent systems still lack formal methods for specifying, verifying, and debugging their workflow and execution trajectories. This challenge mirrors a long-standing problem in mathematics, where the ambiguity of natural languages (NLs) motivates the development of formal languages (FLs). Inspired by this paradigm, we propose **Lean4Agent**, to the best of our knowledge, the first framework that uses Lean4, a dependent-type FL to model and verify agent behavior. **Lean4Agent** launches **FormalAgentLib**, an extensible Lean4 library for formally modeling and verifying agent workflows' semantic consistency under explicit assumptions, and enabling localization of execution-time failures revealed by trajectories. Building on **FormalAgentLib**, we further develop **LeanEvolve**, which applies results in **FormalAgentLib** to revise workflows to enhance its capability. Extensive experiments on a hard problem subset of SWE-Bench-Verified and a subset of ELAIP-Bench across 5 leading LLMs indicate that the verification-passing workflows outperform the failing ones by an average of **11.94%**, and **LeanEvolve** further improves SWE performance by **7.47%** on average. Furthermore, **Lean4Agent** establishes a foundation for a new field of using expressive dependent-type FL to formally model and verify agent behavior.
Abstract（参考訳）: 信頼性の高い多段階ワークフローを実行するためのLLM(Large Language Models)の取得は、人工知能における中心的な課題となっている。 LLMのエージェント能力の最近の進歩にもかかわらず、ほとんどのエージェントシステムは、ワークフローと実行軌跡を指定、検証、デバッグするための正式なメソッドを欠いている。この課題は、自然言語の曖昧さ(NL)が形式言語(FL)の発展を動機付けている数学における長年の問題を反映している。エージェントの振る舞いをモデル化し検証するための依存型FLであるLean4を使用した最初のフレームワークである。これは、明示的な仮定の下でエージェントワークフローのセマンティックな一貫性を正式にモデル化し、検証し、トラジェクトリによって明らかにされる実行時障害のローカライズを可能にする拡張可能なLean4ライブラリである。ここでは、**FormalAgentLib* に基づいて、**LeanEvolve** をさらに開発し、**FormalAgentLib* で結果を適用して、ワークフローを改訂し、その機能を強化します。 SWE-Bench-Verified のハード問題部分集合と5つの主要な LLM にまたがる ELAIP-Bench の部分集合に関する広範な実験により、検証パスワークフローは、平均**11.94%** と***LeanEvolve** は、平均***7.47%* でSWE性能をさらに向上することを示している。さらに、**Lean4Agent**は、表現依存型FLを使用してエージェントの振る舞いを正式にモデル化し検証する新しい分野の基礎を確立する。

論文の概要: Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory

関連論文リスト