Fugu-MT 論文翻訳(概要): Lean4Physics: Comprehensive Reasoning Framework for College-level Physics in Lean4

論文の概要: Lean4Physics: Comprehensive Reasoning Framework for College-level Physics in Lean4

arxiv url: http://arxiv.org/abs/2510.26094v1
Date: Thu, 30 Oct 2025 03:09:40 GMT
ステータス: 翻訳完了
システム内更新日: 2025-10-31 16:05:09.637806
Title: Lean4Physics: Comprehensive Reasoning Framework for College-level Physics in Lean4
Title（参考訳）: Lean4Physics: Lean4におけるカレッジレベルの物理学のための総合的な推論フレームワーク
Authors: Yuxin Li, Minghao Liu, Ruida Wang, Wenzhao Ji, Zhitao He, Rui Pan, Junming Huang, Tong Zhang, Yi R. Fung,
Abstract要約: 大学レベルの物理学問題に対する包括的な推論フレームワークである**Lean4PHYS*をLean4で紹介する。 **Lean4PHYS*には、Lean4の正式な物理推論のためのカレッジレベルのベンチマークである*LeanPhysBench*が含まれている。コミュニティ主導のレポジトリである*PhysLib*も紹介する。
参考スコア（独自算出の注目度）: 17.8194025216225
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present **Lean4PHYS**, a comprehensive reasoning framework for college-level physics problems in Lean4. **Lean4PHYS** includes *LeanPhysBench*, a college-level benchmark for formal physics reasoning in Lean4, which contains 200 hand-crafted and peer-reviewed statements derived from university textbooks and physics competition problems. To establish a solid foundation for formal reasoning in physics, we also introduce *PhysLib*, a community-driven repository containing fundamental unit systems and theorems essential for formal physics reasoning. Based on the benchmark and Lean4 repository we composed in **Lean4PHYS**, we report baseline results using major expert Math Lean4 provers and state-of-the-art closed-source models, with the best performance of DeepSeek-Prover-V2-7B achieving only 16% and Claude-Sonnet-4 achieving 35%. We also conduct a detailed analysis showing that our *PhysLib* can achieve an average improvement of 11.75% in model performance. This demonstrates the challenging nature of our *LeanPhysBench* and the effectiveness of *PhysLib*. To the best of our knowledge, this is the first study to provide a physics benchmark in Lean4.
Abstract（参考訳）: 大学レベルの物理学問題に対する包括的な推論フレームワークである**Lean4PHYS*をLean4で紹介する。関連スポンサーコンテンツ * *Lean4PHYS* には、Lean4の正式な物理学推論のための大学レベルのベンチマークである *LeanPhysBench* が含まれている。物理学における形式的推論のための確固たる基盤を確立するために、基礎単位系と公式な物理学的推論に不可欠な定理を含むコミュニティ主導のリポジトリである *PhysLib* も導入する。我々は、**Lean4PHYS*で作成したベンチマークとLean4リポジトリに基づいて、主要なエキスパートであるMath Lean4 Proversと最先端のクローズドソースモデルを使用してベースライン結果を報告し、DeepSeek-Prover-V2-7Bのパフォーマンスはわずか16%、Claude-Sonnet-4は35%に過ぎなかった。また、我々の*PhysLib*はモデル性能の11.75%の平均的な改善を達成できることを示す詳細な分析も行います。これは、我々の*LeanPhysBench*と*PhysLib*の有効性の難しさを示しています。私たちの知る限りでは、Lean4で物理学ベンチマークを提供する最初の研究です。

論文の概要: Lean4Physics: Comprehensive Reasoning Framework for College-level Physics in Lean4

関連論文リスト