Fugu-MT Paper Translation (Summary): CAM-Bench: A Benchmark for Computational and Applied Mathematics in Lean

Paper summary: CAM-Bench: A Benchmark for Computational and Applied Mathematics in Lean

arxiv url: http://arxiv.org/abs/2605.17255v1
Date: Sun, 17 May 2026 04:53:47 GMT
Status: Translation complete
Last updated in system: 2026-05-19 17:57:47.811373
Title: CAM-Bench: A Benchmark for Computational and Applied Mathematics in Lean
Authors: Wentao Long, Yunfei Zhang, Chenyi Li, Li Zhou, Chumin Sun, Zaiwen Wen
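Abstract summary: CAM-Bench is a Lean 4 theorem-proving benchmark of 1,000 Lean proof targets in computational and applied mathematics. The problems are adapted from textbook exercises and often depend on locally introduced definitions, notation, algorithms, and elementary results. The resulting formal problems are validated through Lean compilation and semantic review, which check both formal correctness and semantic alignment with the original exercises.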
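To make "locally introduced definitions" concrete, here is a hypothetical Lean 4 sketch, assuming Mathlib is available. The names `gdStep` and `gdStep_fixed` are invented for illustration and are not taken from CAM-Bench: the exercise introduces its own gradient-step operator, and the target theorem is stated against that local definition rather than against an existing Mathlib lemma. In a benchmark setting the statement would be given, and the proof after `by` is what a prover must supply.

```lean
import Mathlib

-- Hypothetical local definition standing in for textbook notation:
-- one gradient step x ↦ x - t * f'(x) for a scalar function with derivative f'.
noncomputable def gdStep (f' : ℝ → ℝ) (t x : ℝ) : ℝ := x - t * f' x

-- Hypothetical target: a stationary point is a fixed point of the step.
theorem gdStep_fixed (f' : ℝ → ℝ) (t x : ℝ) (h : f' x = 0) :
    gdStep f' t x = x := by
  -- unfold the local definition, substitute f' x = 0, then x - t * 0 = x
  simp [gdStep, h]
```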
Reference score (site-computed attention score): 10.684401671916158
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Formal theorem-proving benchmarks enable mechanically verifiable evaluation of mathematical reasoning in large language models. However, existing benchmarks mainly focus on Olympiad-style problems and algebraic domains, leaving computational and applied mathematics underrepresented. We introduce CAM-Bench, a Lean 4 theorem-proving benchmark of 1,000 Lean proof targets in computational and applied mathematics, with coverage spanning optimization, numerical linear algebra, and numerical analysis. These problems are adapted from textbook exercises and often depend on locally introduced definitions, notation, algorithms, and elementary results. To construct CAM-Bench, we develop a dependency-recovery pipeline that reconstructs the local textbook context needed to state each problem faithfully. It then normalizes each problem into a standalone informal theorem and translates it into a Lean target. We validate the resulting formal problems through Lean compilation and semantic review, checking both formal correctness and semantic alignment with the original exercises. For each problem, we release the raw exercise, recovered context, normalized informal theorem, and final Lean target. CAM-Bench complements existing formal mathematics benchmarks by targeting applied mathematics problems that rely on textbook concepts and elementary theorems, many of which are not directly available as standard Mathlib4 lemmas. We evaluate widely used large language models and formalization agents on CAM-Bench, and analyze common failure modes in tracking local assumptions, applying elementary results, decomposing proofs, and maintaining long-horizon control in Lean.
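The abstract says each problem is built by a three-stage pipeline (dependency recovery, normalization into a standalone informal theorem, translation into a Lean target) and released as four artifacts. The Python sketch below models that record and the stage order under stated assumptions: the field names (`raw_exercise`, `recovered_context`, `informal_theorem`, `lean_target`) and the `build_record` helper are hypothetical, not the authors' released schema or code, and the Lean compilation and semantic-review validation steps are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CamBenchRecord:
    """Hypothetical record for one problem; field names are illustrative only."""
    raw_exercise: str  # exercise text as given in the textbook
    recovered_context: List[str] = field(default_factory=list)  # local definitions, notation, lemmas
    informal_theorem: str = ""  # standalone, normalized informal statement
    lean_target: str = ""  # final Lean 4 statement to be proved


def build_record(
    raw_exercise: str,
    recover: Callable[[str], List[str]],
    normalize: Callable[[str, List[str]], str],
    translate: Callable[[str], str],
) -> CamBenchRecord:
    """Run the three construction stages in order (hypothetical helper).

    Validation (Lean compilation, semantic review) is deliberately left out.
    """
    context = recover(raw_exercise)              # 1. recover local textbook context
    informal = normalize(raw_exercise, context)  # 2. normalize into a standalone theorem
    lean = translate(informal)                   # 3. translate into a Lean target
    return CamBenchRecord(raw_exercise, context, informal, lean)


# Toy usage with placeholder stages; every string here is made up for illustration.
example = build_record(
    "Exercise: show that a gradient step leaves a stationary point fixed.",
    recover=lambda ex: ["Definition: gdStep f' t x = x - t * f' x"],
    normalize=lambda ex, ctx: "If f'(x) = 0, then gdStep f' t x = x.",
    translate=lambda s: "theorem gdStep_fixed (f' : ℝ → ℝ) (t x : ℝ) (h : f' x = 0) : gdStep f' t x = x := by sorry",
)
print(example.lean_target)
```

Passing the stages in as callables keeps the skeleton agnostic about whether each step is an LLM call, a script, or manual annotation.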


List of related papers