Fugu-MT Paper Translation (Summary): FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory

Paper summary: FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory

arxiv url: http://arxiv.org/abs/2510.02335v1
Date: Fri, 26 Sep 2025 14:40:14 GMT
Status: Translation complete
Last updated (in system): 2025-10-06 16:35:52.009306
Title: FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory
Authors: Xiao-Wen Yang, Zihao Zhang, Jianuo Cao, Zhi Zhou, Zenan Li, Lan-Zhe Guo, Yuan Yao, Taolue Chen, Yu-Feng Li, Xiaoxing Ma
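Abstract summary: Large language models (LLMs) have recently shown remarkable progress in formal theorem proving. However, their ability to serve as practical assistants for mathematicians, filling in missing steps within complex proofs, remains underexplored. We introduce FormalML, a Lean 4 benchmark built on foundational theories of machine learning.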
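To make "filling in missing steps" concrete, here is a minimal hypothetical sketch in Lean 4, assuming a project with Mathlib available. It is not a problem taken from FormalML; the inequality and the hypothesis names `h1` and `h2` are illustrative. The sketch states each intermediate fact with `have`, and one step is left as `sorry`: completing the subgoal means replacing that `sorry` with a proof.

```lean
import Mathlib

-- Hypothetical example, not an actual FormalML problem.
-- A human-written sketch: intermediate facts are stated with `have`,
-- but one step is left as `sorry` for the prover to discharge.
example (a b : ℝ) : a * b ≤ (a + b) ^ 2 / 4 := by
  have h1 : 0 ≤ (a - b) ^ 2 := sq_nonneg (a - b)
  have h2 : (a + b) ^ 2 - 4 * (a * b) = (a - b) ^ 2 := by
    sorry -- the open subgoal; a tactic such as `ring` closes it
  -- The final step follows by (non)linear arithmetic from h1 and h2.
  nlinarith [h1, h2]
```

In this toy case the gap is closed by `ring`; FormalML's actual problems sit inside much larger optimization and probability-inequality contexts and may additionally require retrieving the right premises.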
Reference score (site-computed attention score): 44.64175433092553
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have recently demonstrated remarkable progress in formal theorem proving. Yet their ability to serve as practical assistants for mathematicians, filling in missing steps within complex proofs, remains underexplored. We identify this challenge as the task of subgoal completion, where an LLM must discharge short but nontrivial proof obligations left unresolved in a human-provided sketch. To study this problem, we introduce FormalML, a Lean 4 benchmark built from foundational theories of machine learning. Using a translation tactic that converts procedural proofs into declarative form, we extract 4937 problems spanning optimization and probability inequalities, with varying levels of difficulty. FormalML is the first subgoal completion benchmark to combine premise retrieval and complex research-level contexts. Evaluation of state-of-the-art provers highlights persistent limitations in accuracy and efficiency, underscoring the need for more capable LLM-based theorem provers for effective subgoal completion.
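The contrast between procedural and declarative proofs can be illustrated with a small hand-written Lean 4 pair (core Lean only, no Mathlib). This is not output of the paper's translation tactic; it only sketches the general idea, assuming that "declarative" means intermediate facts are stated explicitly with `have`. In the procedural version, the intermediate goals `q` and `p` exist only in the proof state; in the declarative version, each is a named statement, so any single `have` body could be replaced by `sorry` to yield a subgoal-completion problem.

```lean
-- Procedural style: `apply` transforms the goal, so intermediate goals are implicit.
example (p q r : Prop) (hpq : p → q) (hqr : q → r) : p → r := by
  intro hp
  apply hqr   -- goal becomes `q`
  apply hpq   -- goal becomes `p`
  exact hp

-- Declarative style: each intermediate fact is stated explicitly.
example (p q r : Prop) (hpq : p → q) (hqr : q → r) : p → r := by
  intro hp
  have hq : q := hpq hp  -- candidate subgoal: replace `hpq hp` with `sorry`
  have hr : r := hqr hq
  exact hr
```

Both proofs check the same statement; the declarative form trades brevity for explicit intermediate statements, which is what makes individual steps extractable as standalone obligations.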


Related papers