Fugu-MT 論文翻訳(概要): MathlibLemma: Folklore Lemma Generation and Benchmark for Formal Mathematics

論文の概要: MathlibLemma: Folklore Lemma Generation and Benchmark for Formal Mathematics

arxiv url: http://arxiv.org/abs/2602.02561v1
Date: Fri, 30 Jan 2026 15:24:42 GMT
ステータス: 翻訳完了
システム内更新日: 2026-02-04 18:37:14.935307
Title: MathlibLemma: Folklore Lemma Generation and Benchmark for Formal Mathematics
Title（参考訳）: MathlibLemma: Folklore Lemma の生成とフォーマル数学のベンチマーク
Authors: Xinyu Liu, Zixuan Xie, Amir Moeini, Claire Chen, Shuze Daniel Liu, Yu Meng, Aidong Zhang, Shangtong Zhang,
Abstract要約: 数学的民俗補題の発見と形式化を自動化するLLMベースのマルチエージェントシステムであるMathlibLemmaを紹介する。さらに、幅広い数学的領域にまたがる4,028の型チェックリーンステートメントスイートであるMathlibLemmaベンチマークを構築します。
参考スコア（独自算出の注目度）: 51.65118519899584
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While the ecosystem of Lean and Mathlib has enjoyed celebrated success in formal mathematical reasoning with the help of large language models (LLMs), the absence of many folklore lemmas in Mathlib remains a persistent barrier that limits Lean's usability as an everyday tool for mathematicians like LaTeX or Maple. To address this, we introduce MathlibLemma, the first LLM-based multi-agent system to automate the discovery and formalization of mathematical folklore lemmas. This framework constitutes our primary contribution, proactively mining the missing connective tissue of mathematics. Its efficacy is demonstrated by the production of a verified library of folklore lemmas, a subset of which has already been formally merged into the latest build of Mathlib, thereby validating the system's real-world utility and alignment with expert standards. Leveraging this pipeline, we further construct the MathlibLemma benchmark, a suite of 4,028 type-checked Lean statements spanning a broad range of mathematical domains. By transforming the role of LLMs from passive consumers to active contributors, this work establishes a constructive methodology for the self-evolution of formal mathematical libraries.
Abstract（参考訳）: リーンとマドリブのエコシステムは、大きな言語モデル(LLM)の助けを借りて、フォーマルな数学的推論で成功をおさめた。そこで本研究では,LLMを用いた初のマルチエージェントシステムであるMathlibLemmaを紹介し,数学的民俗補題の発見と形式化を自動化する。この枠組みは、数学の欠落した結合組織を積極的にマイニングすることで、我々の主要な貢献を構成する。その効果は、民間伝承のレムマの検証ライブラリーの作成によって実証され、そのサブセットは、すでにMathlibの最新のビルドにマージされており、それによってシステムの実世界の実用性と専門家の基準との整合性を検証している。このパイプラインを活用することで、幅広い数学的領域にまたがる4,028の型チェックリーンステートメントスイートであるMathlibLemmaベンチマークをさらに構築します。 LLMの役割をパッシブ・コンシューマからアクティブ・コントリビュータに転換することにより、形式的な数学的ライブラリの自己進化のための構築的方法論を確立する。

関連論文リスト

Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning [27.562284768743694]
大規模言語モデル(LLM)は、証明システム内で証明ステップを生成することによって、数学的定理を正式に証明することができる。本稿では,LLMが学習した数学的直観と,記号的手法によって符号化された領域固有の洞察を相乗化する,ニューロシンボリック・戦術生成器を提案する。複数の数学コンペティションから161の挑戦的不等式を評価する。
論文参考訳（メタデータ） (2025-02-19T15:54:21Z)
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs [75.95179490687018]
証明生成のための数学的大規模言語モデルを活用することは、LLM研究の基本的なトピックである。現状のLCMが証明できる能力は、学習中に関連する証明プロセスに遭遇したかどうかに大きく依存していると論じる。人間の数学教育で一般的に用いられる「反例による防御」の教育的手法に触発されて,我々の研究は,反例を通して数学的推論と証明を行うLLMの能力を高めることを目的としている。
論文参考訳（メタデータ） (2025-02-12T02:01:10Z)
LemmaHead: RAG Assisted Proof Generation Using Large Language Models [0.0]
我々は、関連する数学的文脈でモデルにクエリを補足する知識ベースであるLemmaHeadを開発した。数学的推論におけるモデルの性能を測定するため、我々のテストパラダイムは自動定理証明の課題に焦点を当てている。
論文参考訳（メタデータ） (2025-01-27T05:46:06Z)
Large Language Models for Mathematical Analysis [3.7325315394927023]
この研究は、数学的推論における重要なギャップに対処し、信頼できるAIの進歩に寄与する。 DEMI-MathAnalysisデータセットを開発した。また,LLMの問題解決能力を高めるためのガイドフレームワークも設計した。
論文参考訳（メタデータ） (2024-12-28T20:37:55Z)
LeanAgent: Lifelong Learning for Formal Theorem Proving [85.39415834798385]
フォーマルな定理証明のための新しい生涯学習フレームワークであるLeanAgentを紹介する。 LeanAgentは継続的に一般化し、拡張可能な数学的知識を改善します。これは23のリーンリポジトリにわたる155の定理の正式な証明を生成する。
論文参考訳（メタデータ） (2024-10-08T17:11:24Z)
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark [82.64129627675123]
MathBenchは、大規模言語モデルの数学的能力を厳格に評価する新しいベンチマークである。 MathBenchは幅広い数学の分野にまたがっており、理論的な理解と実践的な問題解決のスキルの両方を詳細に評価している。
論文参考訳（メタデータ） (2024-05-20T17:52:29Z)
ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models [67.32868432113587]
本稿では,Large Language Models (LLMs) の概念的数学的推論を評価するための詳細なベンチマークであるConceptMathを紹介する。一般的な数学的推論を平均精度で評価する従来のベンチマークとは異なり、ConceptMathは数学の問題を数学的概念の階層の下に体系的に整理する。
論文参考訳（メタデータ） (2024-02-22T16:06:49Z)
Generative Language Modeling for Automated Theorem Proving [94.01137612934842]
この研究は、自動定理プロバーの人間に対する大きな制限が言語モデルから生成することで対処できる可能性によって動機づけられている。本稿ではメタマス形式化言語のための自動証明と証明アシスタント GPT-f を提案し,その性能を解析する。
論文参考訳（メタデータ） (2020-09-07T19:50:10Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。