Fugu-MT paper translation (abstract): Data for Mathematical Copilots: Better Ways of Presenting Proofs for Machine Learning

Paper abstract: Data for Mathematical Copilots: Better Ways of Presenting Proofs for Machine Learning

arxiv url: http://arxiv.org/abs/2412.15184v1
Date: Thu, 19 Dec 2024 18:55:17 GMT
Status: Translation complete
Last updated in system: 2024-12-20 18:44:16.264275
Title: Data for Mathematical Copilots: Better Ways of Presenting Proofs for Machine Learning
Title (reference translation): Data for Mathematical Copilots: Better Ways of Presenting Proofs for Machine Learning
Authors: Simon Frieder, Jonas Bayer, Katherine M. Collins, Julius Berner, Jacob Loader, András Juhász, Fabian Ruehle, Sean Welleck, Gabriel Poesia, Ryan-Rhys Griffiths, Adrian Weller, Anirudh Goyal, Thomas Lukasiewicz, Timothy Gowers
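Abstract summary: We argue that enhancing the capabilities of large language models requires a paradigm shift in the design of mathematical datasets. The concept of "motivated proof", introduced by G. Pólya in 1949, can serve as a blueprint for datasets that offer a better proof learning signal. We provide a questionnaire designed specifically for math datasets and urge dataset creators to include it with their datasets.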
Reference score (independently computed attention score): 85.635988711588
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The suite of datasets commonly used to train and evaluate the mathematical capabilities of AI-based mathematical copilots (primarily large language models) exhibits several shortcomings. These limitations include a restricted scope of mathematical complexity, typically not exceeding lower undergraduate-level mathematics, binary rating protocols, and other issues, which make comprehensive proof-based evaluation suites difficult to construct. We systematically explore these limitations and contend that enhancing the capabilities of large language models, or any forthcoming advancements in AI-based mathematical assistants (copilots or "thought partners"), necessitates a paradigm shift in the design of mathematical datasets and the evaluation criteria of mathematical ability: It is necessary to move away from result-based datasets (theorem statement to theorem proof) and convert the rich facets of mathematical research practice into data LLMs can train on. Examples of these are mathematical workflows (sequences of atomic, potentially subfield-dependent tasks that are often performed when creating new mathematics), which are an important part of the proof-discovery process. Additionally, we advocate for mathematical dataset developers to consider the concept of "motivated proof", introduced by G. Pólya in 1949, which can serve as a blueprint for datasets that offer a better proof learning signal, alleviating some of the mentioned limitations. Lastly, we introduce math datasheets for datasets, extending the general, dataset-agnostic variants of datasheets: We provide a questionnaire designed specifically for math datasets that we urge dataset creators to include with their datasets. This will make creators aware of potential limitations of their datasets while at the same time making it easy for readers to assess them from the point of view of training and evaluating mathematical copilots.
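Abstract (reference translation): The suite of datasets commonly used to train and evaluate the mathematical capabilities of AI-based mathematical copilots (primarily large language models) has several shortcomings. These limitations include a restricted scope of mathematical complexity, typically not exceeding lower undergraduate-level mathematics, binary rating protocols, and other issues, which make comprehensive proof-based evaluation suites difficult to construct. We systematically explore these limitations and argue that further progress in AI-based mathematical assistants (copilots or "thought partners") requires a paradigm shift in the design of mathematical datasets and in the evaluation criteria for mathematical ability: it is necessary to move away from result-based datasets and convert the rich facets of mathematical research practice into data LLMs can train on. Examples of these are mathematical workflows (sequences of atomic, potentially subfield-dependent tasks that are often performed when creating new mathematics), which are an important part of the proof-discovery process. In addition, the concept of "motivated proof", introduced by G. Pólya in 1949, can serve as a blueprint for datasets that offer a better proof learning signal, alleviating some of these limitations. Lastly, we introduce math datasheets for datasets, extending the general, dataset-agnostic datasheets. This makes creators aware of the potential limitations of their datasets while making it easy for readers to assess a dataset from the point of view of training and evaluating mathematical copilots.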

Related papers

CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images [69.93976232543066]
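We propose CodePlot-CoT, a code-driven Chain-of-Thought paradigm. To this end, we first construct Math-VR, a large-scale bilingual dataset and benchmark for mathematics problems with visual reasoning. Our model achieves up to a 21% improvement over the base model, validating the effectiveness of the proposed code-driven reasoning paradigm.
Reference translation (metadata) (2025-10-13T17:59:55Z)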
RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics [21.453837660747844]
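Existing benchmarks for evaluating mathematical reasoning in large language models (LLMs) rely primarily on competition problems, formal proofs, or artificially constructed problems. We introduce RealMath, a new benchmark derived directly from research papers and mathematical forums, to assess the abilities of LLMs on real mathematical tasks.
Reference translation (metadata) (2025-05-18T23:32:46Z)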
Machine Learning meets Algebraic Combinatorics: A Suite of Datasets Capturing Research-level Conjecturing Ability in Pure Mathematics [4.229995708813431]
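We introduce a new collection of datasets, the Algebraic Combinatorics Dataset Repository (ACD Repo). Each dataset includes an open research-level question and a large collection of examples. We describe nine datasets that differ in how machine learning models can be applied to them.
Reference translation (metadata) (2025-03-09T00:11:40Z)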
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task [49.355810887265925]
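We introduce MathFimer, a novel framework for expanding mathematical reasoning steps. We develop a specialized model, MathFimer-7B, based on our carefully curated NuminaMath-FIM dataset. We then apply these models to enhance existing mathematical reasoning datasets by inserting detailed intermediate steps into their solution chains.
Reference translation (metadata) (2025-02-17T11:22:24Z)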
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code [38.127313175508746]
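We propose a novel method for generating mathematical code accompanied by reasoning steps for continued pretraining. Our approach begins with the construction of a high-quality mathematical continued-pretraining dataset. Pairing the generated code with each reasoning step yields data consisting of natural-language reasoning steps and their corresponding code.
Reference translation (metadata) (2024-10-10T17:58:40Z)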
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
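We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation. Experiments on both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
Reference translation (metadata) (2024-08-28T06:33:03Z)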
InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning [13.728595670907136]
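InfinityMATH is a scalable instruction-tuning dataset for programmatic mathematical reasoning. Fine-tuning experiments with open-source language and code models, such as Llama2 and CodeLlama, demonstrate the practical benefits of InfinityMATH.
Reference translation (metadata) (2024-08-09T08:18:20Z)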
Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions [47.83142414018448]
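We focus on two popular reasoning tasks: arithmetic reasoning and code generation. We introduce (i) a general ontology of perturbations for math and coding questions, (ii) a semi-automatic method for applying these perturbations, and (iii) two datasets. We show a significant performance drop across all models on the perturbed questions.
Reference translation (metadata) (2024-01-17T18:13:07Z)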
MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline [12.186691561822256]
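We hypothesize that the inherent nature of large language models (LLMs) poses challenges in modeling mathematical reasoning. We introduce a new math dataset that leverages a Python code interpreter. We propose a tentative and easily replicable protocol for fine-tuning math-specific LLMs.
Reference translation (metadata) (2024-01-16T08:08:01Z)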
Math Agents: Computational Infrastructure, Mathematical Embedding, and Genomics [0.0]
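Beyond human-AI chat, large language models (LLMs) are emerging in programming, algorithm discovery, and theorem proving. This work introduces math agents and mathematical embedding as new entries in the "Moore's Law of Mathematics". The project aims to use math agents and mathematical embedding to address the problem of aging in information systems biology.
Reference translation (metadata) (2023-07-04T20:16:32Z)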
Evaluating Language Models for Mathematics through Interactions [116.67206980096513]
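We introduce CheckMate, a prototype platform for humans to interact with and evaluate large language models (LLMs). We conduct a study with CheckMate to evaluate three language models (InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics. We derive a taxonomy of human behaviours and find that, despite a generally positive correlation, there are notable divergences between correctness and perceived helpfulness.
Reference translation (metadata) (2023-06-02T17:12:25Z)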
A Survey of Deep Learning for Mathematical Reasoning [71.88150173381153]
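We review the key tasks, datasets, and methods at the intersection of mathematical reasoning and deep learning over the past decade. Recent advances in large-scale neural network models have opened up new benchmarks and opportunities to use deep learning for mathematical reasoning.
Reference translation (metadata) (2022-12-20T18:46:16Z)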
A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
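Learning on small data in a way that approximates the generalization ability of big data is one of the ultimate goals of AI. This survey follows active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data. We survey multiple data applications that may benefit from efficient small-data representation.
Reference translation (metadata) (2022-07-29T02:34:19Z)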
IsarStep: a Benchmark for High-level Mathematical Reasoning [20.96474618260995]
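We present a benchmark for high-level mathematical reasoning and study the reasoning capabilities of neural sequence-to-sequence models. We build a non-synthetic dataset from the largest repository of proofs written by human experts in a theorem prover.
Reference translation (metadata) (2020-06-13T21:09:23Z)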

The related papers list is generated automatically from the titles and abstracts of papers on this site.
