Fugu-MT paper translation (overview): Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models

Paper overview: Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models

arxiv url: http://arxiv.org/abs/2502.17387v1
Date: Mon, 24 Feb 2025 18:14:01 GMT
Status: translation complete
Last updated in system: 2025-02-25 22:36:56.713006
Title: Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
Title (reference translation): Big-Math: a large-scale, high-quality math dataset for reinforcement learning in language models
Authors: Alon Albalak, Duy Phung, Nathan Lile, Rafael Rafailov, Kanishk Gandhi, Louis Castricato, Anikait Singh, Chase Blagden, Violet Xiang, Dakota Mahan, Nick Haber,
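Abstract summary: Big-Math is a dataset of over 250,000 high-quality math questions with verifiable answers. Big-Math for reinforcement learning (RL)
Reference score (independently computed attention score): 11.706309334631985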
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Increasing interest in reasoning models has led math to become a prominent testing ground for algorithmic and methodological improvements. However, existing open math datasets contain either a small collection of high-quality, human-written problems or a large corpus of machine-generated problems of uncertain quality, forcing researchers to choose between quality and quantity. In this work, we present Big-Math, a dataset of over 250,000 high-quality math questions with verifiable answers, purposefully made for reinforcement learning (RL). To create Big-Math, we rigorously filter, clean, and curate openly available datasets, extracting questions that satisfy our three desiderata: (1) problems with uniquely verifiable solutions, (2) problems that are open-ended, and (3) problems with a closed-form solution. To ensure the quality of Big-Math, we manually verify each step in our filtering process. Based on the findings from our filtering process, we introduce 47,000 new questions with verified answers, Big-Math-Reformulated: closed-ended questions (i.e., multiple-choice questions) that have been reformulated as open-ended questions through a systematic reformulation algorithm. Compared to the most commonly used existing open-source datasets for math reasoning, GSM8k and MATH, Big-Math is an order of magnitude larger, while our rigorous filtering ensures that we maintain the questions most suitable for RL. We also provide a rigorous analysis of the dataset, finding that Big-Math contains a high degree of diversity across problem domains and incorporates a wide range of problem difficulties, enabling a wide range of downstream uses for models of varying capabilities and training requirements. By bridging the gap between data quality and quantity, Big-Math establishes a robust foundation for advancing reasoning in LLMs.
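Abstract (reference translation): Growing interest in reasoning models has made mathematics a prominent testing ground for algorithmic and methodological improvements. However, existing open math datasets contain either a small collection of high-quality, human-written problems or a large corpus of machine-generated problems of uncertain quality, forcing researchers to choose between quality and quantity. This paper introduces Big-Math, a dataset of over 250,000 high-quality math questions built for reinforcement learning (RL). To create Big-Math, openly available datasets are rigorously filtered, cleaned, and curated, and questions that satisfy three desiderata are extracted. To ensure the quality of Big-Math, each step of the filtering process is verified manually. Based on findings from the filtering process, the paper introduces Big-Math-Reformulated, 47,000 new questions with verified answers: closed-ended questions (multiple-choice questions) reformulated as open-ended questions by a systematic reformulation algorithm. Compared with existing open-source datasets for math reasoning such as GSM8k and MATH, Big-Math is an order of magnitude larger, while rigorous filtering retains the questions most suitable for RL. The paper also provides a rigorous analysis of the dataset, finding that Big-Math is highly diverse across problem domains and covers a wide range of problem difficulties, enabling a wide range of downstream uses for models of varying capabilities and training requirements. By bridging the gap between data quality and quantity, Big-Math establishes a robust foundation for advancing reasoning in LLMs.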
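The three desiderata named in the abstract (uniquely verifiable solutions, open-ended questions, closed-form solutions) can be pictured as three predicates applied to each problem. The sketch below is only an illustration under loud assumptions: the `Problem` record, the regular expressions, and the 40-character answer limit are all hypothetical stand-ins and are not taken from the Big-Math paper, whose actual filters are not described on this page.

```python
import re
from dataclasses import dataclass


@dataclass
class Problem:
    """Hypothetical record: one question with its reference answer."""
    question: str
    answer: str


# Hypothetical patterns; illustrative stand-ins, not the paper's filters.
CHOICE_LABEL = re.compile(r"\(([A-E])\)")
CLOSED_ENDED = re.compile(r"\b(true or false|yes or no)\b", re.IGNORECASE)
PROOF_REQUEST = re.compile(r"\b(prove|show that)\b", re.IGNORECASE)


def has_unique_verifiable_answer(p: Problem) -> bool:
    # Assumption: a single non-empty answer that lists no alternatives.
    answer = p.answer.strip()
    return bool(answer) and " or " not in answer.lower()


def is_open_ended(p: Problem) -> bool:
    # Assumption: two or more distinct option labels signal multiple choice.
    labels = set(CHOICE_LABEL.findall(p.question))
    return len(labels) < 2 and not CLOSED_ENDED.search(p.question)


def has_closed_form_answer(p: Problem) -> bool:
    # Assumption: proof requests and long free-text answers are not closed-form.
    return not PROOF_REQUEST.search(p.question) and len(p.answer.strip()) <= 40


def keep(p: Problem) -> bool:
    # A problem is kept only if it passes all three checks.
    return (
        has_unique_verifiable_answer(p)
        and is_open_ended(p)
        and has_closed_form_answer(p)
    )


problems = [
    Problem("What is 12 * 12?", "144"),
    Problem("Which is prime? (A) 4 (B) 7 (C) 9", "B"),
    Problem("Prove that sqrt(2) is irrational.", ""),
]
kept = [p for p in problems if keep(p)]
print([p.question for p in kept])  # prints ['What is 12 * 12?']
```

Keeping each criterion as its own predicate makes it possible to check every filtering step separately, in the spirit of the step-by-step manual verification the abstract mentions. The multiple-choice item dropped here is the kind of question the abstract says is reformulated as open-ended in Big-Math-Reformulated.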

Related papers

Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning [66.43194385702297]
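Large language models (LLMs) have shown strong reasoning capabilities, particularly when enhanced through reinforcement learning (RL). NEMOTRON-CROSSTHINK is a framework that systematically incorporates multi-domain corpora, including both synthetic and real-world question-answer pairs.
Paper reference translation (metadata) (2025-04-15T21:37:13Z)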
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning [95.31714779585272]
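DeepMath-103K is a new large-scale dataset of about 103K mathematical problems. Each problem includes a verifiable final answer, which enables rule-based RL. The authors demonstrate that models trained on DeepMath-103K improve significantly on challenging mathematical benchmarks.
Paper reference translation (metadata) (2025-04-15T17:59:51Z)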
MegaMath: Pushing the Limits of Open Math Corpora [44.148011362359036]
MegaMath is an open dataset curated from diverse math-focused sources. It provides 371B tokens, the largest quantity and highest quality among existing open math pre-training datasets.
Paper reference translation (metadata) (2025-04-03T17:52:07Z)
ControlMath: Controllable Data Generation Promotes Math Generalist Models [38.0858432336873]
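The authors propose ControlMath, an iterative method comprising an equation-generation module and two LLM-based agents. The module generates diverse equations, which a Problem-Crafter agent turns into math word problems. ControlMathQA contains 190k math word problems.
Paper reference translation (metadata) (2024-09-20T03:58:26Z)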
AI-Assisted Generation of Difficult Math Questions [78.7547836422727]
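Current LLM training positions mathematical reasoning as a core capability. There is unmet demand for diverse and challenging math questions. The paper proposes a design framework that combines the strengths of LLMs with a human-in-the-loop approach.
Paper reference translation (metadata) (2024-07-30T17:55:36Z)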
MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data [20.31528845718877]
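Large language models (LLMs) have greatly advanced natural language understanding and have shown strong problem-solving abilities. This paper investigates the mathematical problem-solving capabilities of LLMs using the newly developed "MathOdyssey" dataset.
Paper reference translation (metadata) (2024-06-26T13:02:35Z)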
MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
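MindStar is a purely inference-based search method for large language models. It formulates reasoning tasks as search problems and proposes two search ideas for identifying the optimal reasoning path. It significantly improves the reasoning abilities of open-source models such as Llama-2-13B and Mistral-7B, achieving performance comparable to GPT-3.5 and Grok-1.
Paper reference translation (metadata) (2024-05-25T15:07:33Z)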
Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks [34.09857430966818]
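The authors introduce "MathQuest", a math dataset drawn from the 11th- and 12th-standard mathematics NCERT textbooks. They run fine-tuning experiments with three large language models: LLaMA-2, WizardMath, and MAmmoTH. Of the three, MAmmoTH-13B emerges as the most proficient, achieving the highest level of competence in solving the presented mathematical problems.
Paper reference translation (metadata) (2024-04-19T08:45:42Z)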
MathScale: Scaling Instruction Tuning for Mathematical Reasoning [70.89605383298331]
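Large language models (LLMs) have shown remarkable problem-solving capabilities. However, their ability to solve mathematical problems remains inadequate. The authors propose MathScale, a simple and scalable method for creating high-quality mathematical reasoning data.
Paper reference translation (metadata) (2024-03-05T11:42:59Z)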
Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions [47.83142414018448]
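The paper focuses on two common reasoning tasks: arithmetic reasoning and code generation. It introduces (i) a general ontology of perturbations for math and coding questions, (ii) a semi-automatic method for applying these perturbations, and (iii) two datasets. All models show a significant performance drop on the perturbed questions.
Paper reference translation (metadata) (2024-01-17T18:13:07Z)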
Measuring Mathematical Problem Solving With the MATH Dataset [55.4376028963537]
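The authors introduce MATH, a dataset of 12,500 competition mathematics problems. Each problem has a full step-by-step solution that can be used to teach models to generate answer derivations and explanations. They also provide an auxiliary pre-training dataset to help teach models the fundamentals of mathematics.
Paper reference translation (metadata) (2021-03-05T18:59:39Z)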

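The related-papers list is generated automatically from the titles and abstracts of papers on this site.

This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).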