Related papers: DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

URL: http://arxiv.org/abs/2402.03300v3
Date: Sat, 27 Apr 2024 15:25:53 GMT
Title: DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Authors: Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, Daya Guo
Abstract summary: We introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl. DeepSeekMath 7B has achieved an impressive score of 51.7% on the competition-level MATH benchmark.
Score: 33.5778998066089
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B has achieved an impressive score of 51.7% on the competition-level MATH benchmark without relying on external toolkits and voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. Self-consistency over 64 samples from DeepSeekMath 7B achieves 60.9% on MATH. The mathematical reasoning capability of DeepSeekMath is attributed to two key factors: First, we harness the significant potential of publicly available web data through a meticulously engineered data selection pipeline. Second, we introduce Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO), that enhances mathematical reasoning abilities while concurrently optimizing the memory usage of PPO.
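
Illustration: the abstract reports a result using self-consistency over 64 samples. A minimal sketch of that idea, assuming it is implemented as a plain majority vote over the final answers extracted from independently sampled solutions, might look like the following. The function name `self_consistency_vote` and the toy answers are hypothetical and are not taken from the paper.

```python
from collections import Counter


def self_consistency_vote(answers):
    """Return the most frequent final answer among sampled solutions.

    `answers` is a list of final-answer strings, one per sampled solution.
    This is an illustrative majority vote, not the paper's exact procedure.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    # Light normalization so "42" and " 42 " count as the same answer.
    normalized = [a.strip() for a in answers]
    # Ties are broken in favor of the answer that appeared first.
    return Counter(normalized).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical final answers from five sampled solutions.
    samples = ["42", "41", " 42", "7", "42"]
    print(self_consistency_vote(samples))
```

In practice the answers would first be extracted from each generated solution, and the number of samples (64 in the abstract) is a setting chosen by the evaluator.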

Related papers

Reliable Fine-Grained Evaluation of Natural Language Math Proofs [30.992321135182905]
We propose a systematic methodology for developing evaluators that assign fine-grained scores on a 0-7 scale to model-generated math proofs. We introduce ProofBench, the first expert-annotated dataset of fine-grained proof ratings, spanning 145 problems from six major math competitions. Our analysis delivers ProofGrader, an evaluator that combines a strong reasoning backbone LM, rich context from reference solutions and marking schemes, and a simple ensembling method.
arXiv Detail & Related papers (2025-10-14T02:59:07Z)
REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning [12.343823629674368]
We present REAL-Prover, a new open-source stepwise theorem prover for Lean 4. Our prover notably boosts performance on solving college-level mathematics problems. In experiments, our prover, using only supervised fine-tuning, achieves competitive results with a 23.7% success rate.
arXiv Detail & Related papers (2025-05-27T01:26:11Z)
Preference Optimization for Reasoning with Pseudo Feedback [100.62603571434167]
We introduce a novel approach to generate pseudo feedback for reasoning tasks by framing the labeling of solutions as an evaluation against associated test cases. We conduct experiments on both mathematical reasoning and coding tasks using pseudo feedback for preference optimization, and observe improvements across both tasks.
arXiv Detail & Related papers (2024-11-25T12:44:02Z)
Building Math Agents with Multi-Turn Iterative Preference Learning [56.71330214021884]
This paper studies the complementary direct preference learning approach to further improve model performance. Existing direct preference learning algorithms are originally designed for the single-turn chat task. We introduce a multi-turn direct preference learning framework, tailored for this context.
arXiv Detail & Related papers (2024-09-04T02:41:04Z)
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning [24.68321102981711]
We introduce a series of large language models (LLMs) that employ the Decomposition of thought with code assistance and self-correction for mathematical reasoning, dubbed as DotaMath. DotaMath models tackle complex mathematical tasks by decomposing them into simpler logical subtasks, leveraging code to solve these subtasks, and engaging in self-reflection and correction. We train a series of base LLMs using imitation learning on DotaMathQA, resulting in DotaMath models that achieve remarkable performance compared to open-source LLMs.
arXiv Detail & Related papers (2024-07-04T17:39:16Z)
MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning [11.426127461122908]
This work includes new math questions via multi-perspective data augmentation methods and then synthesizes code-nested solutions to them. Open Large Language Models (LLMs) that integrate with external Python interpreters have significantly enhanced mathematical reasoning capabilities. We propose a two-stage training strategy: in Stage-1, we finetune Llama-2 on pure CoT data to get an intermediate model, which is then trained on the code-nested data in Stage-2 to get the resulting MuMath-Code.
arXiv Detail & Related papers (2024-05-13T08:32:19Z)
Advancing LLM Reasoning Generalists with Preference Trees [119.57169648859707]
We introduce Eurus, a suite of large language models (LLMs) optimized for reasoning. Eurus models achieve state-of-the-art results among open-source models on a diverse set of benchmarks.
arXiv Detail & Related papers (2024-04-02T16:25:30Z)
Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning [110.80663974060624]
Key-Point-Driven Data Synthesis (KPDDS) is a novel data synthesis framework that synthesizes question-answer pairs. KPDDS ensures the generation of novel questions with rigorous quality control and substantial scalability. We present KPMath, an extensive synthetic dataset tailored for mathematical reasoning, comprising over 800K question-answer pairs.
arXiv Detail & Related papers (2024-03-04T18:58:30Z)
MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline [12.186691561822256]
We postulate that the inherent nature of large language models (LLMs) presents challenges in modeling mathematical reasoning. This paper introduces a novel math dataset, enhanced with a capability to utilize a Python code interpreter. We propose a tentative, easily replicable protocol for the fine-tuning of math-specific LLMs.
arXiv Detail & Related papers (2024-01-16T08:08:01Z)
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning [52.97768001837269]
We present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations. We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions. This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems.
arXiv Detail & Related papers (2023-10-05T17:52:09Z)
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct [130.37945867605302]
We present WizardMath, which enhances the mathematical CoT reasoning abilities of large language models (LLMs) without using external Python tools. Remarkably, WizardMath-Mistral 7B surpasses top-tier open-source LLMs by a substantial margin with higher data efficiency. Our preliminary exploration highlights the pivotal role of instruction evolution and process supervision in achieving exceptional math performance.
arXiv Detail & Related papers (2023-08-18T14:23:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.