WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
- URL: http://arxiv.org/abs/2308.09583v2
- Date: Wed, 01 Jan 2025 13:02:05 GMT
- Title: WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
- Authors: Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Jianguang Lou, Chongyang Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, Yansong Tang, Dongmei Zhang,
- Abstract summary: We present WizardMath, which enhances the mathematical CoT reasoning abilities of large language models (LLMs) without using external python tools.
Remarkably, WizardMath-Mistral 7B surpasses top-tier open-source LLMs by a substantial margin with higher data efficiency.
Our preliminary exploration highlights the pivotal role of instruction evolution and process supervision in achieving exceptional math performance.
- Score: 130.37945867605302
- License:
- Abstract: Large language models (LLMs), such as GPT-4, have shown remarkable performance in natural language processing (NLP) tasks, including challenging mathematical reasoning. However, most existing open-source models are only pre-trained on large-scale internet data and without math-related optimization. In this paper, we present WizardMath, which enhances the mathematical CoT reasoning abilities of LLMs without using external python tools, by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. Through extensive experiments on two mathematical reasoning benchmarks, namely GSM8k and MATH, we reveal the extraordinary capabilities of our model. Remarkably, WizardMath-Mistral 7B surpasses top-tier open-source LLMs by a substantial margin with higher data efficiency. Furthermore, WizardMath 70B even outperforms GPT-3.5-Turbo, Claude 2, Gemini Pro and GPT-4-early-version. Additionally, our preliminary exploration highlights the pivotal role of instruction evolution and process supervision in achieving exceptional math performance. For more details refer to https://github.com/nlpxucan/WizardLM
Related papers
- Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On [55.449818944278526]
We introduce the Skywork-Math model series, supervised fine-tuned (SFT) on common 7B language models.
Skywork-Math 7B has achieved impressive accuracies of 51.2% on the competition-level MATH benchmark.
We provide several practical takeaways to enhance math reasoning abilities in LLMs for both research and industry applications.
arXiv Detail & Related papers (2024-07-11T09:56:51Z) - MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
MindStar is a purely inference-based searching method for large language models.
It formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths.
It significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1.
arXiv Detail & Related papers (2024-05-25T15:07:33Z) - MathScale: Scaling Instruction Tuning for Mathematical Reasoning [70.89605383298331]
Large language models (LLMs) have demonstrated remarkable capabilities in problem-solving.
However, their proficiency in solving mathematical problems remains inadequate.
We propose MathScale, a simple and scalable method to create high-quality mathematical reasoning data.
arXiv Detail & Related papers (2024-03-05T11:42:59Z) - MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning [2.9104279358536647]
We present MathSensei, a tool-augmented large language model for mathematical reasoning.
We study the complementary benefits of the tools - knowledge retriever (Bing Web Search), program generator + executor (Python), and symbolic equation solver (Wolfram-Alpha API)
arXiv Detail & Related papers (2024-02-27T05:50:35Z) - InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning [98.53491178426492]
We open-source our math reasoning LLMs InternLM-Math which is continue pre-trained from InternLM2.
We unify chain-of-thought reasoning, reward modeling, formal reasoning, data augmentation, and code interpreter in a unified seq2seq format.
Our pre-trained model achieves 30.3 on the MiniF2F test set without fine-tuning.
arXiv Detail & Related papers (2024-02-09T11:22:08Z) - MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical
Reasoning [52.97768001837269]
We present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations.
We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions.
This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems.
arXiv Detail & Related papers (2023-10-05T17:52:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.