WizardMath: Empowering Mathematical Reasoning for Large Language Models
via Reinforced Evol-Instruct
- URL: http://arxiv.org/abs/2308.09583v1
- Date: Fri, 18 Aug 2023 14:23:21 GMT
- Title: WizardMath: Empowering Mathematical Reasoning for Large Language Models
via Reinforced Evol-Instruct
- Authors: Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Jianguang Lou, Chongyang
Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, Dongmei Zhang
- Abstract summary: We present WizardMath, which enhances the mathematical reasoning abilities of Llama-2, by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math.
Our model even surpasses ChatGPT-3.5, Claude Instant-1, PaLM-2 and Minerva on GSM8k, simultaneously surpasses Text-davinci, PaLM-1 and GPT-3 on MATH.
- Score: 128.89645483139236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs), such as GPT-4, have shown remarkable
performance in natural language processing (NLP) tasks, including challenging
mathematical reasoning. However, most existing open-source models are only
pre-trained on large-scale internet data and without math-related optimization.
In this paper, we present WizardMath, which enhances the mathematical reasoning
abilities of Llama-2, by applying our proposed Reinforcement Learning from
Evol-Instruct Feedback (RLEIF) method to the domain of math. Through extensive
experiments on two mathematical reasoning benchmarks, namely GSM8k and MATH, we
reveal the extraordinary capabilities of our model. WizardMath surpasses all
other open-source LLMs by a substantial margin. Furthermore, our model even
outperforms ChatGPT-3.5, Claude Instant-1, PaLM-2 and Minerva on GSM8k,
simultaneously surpasses Text-davinci-002, PaLM-1 and GPT-3 on MATH. More
details and model weights are public at https://github.com/nlpxucan/WizardLM
and https://huggingface.co/WizardLM.
Related papers
- Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On [55.449818944278526]
We introduce the Skywork-Math model series, supervised fine-tuned (SFT) on common 7B language models.
Skywork-Math 7B has achieved impressive accuracies of 51.2% on the competition-level MATH benchmark.
We provide several practical takeaways to enhance math reasoning abilities in LLMs for both research and industry applications.
arXiv Detail & Related papers (2024-07-11T09:56:51Z) - MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
MindStar is a purely inference-based searching method for large language models.
It formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths.
It significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1.
arXiv Detail & Related papers (2024-05-25T15:07:33Z) - MathScale: Scaling Instruction Tuning for Mathematical Reasoning [70.89605383298331]
Large language models (LLMs) have demonstrated remarkable capabilities in problem-solving.
However, their proficiency in solving mathematical problems remains inadequate.
We propose MathScale, a simple and scalable method to create high-quality mathematical reasoning data.
arXiv Detail & Related papers (2024-03-05T11:42:59Z) - MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning [2.9104279358536647]
We present MathSensei, a tool-augmented large language model for mathematical reasoning.
We study the complementary benefits of the tools - knowledge retriever (Bing Web Search), program generator + executor (Python), and symbolic equation solver (Wolfram-Alpha API)
arXiv Detail & Related papers (2024-02-27T05:50:35Z) - InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning [98.53491178426492]
We open-source our math reasoning LLMs InternLM-Math which is continue pre-trained from InternLM2.
We unify chain-of-thought reasoning, reward modeling, formal reasoning, data augmentation, and code interpreter in a unified seq2seq format.
Our pre-trained model achieves 30.3 on the MiniF2F test set without fine-tuning.
arXiv Detail & Related papers (2024-02-09T11:22:08Z) - MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical
Reasoning [52.97768001837269]
We present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations.
We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions.
This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems.
arXiv Detail & Related papers (2023-10-05T17:52:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.