Related papers: WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

URL: http://arxiv.org/abs/2308.09583v1
Date: Fri, 18 Aug 2023 14:23:21 GMT
Title: WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Authors: Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Jianguang Lou, Chongyang Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, Dongmei Zhang
Abstract summary: We present WizardMath, which enhances the mathematical reasoning abilities of Llama-2, by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. Our model even surpasses ChatGPT-3.5, Claude Instant-1, PaLM-2 and Minerva on GSM8k, simultaneously surpasses Text-davinci, PaLM-1 and GPT-3 on MATH.
Score: 128.89645483139236
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs), such as GPT-4, have shown remarkable performance in natural language processing (NLP) tasks, including challenging mathematical reasoning. However, most existing open-source models are only pre-trained on large-scale internet data and without math-related optimization. In this paper, we present WizardMath, which enhances the mathematical reasoning abilities of Llama-2, by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. Through extensive experiments on two mathematical reasoning benchmarks, namely GSM8k and MATH, we reveal the extraordinary capabilities of our model. WizardMath surpasses all other open-source LLMs by a substantial margin. Furthermore, our model even outperforms ChatGPT-3.5, Claude Instant-1, PaLM-2 and Minerva on GSM8k, simultaneously surpasses Text-davinci-002, PaLM-1 and GPT-3 on MATH. More details and model weights are public at https://github.com/nlpxucan/WizardLM and https://huggingface.co/WizardLM.

Related papers

Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On [55.449818944278526]
We introduce the Skywork-Math model series, supervised fine-tuned (SFT) on common 7B language models. Skywork-Math 7B has achieved impressive accuracies of 51.2% on the competition-level MATH benchmark. We provide several practical takeaways to enhance math reasoning abilities in LLMs for both research and industry applications.
arXiv Detail & Related papers (2024-07-11T09:56:51Z)
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models [62.815222721144636]
We introduce Math-LLaVA, a LLaVA-1.5-based model fine-tuned with MathV360K. This novel approach significantly improves the multimodal mathematical reasoning capabilities of LLaVA-1.5. Math-LLaVA demonstrates enhanced generalizability, showing substantial improvements on the MMMU benchmark.
arXiv Detail & Related papers (2024-06-25T05:43:21Z)
MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
MindStar is a purely inference-based searching method for large language models. It formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths. It significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1.
arXiv Detail & Related papers (2024-05-25T15:07:33Z)
MathScale: Scaling Instruction Tuning for Mathematical Reasoning [70.89605383298331]
Large language models (LLMs) have demonstrated remarkable capabilities in problem-solving. However, their proficiency in solving mathematical problems remains inadequate. We propose MathScale, a simple and scalable method to create high-quality mathematical reasoning data.
arXiv Detail & Related papers (2024-03-05T11:42:59Z)
MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning [2.9104279358536647]
We present MathSensei, a tool-augmented large language model for mathematical reasoning. We study the complementary benefits of the tools - knowledge retriever (Bing Web Search), program generator + executor (Python), and symbolic equation solver (Wolfram-Alpha API)
arXiv Detail & Related papers (2024-02-27T05:50:35Z)
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning [98.53491178426492]
We open-source our math reasoning LLMs InternLM-Math which is continue pre-trained from InternLM2. We unify chain-of-thought reasoning, reward modeling, formal reasoning, data augmentation, and code interpreter in a unified seq2seq format. Our pre-trained model achieves 30.3 on the MiniF2F test set without fine-tuning.
arXiv Detail & Related papers (2024-02-09T11:22:08Z)
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning [52.97768001837269]
We present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations. We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions. This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems.
arXiv Detail & Related papers (2023-10-05T17:52:09Z)
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models [91.66694225955872]
We propose MetaMath, a fine-tuned language model that specializes in mathematical reasoning. Specifically, we start by bootstrapping mathematical questions by rewriting the question from multiple perspectives without extra knowledge. We release all the MetaMathQA dataset, the MetaMath models with different model sizes and the training code for public use.
arXiv Detail & Related papers (2023-09-21T17:45:42Z)
Mathematical Capabilities of ChatGPT [35.71603158908465]
We release two new datasets: GHOSTS and miniGHOSTS. These are the first natural-language datasets curated by working researchers in mathematics. We benchmark the models on a range of fine-grained performance metrics.
arXiv Detail & Related papers (2023-01-31T18:59:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.