PersonaMath: Enhancing Math Reasoning through Persona-Driven Data Augmentation
- URL: http://arxiv.org/abs/2410.01504v1
- Date: Wed, 2 Oct 2024 12:57:12 GMT
- Title: PersonaMath: Enhancing Math Reasoning through Persona-Driven Data Augmentation
- Authors: Jing Luo, Run Luo, Longze Chen, Liang Zhu, Chang Ao, Jiaming Li, Yukun Chen, Xin Cheng, Wen Yang, Jiayuan Su, Chengming Li, Min Yang
- Abstract summary: We introduce PersonaMathQA, a dataset derived from MATH and GSM8K, on which we train the PersonaMath models.
Our dataset contains only 70.3K data points, merely 17.8% of MetaMathQA and 27% of MathInstruct.
We open-source the PersonaMathQA dataset, the PersonaMath models, and our code for public use.
- Score: 24.13606388901431
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While closed-source Large Language Models (LLMs) demonstrate strong mathematical problem-solving abilities, open-source models continue to struggle with such tasks. To bridge this gap, we propose a data augmentation approach and introduce PersonaMathQA, a dataset derived from MATH and GSM8K, on which we train the PersonaMath models. Our approach consists of two stages: the first stage is learning from Persona Diversification, and the second stage is learning from Reflection. In the first stage, we regenerate detailed chain-of-thought (CoT) solutions as instructions using a closed-source LLM and introduce a novel persona-driven data augmentation technique to enhance the dataset's quantity and diversity. In the second stage, we incorporate reflection to fully leverage more challenging and valuable questions. Evaluation of our PersonaMath models on MATH and GSM8K reveals that the PersonaMath-7B model (based on LLaMA-2-7B) achieves an accuracy of 24.2% on MATH and 68.7% on GSM8K, surpassing all baseline methods and achieving state-of-the-art performance. Notably, our dataset contains only 70.3K data points, merely 17.8% of MetaMathQA and 27% of MathInstruct, yet our model outperforms these baselines, demonstrating the high quality and diversity of our dataset, which enables more efficient model training. We open-source the PersonaMathQA dataset, the PersonaMath models, and our code for public use.
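The abstract only outlines the two-stage pipeline, so the following is a minimal illustrative sketch of how persona diversification and reflection could be wired up. The persona list, prompt wording, and the `chat` callable (any wrapper around a closed-source LLM API) are assumptions for illustration, not the authors' released code.

```python
# Illustrative sketch of persona-driven augmentation plus reflection (assumptions, not the authors' code).
from typing import Callable, Dict, List

# Hypothetical personas; PersonaMathQA presumably draws on a much larger and more varied pool.
PERSONAS = [
    "a patient middle-school math teacher",
    "a competitive-programming coach",
    "a physicist who reasons with rough estimates",
]

def regenerate_cot(chat: Callable[[str], str], question: str) -> str:
    """Stage 1: regenerate a detailed chain-of-thought (CoT) solution with a strong LLM."""
    return chat(f"Solve the following problem step by step, showing all reasoning.\n\nProblem: {question}")

def persona_augment(chat: Callable[[str], str], question: str) -> List[Dict[str, str]]:
    """Stage 1: rewrite the problem in the voice of several personas to grow quantity and diversity."""
    samples = []
    for persona in PERSONAS:
        rewritten = chat(
            f"You are {persona}. Rewrite this math problem in your own words, "
            f"keeping the quantities and the answer unchanged.\n\nProblem: {question}"
        )
        samples.append({"question": rewritten, "solution": regenerate_cot(chat, rewritten)})
    return samples

def reflect(chat: Callable[[str], str], question: str, failed_attempt: str) -> str:
    """Stage 2: for harder questions, ask the LLM to reflect on a failed attempt and correct it."""
    return chat(
        "The attempt below is incorrect. Reflect on where it goes wrong, then solve the problem again.\n\n"
        f"Problem: {question}\n\nAttempt: {failed_attempt}"
    )
```

In practice the augmented question-solution pairs would presumably also be filtered for answer consistency before being added to the training set; that step is omitted from this sketch.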
Related papers
- Building Math Agents with Multi-Turn Iterative Preference Learning [56.71330214021884]
This paper studies a complementary direct preference learning approach to further improve model performance.
Existing direct preference learning algorithms were originally designed for single-turn chat tasks.
We introduce a multi-turn direct preference learning framework tailored to this setting.
arXiv Detail & Related papers (2024-09-04T02:41:04Z)
- Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On [55.449818944278526]
We introduce the Skywork-Math model series, supervised fine-tuned (SFT) on common 7B language models.
Skywork-Math 7B achieves an impressive accuracy of 51.2% on the competition-level MATH benchmark.
We provide several practical takeaways to enhance math reasoning abilities in LLMs for both research and industry applications.
arXiv Detail & Related papers (2024-07-11T09:56:51Z)
- Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models [62.815222721144636]
We introduce Math-LLaVA, a LLaVA-1.5-based model fine-tuned with MathV360K.
This novel approach significantly improves the multimodal mathematical reasoning capabilities of LLaVA-1.5.
Math-LLaVA demonstrates enhanced generalizability, showing substantial improvements on the MMMU benchmark.
arXiv Detail & Related papers (2024-06-25T05:43:21Z)
- MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning [11.426127461122908]
Open Large Language Models (LLMs) that integrate with external Python interpreters have significantly enhanced mathematical reasoning capabilities.
This work constructs new math questions via multi-perspective data augmentation methods and then synthesizes code-nested solutions to them.
We propose a two-stage training strategy: in Stage 1, we fine-tune Llama-2 on pure CoT data to obtain an intermediate model, which is then trained on the code-nested data in Stage 2 to produce MuMath-Code.
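To make the idea of a code-nested solution concrete, here is a rough sketch of how the code blocks embedded in such a solution could be extracted and executed to recover a final answer. The `<code>...</code>` delimiter and the helper name are assumptions for illustration, not MuMath-Code's actual data format.

```python
# Illustrative executor for code-nested solutions (format and helper name are assumed, not MuMath-Code's).
import contextlib
import io
import re

# Assumed delimiter for embedded code; the real dataset may mark code blocks differently.
CODE_BLOCK = re.compile(r"<code>(.*?)</code>", re.DOTALL)

def run_nested_solution(solution: str) -> str:
    """Execute every embedded code block in order (shared namespace, as in tool-use LLMs)
    and return the last printed output, which is assumed to be the final answer."""
    namespace: dict = {}
    answer = ""
    for block in CODE_BLOCK.findall(solution):
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(block, namespace)  # in a real pipeline this should run in a sandboxed interpreter
        printed = buffer.getvalue().strip()
        if printed:
            answer = printed
    return answer

# Tiny usage example with a made-up solution string:
example = (
    "First compute the total cost. <code>price = 3 * 4\nprint(price)</code> "
    "So the answer is 12."
)
print(run_nested_solution(example))  # -> 12
```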
arXiv Detail & Related papers (2024-05-13T08:32:19Z)
- Common 7B Language Models Already Possess Strong Math Capabilities [61.61442513067561]
This paper shows that the LLaMA-2 7B model with common pre-training already exhibits strong mathematical abilities.
The potential for extensive scaling is constrained by the scarcity of publicly available math questions.
arXiv Detail & Related papers (2024-03-07T18:00:40Z)
- MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs [38.127313175508746]
MathGenie is a novel method for generating diverse and reliable math problems from a small-scale problem-solution dataset.
Various pretrained models, ranging from 7B to 70B, are trained on the newly curated data to test the effectiveness of the proposed augmentation technique.
MathGenieLM-InternLM2 achieves an accuracy of 87.7% on GSM8K and 55.7% on MATH, securing the best overall score among open-source language models.
arXiv Detail & Related papers (2024-02-26T07:17:25Z)
- MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning [52.97768001837269]
We present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations.
We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions.
This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems.
arXiv Detail & Related papers (2023-10-05T17:52:09Z)
- MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models [91.66694225955872]
We propose MetaMath, a fine-tuned language model that specializes in mathematical reasoning.
Specifically, we start by bootstrapping mathematical questions, rewriting each question from multiple perspectives without introducing extra knowledge.
We release the MetaMathQA dataset, the MetaMath models at different model sizes, and the training code for public use.
arXiv Detail & Related papers (2023-09-21T17:45:42Z)
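As a rough illustration of this kind of question bootstrapping, the sketch below rewrites a seed question from a few perspectives (a plain rephrasing and a backward-style variant) with an LLM. The prompt wording and the `chat` callable are assumptions for illustration, not MetaMath's released templates.

```python
# Illustrative question bootstrapping (prompts are assumptions, not MetaMath's released templates).
from typing import Callable, List

REWRITE_PROMPTS = [
    # Forward rephrasing: same problem, different surface form.
    "Rephrase the following math problem without changing its answer:\n\n{q}",
    # Backward-style variant: turn a given number into the unknown while stating the original answer.
    "Rewrite the following math problem so that one of its given numbers becomes the unknown, "
    "and state the original answer as a known fact:\n\n{q}",
]

def bootstrap_questions(chat: Callable[[str], str], seed_question: str) -> List[str]:
    """Generate rewritten variants of a seed question using only the question itself (no extra knowledge)."""
    return [chat(prompt.format(q=seed_question)) for prompt in REWRITE_PROMPTS]
```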