Scaling Relationship on Learning Mathematical Reasoning with Large
Language Models
- URL: http://arxiv.org/abs/2308.01825v2
- Date: Wed, 13 Sep 2023 03:57:29 GMT
- Title: Scaling Relationship on Learning Mathematical Reasoning with Large
Language Models
- Authors: Zheng Yuan, Hongyi Yuan, Chengpeng Li, Guanting Dong, Keming Lu,
Chuanqi Tan, Chang Zhou, Jingren Zhou
- Abstract summary: We investigate how the pre-training loss, supervised data amount, and augmented data amount influence the reasoning performances of a supervised LLM.
We find that rejection samples from multiple models push LLaMA-7B to an accuracy of 49.3% on GSM8K which outperforms the supervised fine-tuning (SFT) accuracy of 35.9% significantly.
- Score: 75.29595679428105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mathematical reasoning is a challenging task for large language models
(LLMs), while the scaling relationship of it with respect to LLM capacity is
under-explored. In this paper, we investigate how the pre-training loss,
supervised data amount, and augmented data amount influence the reasoning
performances of a supervised LLM. We find that pre-training loss is a better
indicator of the model's performance than the model's parameter count. We apply
supervised fine-tuning (SFT) with different amounts of supervised data and
empirically find a log-linear relation between data amount and model
performance, and we find better models improve less with enlarged supervised
datasets. To augment more data samples for improving model performances without
any human effort, we propose to apply Rejection sampling Fine-Tuning (RFT). RFT
uses supervised models to generate and collect correct reasoning paths as
augmented fine-tuning datasets. We find with augmented samples containing more
distinct reasoning paths, RFT improves mathematical reasoning performance more
for LLMs. We also find RFT brings more improvement for less performant LLMs.
Furthermore, we combine rejection samples from multiple models which push
LLaMA-7B to an accuracy of 49.3\% on GSM8K which outperforms the supervised
fine-tuning (SFT) accuracy of 35.9\% significantly.
Related papers
- S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning [51.84977135926156]
We introduce S$2$R, an efficient framework that enhances LLM reasoning by teaching models to self-verify and self-correct during inference.
Our results demonstrate that Qwen2.5-math-7B achieves an accuracy improvement from 51.0% to 81.6%, outperforming models trained on an equivalent amount of long-CoT distilled data.
arXiv Detail & Related papers (2025-02-18T13:40:22Z) - From Drafts to Answers: Unlocking LLM Potential via Aggregation Fine-Tuning [31.95005389919542]
Scaling data and model size has been proven effective for boosting the performance of large language models.
In this work, we introduce Aggregation Fine-Tuning (AFT), a supervised finetuning paradigm.
Empirical evaluations on benchmark datasets show that AFT-trained models substantially outperform standard SFT.
arXiv Detail & Related papers (2025-01-21T04:11:59Z) - Learning to Verify Summary Facts with Fine-Grained LLM Feedback [15.007479147796403]
Training automatic summary fact verifiers often faces the challenge of a lack of human-labeled data.
We introduce FineSumFact, a large-scale dataset containing fine-grained factual feedback on summaries.
arXiv Detail & Related papers (2024-12-14T05:28:44Z) - Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review [50.78587571704713]
Learn-Focus-Review (LFR) is a dynamic training approach that adapts to the model's learning progress.
LFR tracks the model's learning performance across data blocks (sequences of tokens) and prioritizes revisiting challenging regions of the dataset.
Compared to baseline models trained on the full datasets, LFR consistently achieved lower perplexity and higher accuracy.
arXiv Detail & Related papers (2024-09-10T00:59:18Z) - Fine-Tuning or Fine-Failing? Debunking Performance Myths in Large Language Models [0.8399688944263842]
Large Language Models (LLMs) have the capability to understand and generate human-like text from input queries.
This study extends this concept to the integration of LLMs within Retrieval-Augmented Generation (RAG) pipelines.
We evaluate the impact of fine-tuning on the LLMs' capacity for data extraction and contextual understanding.
arXiv Detail & Related papers (2024-06-17T04:35:17Z) - LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement [79.31084387589968]
Pretrained large language models (LLMs) are currently state-of-the-art for solving the vast majority of natural language processing tasks.
We propose LLM2LLM, a data augmentation strategy that uses a teacher LLM to enhance a small seed dataset.
We achieve improvements up to 24.2% on the GSM8K dataset, 32.6% on CaseHOLD, 32.0% on SNIPS, 52.6% on TREC and 39.8% on SST-2 over regular fine-tuning in the low-data regime.
arXiv Detail & Related papers (2024-03-22T08:57:07Z) - To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
Second, we examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z) - Distilling Step-by-Step! Outperforming Larger Language Models with Less
Training Data and Smaller Model Sizes [91.58845026796149]
We introduce Distilling step-by-step, a new mechanism that trains small models that outperform large language models.
We present three findings across 4 NLP benchmarks.
arXiv Detail & Related papers (2023-05-03T17:50:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.