Scaling Relationship on Learning Mathematical Reasoning with Large
Language Models
- URL: http://arxiv.org/abs/2308.01825v2
- Date: Wed, 13 Sep 2023 03:57:29 GMT
- Title: Scaling Relationship on Learning Mathematical Reasoning with Large
Language Models
- Authors: Zheng Yuan, Hongyi Yuan, Chengpeng Li, Guanting Dong, Keming Lu,
Chuanqi Tan, Chang Zhou, Jingren Zhou
- Abstract summary: We investigate how the pre-training loss, supervised data amount, and augmented data amount influence the reasoning performances of a supervised LLM.
We find that rejection samples from multiple models push LLaMA-7B to an accuracy of 49.3% on GSM8K, which significantly outperforms the supervised fine-tuning (SFT) accuracy of 35.9%.
- Score: 75.29595679428105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mathematical reasoning is a challenging task for large language models
(LLMs), and its scaling relationship with respect to LLM capacity is
under-explored. In this paper, we investigate how the pre-training loss, the
amount of supervised data, and the amount of augmented data influence the
reasoning performance of a supervised LLM. We find that pre-training loss is a
better indicator of the model's performance than the model's parameter count.
We apply supervised fine-tuning (SFT) with different amounts of supervised data
and empirically find a log-linear relation between data amount and model
performance, and we find that better models improve less as the supervised
dataset grows. To augment the data and improve model performance without any
human effort, we propose Rejection sampling Fine-Tuning (RFT). RFT uses
supervised models to generate and collect correct reasoning paths as augmented
fine-tuning datasets. We find that RFT improves mathematical reasoning
performance more when the augmented samples contain more distinct reasoning
paths. We also find that RFT brings more improvement for less performant LLMs.
Furthermore, combining rejection samples from multiple models pushes LLaMA-7B
to an accuracy of 49.3% on GSM8K, which significantly outperforms the SFT
accuracy of 35.9%.
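The RFT recipe described in the abstract amounts to: sample many reasoning paths from the SFT model, keep only those whose final answer is correct, and de-duplicate so the augmented set favors distinct reasoning paths. Below is a minimal sketch of that data-collection step, not the authors' code; it assumes a hypothetical `sample_paths(question, k)` interface to the fine-tuned model and uses simple regex parsing as a stand-in for the paper's answer and equation extraction.

```python
import re

def extract_answer(text):
    """Take the last number in a generated solution as its final answer
    (GSM8K-style); a simplification of the paper's answer parsing."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    return numbers[-1] if numbers else None

def extract_equations(text):
    """Pull out calculator-style equations; the equation list serves here as
    a proxy for telling distinct reasoning paths apart."""
    return re.findall(r"[\d\.\+\-\*/() ]+=\s*-?\d+(?:\.\d+)?", text)

def collect_rft_data(sample_paths, dataset, k=100):
    """Rejection-sampling data collection (hedged sketch).

    sample_paths(question, k) is an assumed interface returning k reasoning
    paths sampled from the SFT model at temperature > 0; dataset is an
    iterable of (question, reference_answer) pairs with answers as strings.
    """
    augmented = []
    for question, reference_answer in dataset:
        seen = set()
        for path in sample_paths(question, k):
            if extract_answer(path) != reference_answer:
                continue  # rejection step: drop paths with a wrong final answer
            key = tuple(extract_equations(path))
            if key in seen:
                continue  # keep only distinct reasoning paths
            seen.add(key)
            augmented.append({"question": question, "response": path})
    return augmented  # fine-tune the base model on this augmented set
```

Per the abstract, the same collection can be run with several fine-tuned models and the de-duplicated union of their accepted paths used as the RFT dataset; this multi-model variant is what pushes LLaMA-7B to 49.3% on GSM8K.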
Related papers
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z) - Scaling Laws for Predicting Downstream Performance in LLMs [75.28559015477137]
This work focuses on the pre-training loss as a more efficient metric for performance estimation.
We extend the power law analytical function to predict domain-specific pre-training loss based on FLOPs across data sources.
We employ a two-layer neural network to model the non-linear relationship between multiple domain-specific losses and downstream performance.
arXiv Detail & Related papers (2024-10-11T04:57:48Z) - TRAWL: Tensor Reduced and Approximated Weights for Large Language Models [11.064868044313855]
We introduce TRAWL (Tensor Reduced and Approximated Weights for Large Language Models), a technique that applies tensor decomposition across multiple weight matrices to effectively denoise LLMs by capturing global structural patterns.
Our experiments show that TRAWL improves model performance by up to 16% over baseline models on benchmark datasets, without requiring additional data, training, or fine-tuning.
arXiv Detail & Related papers (2024-06-25T04:01:32Z) - Fine-Tuning or Fine-Failing? Debunking Performance Myths in Large Language Models [0.8399688944263842]
Large Language Models (LLMs) have the capability to understand and generate human-like text from input queries.
This study extends this concept to the integration of LLMs within Retrieval-Augmented Generation (RAG) pipelines.
We evaluate the impact of fine-tuning on the LLMs' capacity for data extraction and contextual understanding.
arXiv Detail & Related papers (2024-06-17T04:35:17Z) - LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement [79.31084387589968]
Pretrained large language models (LLMs) are currently state-of-the-art for solving the vast majority of natural language processing tasks.
We propose LLM2LLM, a data augmentation strategy that uses a teacher LLM to enhance a small seed dataset.
We achieve improvements up to 24.2% on the GSM8K dataset, 32.6% on CaseHOLD, 32.0% on SNIPS, 52.6% on TREC and 39.8% on SST-2 over regular fine-tuning in the low-data regime.
arXiv Detail & Related papers (2024-03-22T08:57:07Z) - To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
We then examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z) - Distilling Step-by-Step! Outperforming Larger Language Models with Less
Training Data and Smaller Model Sizes [91.58845026796149]
We introduce Distilling step-by-step, a new mechanism that trains small models that outperform large language models.
We present three findings across four NLP benchmarks.
arXiv Detail & Related papers (2023-05-03T17:50:56Z) - Complementary Ensemble Learning [1.90365714903665]
We derive a technique to improve the performance of state-of-the-art deep learning models.
Specifically, we train auxiliary models that complement the uncertainty of a state-of-the-art model.
arXiv Detail & Related papers (2021-11-09T03:23:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.