Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning
- URL: http://arxiv.org/abs/2405.20535v1
- Date: Thu, 30 May 2024 23:20:25 GMT
- Title: Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning
- Authors: Xinlu Zhang, Zhiyu Zoey Chen, Xi Ye, Xianjun Yang, Lichang Chen, William Yang Wang, Linda Ruth Petzold,
- Abstract summary: Instruction Fine-Tuning (IFT) significantly enhances the zero-shot capabilities of pretrained Large Language Models (LLMs)
This paper investigates how coding data impact LLMs' reasoning capacities during the IFT stage.
- Score: 64.5243480989869
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Instruction Fine-Tuning (IFT) significantly enhances the zero-shot capabilities of pretrained Large Language Models (LLMs). While coding data is known to boost reasoning abilities during LLM pretraining, its role in activating internal reasoning capacities during IFT remains understudied. This paper investigates a key question: How does coding data impact LLMs' reasoning capacities during the IFT stage? To explore this, we thoroughly examine the impact of coding data across different coding data proportions, model families, sizes, and reasoning domains, from various perspectives. Specifically, we create three IFT datasets with increasing coding data proportions, fine-tune six LLM backbones across different families and scales on these datasets, evaluate the tuned models' performance across twelve tasks in three reasoning domains, and analyze the outcomes from three broad-to-granular perspectives: overall, domain-level, and task-specific. Our holistic analysis provides valuable insights in each perspective. First, coding data tuning enhances the overall reasoning capabilities of LLMs across different model families and scales. Moreover, the effect of coding data varies among different domains but shows consistent trends across model families and scales within each domain. Additionally, coding data generally yields comparable task-specific benefits across different model families, with the optimal coding data proportions in IFT datasets being task-specific.
Related papers
- Diversity as a Reward: Fine-Tuning LLMs on a Mixture of Domain-Undetermined Data [36.277423093218275]
We study the role of data diversity in enhancing the overall abilities of large language models (LLMs)
We propose a new method that gives the LLM a dual identity: an output model to cognitively probe and select data based on diversity reward, as well as an input model to be tuned with the selected data.
arXiv Detail & Related papers (2025-02-05T17:21:01Z) - Empowering Large Language Models in Wireless Communication: A Novel Dataset and Fine-Tuning Framework [81.29965270493238]
We develop a specialized dataset aimed at enhancing the evaluation and fine-tuning of large language models (LLMs) for wireless communication applications.
The dataset includes a diverse set of multi-hop questions, including true/false and multiple-choice types, spanning varying difficulty levels from easy to hard.
We introduce a Pointwise V-Information (PVI) based fine-tuning method, providing a detailed theoretical analysis and justification for its use in quantifying the information content of training data.
arXiv Detail & Related papers (2025-01-16T16:19:53Z) - Federated Fine-Tuning of LLMs: Framework Comparison and Research Directions [59.5243730853157]
Federated learning (FL) provides a privacy-preserving solution for fine-tuning pre-trained large language models (LLMs) using distributed private datasets.
This article conducts a comparative analysis of three advanced federated LLM (FedLLM) frameworks that integrate knowledge distillation (KD) and split learning (SL) to mitigate these issues.
arXiv Detail & Related papers (2025-01-08T11:37:06Z) - 60 Data Points are Sufficient to Fine-Tune LLMs for Question-Answering [50.12622877002846]
Large language models (LLMs) encode extensive world knowledge through pre-training on massive datasets, which can be fine-tuned for the question-answering (QA) task.
We categorize supervised fine-tuning (SFT) data based on the extent of knowledge memorized by the pretrained LLMs.
Our experiments show that as few as 60 data points during the SFT stage can activate the knowledge encoded during pre-training, enabling LLMs to perform the QA task.
arXiv Detail & Related papers (2024-09-24T07:38:38Z) - Unveiling the Generalization Power of Fine-Tuned Large Language Models [81.70754292058258]
We investigate whether fine-tuning affects the intrinsic generalization ability intrinsic to Large Language Models (LLMs)
Our main findings reveal that models fine-tuned on generation and classification tasks exhibit dissimilar behaviors in generalizing to different domains and tasks.
We observe that integrating the in-context learning strategy during fine-tuning on generation tasks can enhance the model's generalization ability.
arXiv Detail & Related papers (2024-03-14T08:18:59Z) - Deciphering the Impact of Pretraining Data on Large Language Models through Machine Unlearning [45.96954837114004]
We systematically analyze the impact of 48 datasets from 5 major categories of pretraining data of Large Language Models.
Our analyses provide empirical results about the contribution of multiple corpora on the performances of LLMs.
arXiv Detail & Related papers (2024-02-18T10:36:05Z) - Dynamics of Instruction Tuning: Each Ability of Large Language Models
Has Its Own Growth Pace [21.015261553612643]
We present a dataset with over 40k instances across ten abilities and examine instruction-tuned models with 7b to 33b parameters.
Our study reveals three primary findings: (i) Despite the models' overall performance being tied to data and parameter scale, individual abilities have different sensitivities to these factors.
Human-curated data strongly outperforms synthetic data from GPT-4 in efficiency and can constantly enhance model performance with volume increases.
arXiv Detail & Related papers (2023-10-30T15:37:10Z) - How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition [64.86360698067764]
This study focuses on the interplay of data composition between mathematical reasoning, code generation, and general human-aligning abilities during supervised fine-tuning.
Our experiments reveal that distinct capabilities scale differently and larger models generally show superior performance with same amount of data.
Our findings suggest the amount of composition data influences performance more than the composition ratio.
arXiv Detail & Related papers (2023-10-09T07:56:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.