At Which Training Stage Does Code Data Help LLMs Reasoning?
- URL: http://arxiv.org/abs/2309.16298v2
- Date: Sat, 30 Sep 2023 07:01:13 GMT
- Title: At Which Training Stage Does Code Data Help LLMs Reasoning?
- Authors: Yingwei Ma and Yue Liu and Yue Yu and Yuanliang Zhang and Yu Jiang and
Changjian Wang and Shanshan Li
- Abstract summary: This paper explores the impact of code data on Large Language Models (LLMs) at different stages.
Pre-training LLMs with a mixture of code and text can significantly enhance LLMs' general reasoning capability.
At the instruction-tuning stage, code data endows LLMs with task-specific reasoning capability.
- Score: 21.74241875923737
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have exhibited remarkable reasoning capabilities
and become the foundation of language technologies. Inspired by the great
success of code data in training LLMs, we naturally wonder at which training
stage introducing code data can really help LLMs reasoning. To this end, this
paper systematically explores the impact of code data on LLMs at different
stages. Concretely, we introduce the code data at the pre-training stage,
instruction-tuning stage, and both of them, respectively. Then, the reasoning
capability of LLMs is comprehensively and fairly evaluated via six reasoning
tasks in five domains. We critically analyze the experimental results and
provide conclusions with insights. First, pre-training LLMs with the mixture of
code and text can significantly enhance LLMs' general reasoning capability
almost without negative transfer on other tasks. Besides, at the
instruction-tuning stage, code data endows LLMs the task-specific reasoning
capability. Moreover, the dynamic mixing strategy of code and text data assists
LLMs to learn reasoning capability step-by-step during training. These insights
deepen the understanding of LLMs regarding reasoning ability for their
application, such as scientific question answering, legal support, etc. The
source code and model parameters are released at
https://github.com/yingweima2022/CodeLLM.
Related papers
- Crystal: Illuminating LLM Abilities on Language and Code [58.5467653736537]
We propose a pretraining strategy to enhance the integration of natural language and coding capabilities.
The resulting model, Crystal, demonstrates remarkable capabilities in both domains.
arXiv Detail & Related papers (2024-11-06T10:28:46Z)
- zsLLMCode: An Effective Approach for Functional Code Embedding via LLM with Zero-Shot Learning [6.976968804436321]
Large language models (LLMs) have the capability of zero-shot learning, which does not require training or fine-tuning.
We propose zsLLMCode, a novel approach that generates functional code embeddings using LLMs.
arXiv Detail & Related papers (2024-09-23T01:03:15Z)
- Case2Code: Learning Inductive Reasoning with Synthetic Data [105.89741089673575]
We propose a Case2Code task by exploiting the expressiveness and correctness of programs.
We first evaluate representative LLMs on the synthesized Case2Code task and demonstrate that the Case-to-code induction is challenging for LLMs.
Experimental results show that such induction training not only improves in-distribution Case2Code performance but also enhances various coding abilities of the trained LLMs.
arXiv Detail & Related papers (2024-07-17T11:35:00Z)
- Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs [65.2379940117181]
We introduce code prompting, a chain of prompts that transforms a natural language problem into code.
We find that code prompting yields a large performance boost for multiple LLMs.
Our analysis of GPT 3.5 reveals that the code formatting of the input problem is essential for performance improvement.
arXiv Detail & Related papers (2024-01-18T15:32:24Z)
- If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code).
Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
- A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends [30.774685501251817]
General large language models (LLMs) have demonstrated significant potential in tasks such as code generation in software engineering.
A considerable portion of Code LLMs is derived from general LLMs through model fine-tuning.
There is currently a lack of systematic investigation into Code LLMs and their performance.
arXiv Detail & Related papers (2023-11-17T07:55:16Z)
- Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity [61.54815512469125]
This survey addresses the crucial issue of factuality in Large Language Models (LLMs).
As LLMs find applications across diverse domains, the reliability and accuracy of their outputs become vital.
arXiv Detail & Related papers (2023-10-11T14:18:03Z)
- TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety.
Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs.
We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z)
- CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models [43.655927559990616]
We propose CodeApex, a benchmark dataset focusing on the programming comprehension, code generation, and code correction abilities of LLMs.
We evaluate 12 widely used LLMs, including both general-purpose and specialized models.
GPT-4 exhibits the best programming capabilities, achieving approximate accuracy of 69%, 54%, and 66% on the three tasks, respectively.
arXiv Detail & Related papers (2023-09-05T04:12:01Z)