A Survey of Large Language Models for Code: Evolution, Benchmarking, and
Future Trends
- URL: http://arxiv.org/abs/2311.10372v2
- Date: Mon, 8 Jan 2024 05:41:51 GMT
- Title: A Survey of Large Language Models for Code: Evolution, Benchmarking, and
Future Trends
- Authors: Zibin Zheng and Kaiwen Ning and Yanlin Wang and Jingwen Zhang and Dewu
Zheng and Mingxi Ye and Jiachi Chen
- Abstract summary: General large language models (LLMs) have demonstrated significant potential in tasks such as code generation in software engineering.
A considerable portion of Code LLMs is derived from general LLMs through model fine-tuning.
There is currently a lack of systematic investigation into Code LLMs and their performance.
- Score: 30.774685501251817
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: General large language models (LLMs), represented by ChatGPT, have
demonstrated significant potential in tasks such as code generation in software
engineering. This has led to the development of specialized LLMs for software
engineering, known as Code LLMs. A considerable portion of Code LLMs is derived
from general LLMs through model fine-tuning. As a result, Code LLMs are often
updated frequently and their performance can be influenced by the base LLMs.
However, there is currently a lack of systematic investigation into Code LLMs
and their performance. In this study, we conduct a comprehensive survey and
analysis of the types of Code LLMs and their differences in performance
compared to general LLMs. We aim to address three questions: (1) What LLMs are
specifically designed for software engineering tasks, and what is the
relationship between these Code LLMs? (2) Do Code LLMs really outperform
general LLMs in software engineering tasks? (3) Which LLMs are more proficient
in different software engineering tasks? To answer these questions, we first
collect relevant literature and work from five major databases and open-source
communities, resulting in 134 works for analysis. Next, we categorize the Code
LLMs based on their publishers and examine their relationships with general
LLMs and among themselves. Furthermore, we investigate the performance
differences between general LLMs and Code LLMs in various software engineering
tasks to demonstrate the impact of base models on Code LLMs. Finally, we
comprehensively summarize the performance of LLMs across multiple mainstream
benchmarks to identify the best-performing LLMs for each software engineering
task. Our research not only assists developers of Code LLMs in choosing base
models for the development of more advanced LLMs but also provides insights for
practitioners to better understand key improvement directions for Code LLMs.
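The benchmark comparison described above rests on metrics reported by mainstream code benchmarks such as HumanEval and MBPP, most commonly pass@k. As a point of reference, here is a minimal sketch of the standard unbiased pass@k estimator introduced with the Codex/HumanEval evaluation methodology; the sample counts in the usage example are illustrative, not figures from this survey.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    completions, drawn from n generated samples of which c pass the unit
    tests, is correct, i.e. 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Illustrative numbers only: 200 samples per problem, 37 of them passing.
print(pass_at_k(n=200, c=37, k=1))   # 0.185 (equals c / n when k = 1)
print(pass_at_k(n=200, c=37, k=10))  # roughly 0.88
```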
Related papers
- From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future [15.568939568441317]
We investigate the current practice and solutions for large language models (LLMs) and LLM-based agents for software engineering.
In particular, we summarise six key topics: requirement engineering, code generation, autonomous decision-making, software design, test generation, and software maintenance.
We discuss the models and benchmarks used, providing a comprehensive analysis of their applications and effectiveness in software engineering.
arXiv Detail & Related papers (2024-08-05T14:01:15Z)
- Large Language Models as Software Components: A Taxonomy for LLM-Integrated Applications [0.0]
Large Language Models (LLMs) have become widely adopted recently. Research explores their use both as autonomous agents and as tools for software engineering.
LLM-integrated applications, on the other hand, are software systems that leverage an LLM to perform tasks that would otherwise be impossible or require significant coding effort.
This study provides a taxonomy for LLM-integrated applications, offering a framework for analyzing and describing these systems.
arXiv Detail & Related papers (2024-06-13T21:32:56Z)
- Parrot: Efficient Serving of LLM-based Applications with Semantic Variable [11.894203842968745]
Parrot is a service system that focuses on the end-to-end experience of LLM-based applications.
A Semantic Variable annotates an input/output variable in the prompt of a request and creates the data pipeline that connects multiple LLM requests, as the sketch below illustrates.
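The summary above does not expose Parrot's actual programming interface, so the following is only a minimal sketch, built around a hypothetical `SemanticVariable` class and `llm_request` helper, of how annotating prompt inputs and outputs makes the data dependencies between chained LLM requests visible to a serving system.

```python
from dataclasses import dataclass, field

@dataclass
class SemanticVariable:
    # Hypothetical placeholder for a prompt input/output; Parrot's real API may differ.
    name: str
    value: str | None = None
    consumers: list[str] = field(default_factory=list)  # downstream outputs that depend on this variable

def llm_request(template: str, inputs: dict[str, SemanticVariable],
                output: SemanticVariable) -> dict:
    """Register one LLM call; linking each input to the output variable
    exposes the request-level data pipeline to the scheduler."""
    for var in inputs.values():
        var.consumers.append(output.name)
    return {"template": template, "inputs": inputs, "writes": output}

# Two chained requests: the summary produced by the first feeds the second.
article = SemanticVariable("article", value="...")
summary = SemanticVariable("summary")
verdict = SemanticVariable("verdict")
step_1 = llm_request("Summarize: {article}", {"article": article}, output=summary)
step_2 = llm_request("Is this summary faithful? {summary}", {"summary": summary}, output=verdict)
```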
arXiv Detail & Related papers (2024-05-30T09:46:36Z)
- Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs acquire their general-purpose language understanding and generation abilities by training billions of model parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z)
- Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs).
We externalize the collective knowledge and unique strengths of the source LLMs, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
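As a rough, deliberately simplified illustration (not the paper's exact method), the sketch below fuses the next-token distributions of several source LLMs by a weighted average and trains a target model toward the fused distribution with a KL loss; it assumes all models share one vocabulary, a restriction the actual approach does not have.

```python
import torch
import torch.nn.functional as F

def fuse_distributions(source_logits: list[torch.Tensor],
                       weights: list[float]) -> torch.Tensor:
    """Weighted average of next-token distributions from several source LLMs.
    source_logits: tensors of shape [batch, seq, vocab] over a shared vocabulary;
    weights are assumed to sum to 1."""
    probs = [w * F.softmax(logits, dim=-1) for logits, w in zip(source_logits, weights)]
    return torch.stack(probs).sum(dim=0)

def fusion_loss(target_logits: torch.Tensor, fused_probs: torch.Tensor) -> torch.Tensor:
    """Continual-training objective: pull the target model's next-token
    distribution toward the fused teacher distribution (forward KL)."""
    return F.kl_div(F.log_softmax(target_logits, dim=-1), fused_probs,
                    reduction="batchmean")
```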
arXiv Detail & Related papers (2024-01-19T05:02:46Z)
- If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code).
Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
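The summary does not spell out the framework's mechanics. As one heavily simplified reading of how supervised knowledge could aid in-context learning, the sketch below prepends a small discriminative classifier's prediction and confidence to the LLM prompt; the prompt wording and helper names are assumptions for illustration, not the paper's interface.

```python
def build_prompt(question: str, classifier_label: str, classifier_confidence: float) -> str:
    """Compose an in-context prompt that carries a supervised signal from a
    task-specific discriminative model (illustrative format only)."""
    return (
        f"A fine-tuned task classifier predicts '{classifier_label}' "
        f"with confidence {classifier_confidence:.2f}.\n"
        "Use this evidence when answering, and say 'unsure' if it is insufficient.\n"
        f"Question: {question}\nAnswer:"
    )

# Illustrative usage with a hypothetical classifier output.
prompt = build_prompt(
    question="Does the review express a positive sentiment?",
    classifier_label="positive",
    classifier_confidence=0.92,
)
print(prompt)
```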
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
- Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs [72.49064988035126]
We propose an approach called MKS2, aimed at enhancing multimodal large language models (MLLMs).
Specifically, we introduce the Modular Visual Memory, a component integrated into the internal blocks of LLMs, designed to store open-world visual information efficiently.
Our experiments demonstrate that MKS2 substantially augments the reasoning capabilities of LLMs in contexts necessitating physical or commonsense knowledge.
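MKS2's architecture is not detailed in the summary; the module below is a generic sketch, with assumed naming and dimensions rather than the paper's actual design, of a memory block that stores visual information in learnable slots and lets an LLM's hidden states attend over them inside a transformer layer.

```python
import torch
import torch.nn as nn

class ModularVisualMemory(nn.Module):
    """Generic sketch of a visual memory block; not MKS2's actual architecture."""

    def __init__(self, hidden_dim: int, num_slots: int = 64, num_heads: int = 8):
        super().__init__()
        # Learnable slots intended to hold open-world visual information.
        # hidden_dim must be divisible by num_heads.
        self.slots = nn.Parameter(torch.randn(num_slots, hidden_dim) * 0.02)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: [batch, seq, hidden]. Query the memory slots and add the
        # retrieved content residually, so the block can sit inside an LLM layer.
        memory = self.slots.unsqueeze(0).repeat(hidden_states.size(0), 1, 1)
        retrieved, _ = self.attn(hidden_states, memory, memory)
        return hidden_states + retrieved
```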
arXiv Detail & Related papers (2023-11-27T12:29:20Z)
- Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity [61.54815512469125]
This survey addresses the crucial issue of factuality in Large Language Models (LLMs).
As LLMs find applications across diverse domains, the reliability and accuracy of their outputs become vital.
arXiv Detail & Related papers (2023-10-11T14:18:03Z)
- CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models [43.655927559990616]
We propose CodeApex, a benchmark dataset focusing on the programming comprehension, code generation, and code correction abilities of LLMs.
We evaluate 12 widely used LLMs, including both general-purpose and specialized models.
GPT-4 exhibits the best programming capabilities, achieving approximate accuracies of 69%, 54%, and 66% on the three tasks, respectively.
arXiv Detail & Related papers (2023-09-05T04:12:01Z)