CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model
- URL: http://arxiv.org/abs/2310.06266v2
- Date: Wed, 10 Jan 2024 19:59:03 GMT
- Title: CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model
- Authors: Peng Di, Jianguo Li, Hang Yu, Wei Jiang, Wenting Cai, Yang Cao, Chaoyu
Chen, Dajun Chen, Hongwei Chen, Liang Chen, Gang Fan, Jie Gong, Zi Gong, Wen
Hu, Tingting Guo, Zhichao Lei, Ting Li, Zheng Li, Ming Liang, Cong Liao,
Bingchang Liu, Jiachen Liu, Zhiwei Liu, Shaojun Lu, Min Shen, Guangpei Wang,
Huan Wang, Zhi Wang, Zhaogui Xu, Jiawei Yang, Qing Ye, Gehao Zhang, Yu Zhang,
Zelin Zhao, Xunjin Zheng, Hailian Zhou, Lifu Zhu, Xianying Zhu
- Abstract summary: This paper introduces CodeFuse-13B, an open-source pre-trained code LLM.
It is specifically designed for code-related tasks with both English and Chinese prompts.
CodeFuse achieves its effectiveness by utilizing a high-quality pre-training dataset.
- Score: 58.127534002232096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code Large Language Models (Code LLMs) have gained significant attention in
the industry due to their wide applications in the full lifecycle of software
engineering. However, the effectiveness of existing models in understanding
non-English inputs for multi-lingual code-related tasks remains largely
understudied. This paper introduces CodeFuse-13B, an open-source pre-trained code
LLM. It is specifically designed for code-related tasks with both English and
Chinese prompts and supports over 40 programming languages. CodeFuse achieves
its effectiveness by utilizing a high-quality pre-training dataset that is
carefully filtered by program analyzers and optimized during the training
process. Extensive experiments are conducted using real-world usage scenarios,
the industry-standard benchmark HumanEval-x, and the specially designed
CodeFuseEval for Chinese prompts. To assess the effectiveness of CodeFuse, we
actively collected valuable human feedback from Ant Group's software
development process where CodeFuse has been successfully deployed. The results
demonstrate that CodeFuse-13B achieves a HumanEval pass@1 score of 37.10%,
positioning it among the top multi-lingual code LLMs of similar parameter size.
In practical scenarios such as code generation, code translation, code
commenting, and test case generation, CodeFuse outperforms other models when
given Chinese prompts.
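For reference, pass@1 figures like the one above are conventionally computed with the unbiased pass@k estimator from the Codex evaluation methodology (Chen et al., 2021). A minimal sketch, assuming this standard estimator (the paper's exact harness is not specified here) and illustrative sample counts:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k), computed as a
    numerically stable product. n = samples drawn per problem,
    c = samples that pass all unit tests."""
    if n - c < k:
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Illustrative numbers only: 200 samples with 74 passing gives pass@1 = 0.37.
print(pass_at_k(n=200, c=74, k=1))
```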
Related papers
- SpecTra: Enhancing the Code Translation Ability of Language Models by Generating Multi-Modal Specifications [17.60108067953814]
Large language models (LLMs) are increasingly being used for the task of automated code translation.
We propose SpecTra, a multi-stage approach that uses a novel self-consistency filter to first generate high-quality specifications.
arXiv Detail & Related papers (2024-05-28T20:48:30Z)
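The summary above does not spell out SpecTra's self-consistency filter, so the following is only a generic majority-vote sketch of the idea: sample several candidate specifications, group them under a canonicalization function (the function and threshold here are assumptions), and keep a candidate only when enough samples agree.

```python
from collections import Counter
from typing import Callable

def self_consistency_filter(candidates: list[str],
                            canonicalize: Callable[[str], str],
                            min_agreement: float = 0.5) -> str | None:
    """Majority-vote self-consistency: keep a candidate spec only if enough
    independently sampled candidates fall into the same equivalence class."""
    if not candidates:
        return None
    votes = Counter(canonicalize(c) for c in candidates)
    key, count = votes.most_common(1)[0]
    if count / len(candidates) < min_agreement:
        return None
    # Return the first raw candidate in the winning equivalence class.
    return next(c for c in candidates if canonicalize(c) == key)

# Hypothetical usage: 10 specs sampled from an LLM for the same function.
specs = ["Returns the max of a list"] * 6 + ["Returns the min of a list"] * 4
print(self_consistency_filter(specs, canonicalize=str.lower))
```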
- Exploring the Impact of the Output Format on the Evaluation of Large Language Models for Code Translation [8.81447711370817]
We empirically analyze the generated outputs of eleven popular instruct-tuned large language models (LLMs).
Our results demonstrate that a strategic combination of prompt engineering and regular expression can effectively extract the source code from the model generation output.
arXiv Detail & Related papers (2024-03-25T21:41:31Z)
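In the spirit of the finding above, here is one minimal pattern for pulling source code out of a chat-style generation; the paper evaluates its own prompts and expressions, so this particular regex is an illustration rather than the authors' implementation.

```python
import re

FENCE = "`" * 3  # literal triple backtick delimiting a markdown code block
FENCE_RE = re.compile(FENCE + r"[a-zA-Z0-9_+-]*\n(.*?)" + FENCE, re.DOTALL)

def extract_code(generation: str) -> str:
    """Prefer fenced code blocks; fall back to the raw text when the
    model answered with bare code and no fences."""
    blocks = FENCE_RE.findall(generation)
    return "\n\n".join(b.strip() for b in blocks) if blocks else generation.strip()

reply = f"Here is the translation:\n{FENCE}java\nclass A {{ }}\n{FENCE}\nDone."
print(extract_code(reply))  # -> class A { }
```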
- IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators [49.903001442804594]
This work investigates the prospect of leveraging compiler intermediate representations (IR) to improve the multilingual capabilities of Code-LMs.
We first compile SLTrans, a parallel dataset consisting of nearly 4M self-contained source code files.
Next, we carry out continued causal language modelling training on SLTrans, forcing the Code-LMs to learn the IR language.
Our resulting models, dubbed IRCoder, display sizeable and consistent gains across a wide variety of code generation tasks and metrics.
arXiv Detail & Related papers (2024-03-06T17:52:08Z)
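A toy version of the source-to-IR pairing behind SLTrans can be produced with clang (assumed to be installed; the optimization level and single-language scope are simplifications of the actual dataset):

```python
import pathlib
import subprocess
import tempfile

def source_to_llvm_ir(c_source: str) -> str:
    """Compile a self-contained C file to textual LLVM IR, yielding the
    kind of (source, IR) pair that SLTrans collects at scale."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "unit.c"
        src.write_text(c_source)
        result = subprocess.run(
            ["clang", "-S", "-emit-llvm", "-O1", "-o", "-", str(src)],
            capture_output=True, text=True, check=True)
        return result.stdout

ir = source_to_llvm_ir("int add(int a, int b) { return a + b; }")
print(ir.splitlines()[0])  # module header line of the emitted IR
```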
- Code Needs Comments: Enhancing Code LLMs with Comment Augmentation [91.52444946362547]
We introduce a novel data augmentation method that generates comments for existing code, coupled with a data filtering strategy that filters out code data poorly correlated with natural language.
We conducted experiments on three code-focused Large Language Models and observed consistent improvements in performance on two widely-used programming skill benchmarks.
arXiv Detail & Related papers (2024-02-20T13:56:38Z)
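The filtering criterion used in the paper is not given in this summary; as a stand-in, the sketch below scores comment/code correlation with a crude lexical overlap, where the paper presumably relies on a stronger, learned signal.

```python
import re

def comment_code_alignment(comment: str, code: str) -> float:
    """Crude lexical proxy for comment/code correlation: the fraction of
    comment content words that also appear in the code's identifiers."""
    words = set(re.findall(r"[a-z]{3,}", comment.lower()))
    idents = set()
    for tok in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code):
        # Split snake_case and camelCase tokens into lowercase pieces.
        idents.update(p.lower() for p in re.findall(r"[A-Za-z][a-z]*", tok))
    return len(words & idents) / len(words) if words else 0.0

print(comment_code_alignment("sort items by price", "def sort_by_price(items): ..."))
```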
- Exploring Large Language Models for Code Explanation [3.2570216147409514]
Large Language Models (LLMs) have made remarkable strides in Natural Language Processing.
This study specifically delves into the task of generating natural-language summaries for code snippets, using various LLMs.
arXiv Detail & Related papers (2023-10-25T14:38:40Z)
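A minimal zero-shot prompt for the code-summarization task studied above; the paper's actual prompts and model list are not in this summary, so this template is an assumption.

```python
def explanation_prompt(code: str, language: str = "Python") -> str:
    """Assumed zero-shot prompt template for code explanation."""
    return (f"Summarize what the following {language} code does "
            f"in one or two sentences.\n\n{code}\n\nSummary:")

print(explanation_prompt("def fib(n):\n    return n if n < 2 else fib(n-1) + fib(n-2)"))
```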
- CodeT5+: Open Code Large Language Models for Code Understanding and Generation [72.1638273937025]
Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence.
CodeT5+ is a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks.
We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction-tuning.
arXiv Detail & Related papers (2023-05-13T14:23:07Z)
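CodeT5+ checkpoints are published on Hugging Face; a minimal generation sketch, assuming the public Salesforce/codet5p-220m checkpoint and the transformers library, infills a masked span via a T5 sentinel token:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

checkpoint = "Salesforce/codet5p-220m"  # assumed public checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint)

# Ask the model to fill in the masked span of a function.
inputs = tokenizer("def print_hello_world():<extra_id_0>", return_tensors="pt")
outputs = model.generate(inputs.input_ids, max_length=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```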
- xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval [32.60391966381949]
We introduce xCodeEval, the largest executable multilingual multitask benchmark to date.
It features a total of 7 tasks involving code understanding, generation, translation and retrieval.
xCodeEval adopts an execution-based evaluation and offers a multilingual code execution engine, ExecEval.
arXiv Detail & Related papers (2023-03-06T10:08:51Z)
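ExecEval itself is a sandboxed multi-language engine; the sketch below shows the execution-based idea at toy scale for Python only, comparing a candidate program's stdout against each test's expected output.

```python
import subprocess
import sys

def passes_io_tests(program: str, tests: list[tuple[str, str]],
                    timeout: float = 5.0) -> bool:
    """Run a Python program on each test's stdin; a candidate passes only
    if every run exits cleanly and prints the expected output."""
    for stdin, expected in tests:
        try:
            run = subprocess.run([sys.executable, "-c", program],
                                 input=stdin, capture_output=True,
                                 text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return False
        if run.returncode != 0 or run.stdout.strip() != expected.strip():
            return False
    return True

print(passes_io_tests("print(int(input()) * 2)", [("3", "6"), ("10", "20")]))
```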
- LEVER: Learning to Verify Language-to-Code Generation with Execution [64.36459105535]
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results.
Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results.
LEVER consistently improves over the base code LLMs (4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of them.
arXiv Detail & Related papers (2023-02-16T18:23:22Z)
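A sketch of the reranking the summary describes, assuming each candidate carries its LM probability, verifier probability, and execution result; scores are aggregated over programs that execute to the same result (the paper's exact normalization may differ).

```python
from collections import defaultdict

def lever_rerank(candidates: list[tuple[str, float, float, object]]) -> str:
    """Score each program by p_LM * p_verifier, sum the scores of programs
    sharing an execution result, and return the top program from the
    highest-scoring result group."""
    by_result = defaultdict(list)
    for prog, p_lm, p_verifier, result in candidates:
        by_result[result].append((p_lm * p_verifier, prog))
    best_group = max(by_result.values(), key=lambda g: sum(s for s, _ in g))
    return max(best_group)[1]

cands = [("a + b", 0.5, 0.9, 42), ("a - b", 0.3, 0.4, 0), ("b + a", 0.2, 0.8, 42)]
print(lever_rerank(cands))  # "a + b": the result-42 group wins on aggregate score
```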
- ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages [37.60016772021422]
Software engineers working with the same programming language (PL) may speak different natural languages (NLs) and vice versa.
Recent studies have demonstrated the effectiveness of generative pre-training in computer programs, yet they are always English-centric.
We release ERNIE-Code, a unified pre-trained language model for 116 NLs and 6 PLs.
arXiv Detail & Related papers (2022-12-13T17:21:44Z)
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach on the code completion task in the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
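A minimal blend of ReACC's two retrieval signals, with Jaccard overlap standing in for the lexical retriever and a precomputed dense similarity for the semantic one (both stand-ins are assumptions):

```python
def hybrid_score(query_tokens: list[str], doc_tokens: list[str],
                 dense_sim: float, alpha: float = 0.5) -> float:
    """Blend a lexical overlap score (a stand-in for BM25-style copying)
    with a dense semantic similarity; alpha weights the two signals."""
    q, d = set(query_tokens), set(doc_tokens)
    lexical = len(q & d) / len(q | d) if q | d else 0.0
    return alpha * lexical + (1 - alpha) * dense_sim

# Rank two candidate snippets for the same unfinished-code query.
query = ["read", "csv", "rows"]
print(hybrid_score(query, ["read", "csv", "file"], dense_sim=0.8))  # high
print(hybrid_score(query, ["parse", "json"], dense_sim=0.3))        # low
```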
This list is automatically generated from the titles and abstracts of the papers on this site.