Qwen2.5-Coder Technical Report
- URL: http://arxiv.org/abs/2409.12186v3
- Date: Tue, 12 Nov 2024 13:24:25 GMT
- Title: Qwen2.5-Coder Technical Report
- Authors: Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Keming Lu, Kai Dang, Yang Fan, Yichang Zhang, An Yang, Rui Men, Fei Huang, Bo Zheng, Yibo Miao, Shanghaoran Quan, Yunlong Feng, Xingzhang Ren, Xuancheng Ren, Jingren Zhou, Junyang Lin
- Abstract summary: We introduce the Qwen2.5-Coder series, a significant upgrade from its predecessor, CodeQwen1.5.
As a code-specific model, Qwen2.5-Coder is built upon the Qwen2.5 architecture and is further pretrained on a vast corpus of over 5.5 trillion tokens.
- Score: 105.131580912726
- Abstract: In this report, we introduce the Qwen2.5-Coder series, a significant upgrade from its predecessor, CodeQwen1.5. This series includes six models: Qwen2.5-Coder-(0.5B/1.5B/3B/7B/14B/32B). As a code-specific model, Qwen2.5-Coder is built upon the Qwen2.5 architecture and is further pretrained on a vast corpus of over 5.5 trillion tokens. Through meticulous data cleaning, scalable synthetic data generation, and balanced data mixing, Qwen2.5-Coder demonstrates impressive code generation capabilities while retaining general and math skills. These models have been evaluated on a wide range of code-related tasks, achieving state-of-the-art (SOTA) performance across more than 10 benchmarks, including code generation, completion, reasoning, and repair, and consistently outperforming larger models. We believe that the release of the Qwen2.5-Coder series will advance research in code intelligence and, with its permissive licensing, support wider adoption by developers in real-world applications.
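The abstract positions Qwen2.5-Coder as a family of code-specific models for generation, completion, reasoning, and repair. Below is a minimal sketch of querying an instruct checkpoint through the Hugging Face transformers library; the checkpoint name, chat-template usage, and generation settings are assumptions made for illustration and are not specified in the report.

```python
# Minimal sketch: code generation with a Qwen2.5-Coder instruct checkpoint.
# The model ID and sampling settings are illustrative assumptions, not values
# taken from the report.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}
]
# Build the chat prompt with the tokenizer's chat template.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```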
Related papers
- ACECODER: Acing Coder RL via Automated Test-Case Synthesis [36.740393665032954]
We design a pipeline that generates extensive (question, test-cases) pairs from existing code data.
We construct preference pairs based on pass rates over sampled programs to train reward models with Bradley-Terry loss.
We show that our RL training can improve the model on HumanEval-plus by over 25% and MBPP-plus by 6% with merely 80 optimization steps.
arXiv Detail & Related papers (2025-02-03T18:46:04Z)
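ACECODER trains reward models with a Bradley-Terry loss over preference pairs ranked by test-case pass rates. A minimal sketch of that loss is given below, assuming a reward model that assigns scalar scores to (prompt, program) pairs; the dummy scores stand in for real reward-model outputs and the ACECODER test-case synthesis pipeline is not reproduced here.

```python
# Minimal sketch of a Bradley-Terry preference loss for a reward model.
# The scores below are placeholders; the real pipeline (test-case synthesis,
# pass-rate ranking) from the ACECODER paper is not reproduced here.
import torch
import torch.nn.functional as F

def bradley_terry_loss(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood that the chosen program outranks the rejected one.

    chosen_rewards / rejected_rewards: shape (batch,) scalar scores, where the
    chosen program had the higher test-case pass rate.
    """
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Example with dummy scores standing in for reward-model outputs.
chosen = torch.tensor([1.2, 0.7, 0.3])
rejected = torch.tensor([0.4, 0.9, -0.1])
print(float(bradley_terry_loss(chosen, rejected)))
```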
- Qwen2.5-1M Technical Report [72.09755998661568]
We introduce Qwen2.5-1M, a series of models that extend the context length to 1 million tokens.
By leveraging our inference framework, the Qwen2.5-1M models achieve a remarkable 3x to 7x prefill speedup.
arXiv Detail & Related papers (2025-01-26T03:47:25Z)
- Qwen2.5 Technical Report [122.13958993185952]
We introduce Qwen2.5, a comprehensive series of large language models (LLMs) designed to meet diverse needs.
Compared to previous iterations, Qwen2.5 has been significantly improved during both the pre-training and post-training stages.
Open-weight offerings include base and instruction-tuned models, with quantized versions available.
For hosted solutions, the proprietary models currently include two mixture-of-experts (MoE) variants: Qwen2.5-Turbo and Qwen2.5-Plus.
arXiv Detail & Related papers (2024-12-19T17:56:09Z)
- Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement [71.46993852662021]
We present a series of math-specific large language models: Qwen2.5-Math and Qwen2.5-Math-Instruct-1.5B/7B/72B.
Qwen2.5-Math-Instruct supports both Chinese and English, and possesses advanced mathematical reasoning capabilities.
arXiv Detail & Related papers (2024-09-18T16:45:37Z)
- VersiCode: Towards Version-controllable Code Generation [58.82709231906735]
Large Language Models (LLMs) have made tremendous strides in code generation, but existing research fails to account for the dynamic nature of software development.
We propose two novel tasks aimed at bridging this gap: version-specific code completion (VSCC) and version-aware code migration (VACM).
We conduct an extensive evaluation on VersiCode, which reveals that version-controllable code generation is indeed a significant challenge.
arXiv Detail & Related papers (2024-06-11T16:15:06Z)
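VersiCode's version-specific code completion task asks a model to produce code that is correct for a particular library version. The toy example below illustrates why version pinning matters, using the removal of pandas.DataFrame.append in pandas 2.0; the example is a hypothetical illustration, not an item from the VersiCode benchmark.

```python
# Toy illustration (not a VersiCode benchmark item) of version-specific
# code completion: the correct completion depends on the pinned library version.
# pandas.DataFrame.append was removed in pandas 2.0, so a 2.x completion
# must use pd.concat instead.
example = {
    "prompt": "Given two DataFrames df1 and df2, return their row-wise concatenation.",
    "target_by_version": {
        "pandas==1.3": "result = df1.append(df2, ignore_index=True)",
        "pandas==2.1": "result = pd.concat([df1, df2], ignore_index=True)",
    },
}

for version, completion in example["target_by_version"].items():
    print(f"{version}: {completion}")
```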
- DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence [42.517055368627226]
We introduce the DeepSeek-Coder series, a range of open-source code models with sizes from 1.3B to 33B, trained from scratch on 2 trillion tokens.
Our evaluations demonstrate that DeepSeek-Coder achieves state-of-the-art performance among open-source code models across multiple benchmarks.
DeepSeek-Coder models are under a permissive license that allows for both research and unrestricted commercial use.
arXiv Detail & Related papers (2024-01-25T14:17:53Z)
- CCT5: A Code-Change-Oriented Pre-Trained Model [14.225942520238936]
We propose to pre-train a model specially designed for code changes to better support developers in software maintenance.
We first collect a large-scale dataset containing 1.5M+ pairs of code changes and commit messages.
We fine-tune the pre-trained model, CCT5, on three widely-studied tasks incurred by code changes and two tasks specific to the code review process.
arXiv Detail & Related papers (2023-05-18T07:55:37Z)
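CCT5 pre-trains on pairs of code changes and commit messages. A minimal sketch of turning a single commit into such a training pair is shown below; the unified-diff representation and field names are assumptions made for illustration, not the preprocessing pipeline described in the paper.

```python
# Minimal sketch: build a (code change, commit message) training pair.
# The unified-diff representation and field names are illustrative assumptions,
# not the preprocessing used in the CCT5 paper.
import difflib

def make_change_pair(old_code: str, new_code: str, commit_message: str) -> dict:
    diff = "\n".join(
        difflib.unified_diff(
            old_code.splitlines(), new_code.splitlines(),
            fromfile="before.py", tofile="after.py", lineterm="",
        )
    )
    return {"code_change": diff, "commit_message": commit_message}

old = "def add(a, b):\n    return a - b\n"
new = "def add(a, b):\n    return a + b\n"
pair = make_change_pair(old, new, "Fix subtraction bug in add()")
print(pair["code_change"])
print(pair["commit_message"])
```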
- CodeT5+: Open Code Large Language Models for Code Understanding and Generation [72.1638273937025]
Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence.
CodeT5+ is a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks.
We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction-tuning.
arXiv Detail & Related papers (2023-05-13T14:23:07Z)
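CodeT5+ is released as a family of encoder-decoder checkpoints for code tasks. A minimal sketch of running one of the smaller public checkpoints through transformers is shown below; the checkpoint name and generation settings are assumptions, and larger CodeT5+ variants may require a different loading path.

```python
# Minimal sketch: code completion with a small CodeT5+ checkpoint.
# The checkpoint name and settings are illustrative assumptions; larger
# CodeT5+ variants may need different loading code.
from transformers import AutoTokenizer, T5ForConditionalGeneration

checkpoint = "Salesforce/codet5p-220m"  # assumed public checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint)

inputs = tokenizer("def print_hello_world():", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```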
- Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study [4.438873396405334]
We aim to answer whether making code easier to understand by adding contextual data improves the performance of pre-trained code language models on the task of code completion.
For comments, we find that the models perform better in the presence of multi-line comments.
arXiv Detail & Related papers (2023-04-24T17:09:14Z)
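That study reports that completion models perform better when multi-line comments are present. The toy sketch below illustrates prepending a multi-line comment to a completion prompt; the snippet and comment text are hypothetical examples, not items from the study's dataset.

```python
# Toy illustration: enriching a code-completion prompt with a multi-line
# comment, the kind of contextual data the study reports as helpful.
# The snippet and comment text are hypothetical examples.
bare_prompt = "def moving_average(values, window):\n"

commented_prompt = (
    '"""\n'
    "Compute the simple moving average of `values` using a sliding window\n"
    "of size `window`, returning one averaged value per window position.\n"
    '"""\n'
    + bare_prompt
)

# A completion model would receive `commented_prompt` instead of `bare_prompt`;
# the extra context constrains what the function body should do.
print(commented_prompt)
```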