DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
- URL: http://arxiv.org/abs/2406.11931v1
- Date: Mon, 17 Jun 2024 13:51:35 GMT
- Title: DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
- Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen, Xin Xie, Kang Guan, Yuxiang You, Aixin Liu, Qiushi Du, Wenjun Gao, Xuan Lu, Qinyu Chen, Yaohui Wang, Chengqi Deng, Jiashi Li, Chenggang Zhao, Chong Ruan, Fuli Luo, Wenfeng Liang,
- Abstract summary: DeepSeek-Coder-V2 is an open-source code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks.
DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens.
In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro.
- Score: 43.589403386634615
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K. In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks.
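Because the model is released openly, a minimal usage sketch with the Hugging Face transformers library is shown below. The model identifier and generation settings are assumptions chosen for illustration (they are not specified in the abstract); consult the official model card for the exact usage.

```python
# Minimal sketch (not the authors' reference code): loading an openly released
# DeepSeek-Coder-V2 checkpoint with Hugging Face transformers and generating a
# completion. The model ID and settings below are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"  # assumed identifier

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # reduced precision so the MoE weights fit in GPU memory
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Write a Python function that checks whether a string is a palindrome."
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```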
Related papers
- Qwen2.5-Coder Technical Report [100.73437491188763]
We introduce the Qwen2.5-Coder series, a significant upgrade from its predecessor, CodeQwen1.5.
As a code-specific model, Qwen2.5-Coder is built upon the Qwen2.5 architecture and is further pretrained on a vast corpus of over 5.5 trillion tokens.
arXiv Detail & Related papers (2024-09-18T17:57:57Z)
- DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search [16.477438279316576]
We introduce DeepSeek-Prover-V1.5, an open-source language model designed for theorem proving in Lean 4.
The model undergoes supervised fine-tuning using an enhanced formal theorem proving dataset derived from DeepSeek-Prover-V1.
We propose RMaxTS, a variant of Monte-Carlo tree search that employs an intrinsic-reward-driven exploration strategy to generate diverse proof paths.
arXiv Detail & Related papers (2024-08-15T13:40:03Z)
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model [118.06260386652778]
We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference.
DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE.
Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance while saving 42.5% of training costs.
arXiv Detail & Related papers (2024-05-07T15:56:43Z)
- StarCoder 2 and The Stack v2: The Next Generation [105.93298676368798]
We train StarCoder2 models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens.
We thoroughly evaluate them on a comprehensive set of Code LLM benchmarks.
Our large model, StarCoder2-15B, significantly outperforms other models of comparable size.
arXiv Detail & Related papers (2024-02-29T13:53:35Z)
- SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z)
- DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence [42.517055368627226]
We introduce the DeepSeek-Coder series, a range of open-source code models with sizes from 1.3B to 33B, trained from scratch on 2 trillion tokens.
Our evaluations demonstrate that DeepSeek-Coder achieves state-of-the-art performance among open-source code models across multiple benchmarks.
DeepSeek-Coder models are under a permissive license that allows for both research and unrestricted commercial use.
arXiv Detail & Related papers (2024-01-25T14:17:53Z)
- Adding Context to Source Code Representations for Deep Learning [13.676416860721877]
We argue that it is beneficial for deep learning models to have access to additional contextual information about the code being analysed.
We present preliminary evidence that encoding context from the call hierarchy along with information from the code itself can improve the performance of a state-of-the-art deep learning model.
arXiv Detail & Related papers (2022-07-30T12:47:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.