IntelliCode Compose: Code Generation Using Transformer
- URL: http://arxiv.org/abs/2005.08025v2
- Date: Thu, 29 Oct 2020 18:40:12 GMT
- Title: IntelliCode Compose: Code Generation Using Transformer
- Authors: Alexey Svyatkovskiy, Shao Kun Deng, Shengyu Fu, Neel Sundaresan
- Abstract summary: We introduce IntelliCode Compose, a general-purpose multilingual code completion tool.
It is capable of predicting sequences of code tokens of arbitrary types, generating up to entire lines of syntactically correct code.
IntelliCode Compose is deployed as a cloud-based web service.
- Score: 7.623136583706195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In software development through integrated development environments (IDEs),
code completion is one of the most widely used features. Nevertheless, the
majority of integrated development environments only support completion of
method names, APIs, or arguments.
In this paper, we introduce IntelliCode Compose, a general-purpose
multilingual code completion tool capable of predicting sequences of
code tokens of arbitrary types, generating up to entire lines of syntactically
correct code. It leverages a state-of-the-art generative transformer model
trained on 1.2 billion lines of source code in the Python, C#, JavaScript, and
TypeScript programming languages. IntelliCode Compose is deployed as a
cloud-based web service. It makes use of client-side tree-based caching, an
efficient parallel implementation of the beam search decoder, and compute-graph
optimizations to meet edit-time completion suggestion requirements in the
Visual Studio Code IDE and Azure Notebooks.
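The beam search decoding the service parallelizes can be sketched generically. The toy decoder below is illustrative only and assumes a made-up `next_probs` interface; the paper's production implementation (parallelism, caching, graph optimizations) is far more involved.

```python
import math

def beam_search(next_probs, start, beam_width=3, steps=4):
    """Generic beam search. next_probs(seq) -> list of (token, prob) pairs."""
    beams = [([start], 0.0)]                     # (token sequence, log-prob)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok, p in next_probs(seq):
                candidates.append((seq + [tok], score + math.log(p)))
        # keep only the beam_width highest-scoring partial sequences
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams

# Toy next-token distribution: always proposes the same two continuations.
def toy_probs(seq):
    return [("a", 0.6), ("b", 0.4)]

best_seq, best_score = beam_search(toy_probs, "<s>")[0]
print(best_seq)  # ['<s>', 'a', 'a', 'a', 'a'] under the toy distribution
```

Keeping only the top `beam_width` hypotheses per step is what bounds the search cost while still exploring more than a single greedy path.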
Our best model yields an average edit similarity of $86.7\%$ and a perplexity
of 1.82 for the Python programming language.
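The reported edit similarity is plausibly a normalized Levenshtein-based ratio between the suggested completion and the code the developer actually wrote; the abstract does not spell out the exact formula, so the sketch below assumes that common definition.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def edit_similarity(pred: str, target: str) -> float:
    """1 minus normalized edit distance, in [0, 1]; 1.0 means identical."""
    if not pred and not target:
        return 1.0
    return 1.0 - levenshtein(pred, target) / max(len(pred), len(target))

print(edit_similarity("return x + 1", "return x + 2"))  # 1 - 1/12 ≈ 0.917
```

A single-character slip in a 12-character line still scores about 0.92, which is why edit similarity is a forgiving but informative metric for line-level completion.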
Related papers
- Full Line Code Completion: Bringing AI to Desktop [3.5296482958373447]
We describe our approach for building a multi-token code completion feature for the JetBrains' IntelliJ Platform.
The feature suggests only syntactically correct code and works fully locally, i.e., data querying and the generation of suggestions happens on the end user's machine.
arXiv Detail & Related papers (2024-05-14T15:42:55Z) - Context Composing for Full Line Code Completion [0.46040036610482665]
The paper describes our approach to context composing for the Transformer model that is a core of the feature's implementation.
We share our next steps to improve the feature and emphasize the importance of several research aspects in the area.
arXiv Detail & Related papers (2024-02-14T15:17:37Z) - InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback [50.725076393314964]
We introduce InterCode, a lightweight, flexible, and easy-to-use framework of interactive coding as a standard reinforcement learning environment.
Our framework is language- and platform-agnostic and uses self-contained Docker environments to provide safe and reproducible execution.
We demonstrate InterCode's viability as a testbed by evaluating multiple state-of-the-art LLMs configured with different prompting strategies.
arXiv Detail & Related papers (2023-06-26T17:59:50Z) - Coeditor: Leveraging Contextual Changes for Multi-round Code Auto-editing [57.776971051512234]
In this work, we explore a multi-round code auto-editing setting, aiming to predict edits to a code region based on recent changes within the same codebase.
Our model, Coeditor, is a fine-tuned language model specifically designed for code editing tasks.
In a simplified single-round, single-edit task, Coeditor significantly outperforms GPT-3.5 and SOTA open-source code completion models.
arXiv Detail & Related papers (2023-05-29T19:57:36Z) - CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X [50.008474888951525]
We introduce CodeGeeX, a multilingual model with 13 billion parameters for code generation.
CodeGeeX is pre-trained on 850 billion tokens of 23 programming languages.
arXiv Detail & Related papers (2023-03-30T17:34:01Z) - CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code [75.08995072899594]
We propose CodeBERTScore: an evaluation metric for code generation.
CodeBERTScore encodes the natural language input preceding the generated code.
We find that CodeBERTScore achieves a higher correlation with human preference and with functional correctness than all existing metrics.
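CodeBERTScore builds on the BERTScore idea of greedily matching each candidate token's embedding to its most similar reference token embedding via cosine similarity. The sketch below illustrates that matching step only; the real metric obtains embeddings from a pretrained code model, whereas the toy 2-d vectors here are placeholders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def greedy_f1(cand, ref):
    """BERTScore-style F1 over token embeddings (greedy best-match pooling)."""
    precision = sum(max(cosine(c, r) for r in ref) for c in cand) / len(cand)
    recall = sum(max(cosine(c, r) for c in cand) for r in ref) / len(ref)
    return 2 * precision * recall / (precision + recall)

# Toy token embeddings standing in for a code model's contextual vectors.
ref = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(greedy_f1(ref, ref))  # identical sequences score 1.0
```

Because matching is soft (cosine over embeddings rather than exact token overlap), semantically equivalent but lexically different code can still score highly, which is the property the paper credits for better correlation with human preference.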
arXiv Detail & Related papers (2023-02-10T22:12:05Z) - Tackling Long Code Search with Splitting, Encoding, and Aggregating [67.02322603435628]
We propose a new baseline SEA (Split, Encode and Aggregate) for long code search.
It splits long code into code blocks, encodes these blocks into embeddings, and aggregates them to obtain a comprehensive long code representation.
With GraphCodeBERT as the encoder, SEA achieves an overall mean reciprocal ranking score of 0.785, which is 10.1% higher than GraphCodeBERT on the CodeSearchNet benchmark.
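The split/encode/aggregate pipeline can be sketched end to end. SEA itself uses GraphCodeBERT as the encoder with a learned aggregation; the hash-based toy encoder and plain mean pooling below are stand-ins for illustration only.

```python
def split_into_blocks(code: str, block_size: int = 64) -> list[str]:
    """Split long code into blocks of at most block_size whitespace tokens."""
    tokens = code.split()
    return [" ".join(tokens[i:i + block_size])
            for i in range(0, len(tokens), block_size)]

def toy_encode(block: str, dim: int = 16) -> list[float]:
    """Stand-in encoder: hashed bag-of-tokens vector (not GraphCodeBERT)."""
    vec = [0.0] * dim
    for tok in block.split():
        vec[hash(tok) % dim] += 1.0
    return vec

def aggregate(vectors: list[list[float]]) -> list[float]:
    """Mean-pool block embeddings into one long-code representation."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

code = "def f(x):\n    return x + 1\n" * 100   # pretend this is a long file
blocks = split_into_blocks(code)
embedding = aggregate([toy_encode(b) for b in blocks])
print(len(blocks), len(embedding))
```

The point of the decomposition is that each block stays within the encoder's input limit, so arbitrarily long files can still be reduced to a single fixed-size search embedding.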
arXiv Detail & Related papers (2022-08-24T02:27:30Z) - ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z) - Towards Full-line Code Completion with Neural Language Models [25.458883198815393]
We discuss the possibility of directly completing a whole line of code instead of a single token.
Recent neural language models have been adopted as a preferred approach for code completion.
arXiv Detail & Related papers (2020-09-18T03:12:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.