Towards Full-line Code Completion with Neural Language Models
- URL: http://arxiv.org/abs/2009.08603v1
- Date: Fri, 18 Sep 2020 03:12:13 GMT
- Title: Towards Full-line Code Completion with Neural Language Models
- Authors: Wenhan Wang, Sijie Shen, Ge Li, Zhi Jin
- Abstract summary: We discuss the possibility of directly completing a whole line of code instead of a single token.
Recent neural language models have been adopted as a preferred approach for code completion.
- Score: 25.458883198815393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A code completion system suggests future code elements to developers given a
partially-complete code snippet. Code completion is one of the most useful
features in Integrated Development Environments (IDEs). Currently, most code
completion techniques predict a single token at a time. In this paper, we take
a further step and discuss the possibility of directly completing a whole line
of code instead of a single token. We believe suggesting longer code sequences
can further improve developers' efficiency. Recently, neural language models
have been adopted as a preferred approach for code completion, and we believe
these models can still be applied to full-line code completion with a few
improvements. We conduct our experiments on two real-world Python corpora and
evaluate existing neural models based on source code tokens or syntactical
actions. The results show that neural language models can achieve acceptable
results on our tasks, with significant room for improvement.
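As a concrete illustration of the task, the sketch below decodes tokens from a causal language model until a newline appears, turning single-token prediction into full-line completion. It uses Hugging Face's GPT-2 purely as a stand-in; the paper's own models are trained on Python tokens or syntactic actions.

```python
# Minimal sketch of full-line completion: decode from a causal LM until a
# newline token is produced. GPT-2 is a stand-in for the paper's models.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def complete_line(prefix: str, max_new_tokens: int = 32) -> str:
    """Greedily extend `prefix` until the model emits a newline."""
    inputs = tokenizer(prefix, return_tensors="pt")
    newline_id = tokenizer.encode("\n")[0]
    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,              # greedy decoding; beam search also works
        eos_token_id=newline_id,      # stop at end of line
        pad_token_id=tokenizer.eos_token_id,
    )
    completion = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:])
    return completion.rstrip("\n")

print(complete_line("import numpy as np\nx = np."))
```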
Related papers
- Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach [66.51005288743153]
We investigate the legal and ethical issues of current neural code completion models.
We tailor a membership inference approach (termed CodeMI) that was originally crafted for classification tasks.
We evaluate the effectiveness of this adapted approach across a diverse array of neural code completion models.
arXiv Detail & Related papers (2024-04-22T15:54:53Z)
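The summary above does not spell out CodeMI's internals; the sketch below shows the classic loss-threshold baseline that classification-style membership inference builds on: samples the model fits unusually well are flagged as likely training members. The threshold value is a placeholder, not a calibrated attack.

```python
# Generic loss-threshold membership-inference baseline (not CodeMI itself):
# code the model fits unusually well is flagged as a likely training member.
import torch

def is_likely_member(model, tokenizer, code: str, threshold: float = 2.0) -> bool:
    """Flag `code` as a probable training member if its per-token loss is low.

    `threshold` is a placeholder; in practice it is calibrated on known
    member/non-member samples (e.g., via shadow models).
    """
    ids = tokenizer(code, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss.item()  # mean cross-entropy per token
    return loss < threshold
```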
- LongCoder: A Long-Range Pre-trained Language Model for Code Completion [56.813974784131624]
LongCoder employs a sliding window mechanism for self-attention and introduces two types of globally accessible tokens.
Bridge tokens are inserted throughout the input sequence to aggregate local information and facilitate global interaction.
Memory tokens are included to highlight important statements that may be invoked later and need to be memorized.
arXiv Detail & Related papers (2023-06-26T17:59:24Z)
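A minimal sketch of the attention pattern this describes: each position attends within a local window, while a few designated global positions (standing in for bridge and memory tokens) can attend to, and be attended by, everything. The window size and global positions below are illustrative, not LongCoder's configuration.

```python
# Sliding-window attention mask with globally accessible tokens.
# (A causal model would additionally apply a lower-triangular constraint.)
import numpy as np

def sliding_window_mask(seq_len: int, window: int,
                        global_positions: list[int]) -> np.ndarray:
    """Return a boolean mask where mask[i, j] == True lets i attend to j."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True          # local window
    for g in global_positions:
        mask[:, g] = True              # every position can see a global token
        mask[g, :] = True              # and the global token sees everything
    return mask

mask = sliding_window_mask(seq_len=16, window=2, global_positions=[0, 8])
```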
- Code Execution with Pre-trained Language Models [88.04688617516827]
Most pre-trained models for code intelligence ignore the execution trace and only rely on source code and syntactic structures.
We develop a mutation-based data augmentation technique to create a large-scale and realistic Python dataset and task for code execution.
We then present CodeExecutor, a Transformer model that leverages code execution pre-training and curriculum learning to enhance its semantic comprehension.
arXiv Detail & Related papers (2023-05-08T10:00:05Z)
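As a toy illustration of mutation-based augmentation, the sketch below rewrites a Python snippet's AST by flipping comparison operators. The specific mutation operators are assumptions; a real pipeline would apply a much richer operator set.

```python
# Toy mutation-based augmentation: flip comparison operators in a Python
# snippet via the ast module.
import ast

class FlipComparisons(ast.NodeTransformer):
    """Swap < with >= and == with != (illustrative mutation operators)."""
    SWAP = {ast.Lt: ast.GtE, ast.Eq: ast.NotEq}

    def visit_Compare(self, node: ast.Compare) -> ast.Compare:
        self.generic_visit(node)
        node.ops = [self.SWAP.get(type(op), type(op))() for op in node.ops]
        return node

source = "if x < 10:\n    y = x == 0\n"
tree = FlipComparisons().visit(ast.parse(source))
print(ast.unparse(tree))  # prints the mutated program
```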
- Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study [4.438873396405334]
We investigate whether making code easier to understand by adding contextual data improves the performance of pre-trained code language models on the code completion task.
For comments, we find that the models perform better in the presence of multi-line comments.
arXiv Detail & Related papers (2023-04-24T17:09:14Z)
- CodeExp: Explanatory Code Document Generation [94.43677536210465]
Existing code-to-text generation models produce only high-level summaries of code.
We conduct a human study to identify the criteria for high-quality explanatory docstrings for code.
We present a multi-stage fine-tuning strategy and baseline models for the task.
arXiv Detail & Related papers (2022-11-25T18:05:44Z)
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework that leverages both lexical copying and retrieval of semantically similar code.
We evaluate our approach on the code completion task for the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
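A minimal retrieve-then-complete sketch in the spirit of this framework: pick the corpus snippet with the highest lexical overlap and prepend it to the prompt. The Jaccard scorer and prompt format here are assumptions; ReACC itself combines lexical and dense retrievers.

```python
# Lexical retrieve-then-complete: find the most similar corpus snippet and
# prepend it to the unfinished code before asking the completion model.
def tokens(code: str) -> set[str]:
    return set(code.replace("(", " ").replace(")", " ").split())

def retrieve(query: str, corpus: list[str]) -> str:
    def jaccard(a: set[str], b: set[str]) -> float:
        return len(a & b) / max(len(a | b), 1)
    return max(corpus, key=lambda snippet: jaccard(tokens(query), tokens(snippet)))

def build_prompt(unfinished: str, corpus: list[str]) -> str:
    similar = retrieve(unfinished, corpus)
    return f"# similar code:\n{similar}\n# complete:\n{unfinished}"
```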
- CodeRetriever: Unimodal and Bimodal Contrastive Learning [128.06072658302165]
We propose the CodeRetriever model, which combines unimodal and bimodal contrastive learning to train function-level code semantic representations.
For unimodal contrastive learning, we design a semantic-guided method to build positive code pairs based on the documentation and function name.
For bimodal contrastive learning, we leverage the documentation and in-line comments of code to build text-code pairs.
arXiv Detail & Related papers (2022-01-26T10:54:30Z)
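The bimodal half of this recipe is commonly implemented as in-batch InfoNCE over paired text and code embeddings; the sketch below shows that standard formulation, not CodeRetriever's exact training setup.

```python
# In-batch InfoNCE over paired text/code embeddings: matched pairs sit on the
# diagonal of the similarity matrix (positives); everything else is a negative.
import torch
import torch.nn.functional as F

def info_nce(text_emb: torch.Tensor, code_emb: torch.Tensor,
             tau: float = 0.05) -> torch.Tensor:
    """text_emb, code_emb: (batch, dim) embeddings of matched text-code pairs."""
    text_emb = F.normalize(text_emb, dim=-1)
    code_emb = F.normalize(code_emb, dim=-1)
    logits = text_emb @ code_emb.T / tau       # (batch, batch) cosine sims
    targets = torch.arange(len(logits))        # positives on the diagonal
    return F.cross_entropy(logits, targets)
```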
- Toward Less Hidden Cost of Code Completion with Acceptance and Ranking Models [12.736207952790618]
We develop an ensemble framework that combines results from multiple models, drawing on the merits of each while offsetting its defects.
We conduct a coding simulation to collect data from code contexts and different code completion models.
We also propose a new code completion evaluation metric, the Benefit-Cost Ratio (BCR), which accounts for the benefit of saved keystrokes and the hidden cost of browsing the completion list.
arXiv Detail & Related papers (2021-06-26T03:02:49Z)
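The summary does not give BCR's exact formula; one hedged reading is keystrokes saved by the accepted suggestion divided by a browsing-cost term, as sketched below. The cost model is an assumption and may differ from the paper's.

```python
# Hedged reading of a Benefit-Cost Ratio for completion: keystrokes the
# accepted suggestion saves, divided by the cost of browsing the list.
# The paper's exact formula and cost model may differ.
def benefit_cost_ratio(keystrokes_saved: int, items_browsed: int,
                       cost_per_item: float = 1.0) -> float:
    browsing_cost = max(items_browsed * cost_per_item, 1e-9)  # avoid div by zero
    return keystrokes_saved / browsing_cost

# e.g., a 12-character completion accepted after scanning 3 list entries
print(benefit_cost_ratio(keystrokes_saved=12, items_browsed=3))  # 4.0
```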
- Sequence Model Design for Code Completion in the Modern IDE [3.4824234779710452]
We propose a novel design for predicting the top-k next tokens that combines static analysis's ability to enumerate all valid keywords and in-scope identifiers with a language model's ability to place a probability distribution over them.
Our model mixes a character-level input representation with token-level output to represent out-of-vocabulary (OOV) tokens meaningfully and to minimize prediction latency.
arXiv Detail & Related papers (2020-04-10T22:40:49Z)
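A minimal sketch of that combination: mask the language model's next-token logits down to the candidates static analysis allows, renormalize, and return the top-k. The `valid_ids` argument is a stub standing in for a real keyword and scope analysis.

```python
# Combine static analysis with a language model: zero out the probability of
# tokens the analyzer rules out, renormalize, and take the top-k candidates.
import torch

def constrained_top_k(logits: torch.Tensor, valid_ids: list[int], k: int = 5):
    """logits: (vocab,) next-token scores from the LM."""
    mask = torch.full_like(logits, float("-inf"))
    mask[valid_ids] = 0.0                      # keep only analyzable candidates
    probs = torch.softmax(logits + mask, dim=-1)
    top = torch.topk(probs, k=min(k, len(valid_ids)))
    return list(zip(top.indices.tolist(), top.values.tolist()))
```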