Full Line Code Completion: Bringing AI to Desktop
- URL: http://arxiv.org/abs/2405.08704v2
- Date: Mon, 07 Oct 2024 17:23:25 GMT
- Title: Full Line Code Completion: Bringing AI to Desktop
- Authors: Anton Semenkin, Vitaliy Bibaev, Yaroslav Sokolov, Kirill Krylov, Alexey Kalina, Anna Khannanova, Danila Savenkov, Darya Rovdo, Igor Davidenko, Kirill Karnaukhov, Maxim Vakhrushev, Mikhail Kostyukov, Mikhail Podvitskii, Petr Surkov, Yaroslav Golubev, Nikita Povarov, Timofey Bryksin
- Abstract summary: We describe our approach for building a multi-token code completion feature for JetBrains' IntelliJ Platform.
The feature suggests only syntactically correct code and works fully locally, i.e., data querying and the generation of suggestions happen on the end user's machine.
- Abstract: In recent years, several industrial solutions to the problem of multi-token code completion have appeared, each making a great advance in the area but mostly focusing on a cloud-based runtime and avoiding work on the end user's device. In this work, we describe our approach to building a multi-token code completion feature for JetBrains' IntelliJ Platform, which we call Full Line Code Completion. The feature suggests only syntactically correct code and works fully locally, i.e., data querying and the generation of suggestions happen on the end user's machine. We share the important time and memory consumption restrictions, as well as the design principles that a code completion engine should satisfy. Working entirely on the end user's device, our code completion engine enriches the user experience while being not only fast and compact but also secure. We share a number of useful techniques for meeting the stated development constraints and also describe the offline and online evaluation pipelines that allowed us to make better decisions. Our online evaluation shows that using the tool leads to 1.3 times more Python code in the IDE being produced by code completion. The described solution was initially started with the help of researchers and was then bundled into all JetBrains IDEs, where it is now used by millions of users. Thus, we believe that this work is useful for bridging academia and industry, providing researchers with knowledge of what happens when complex research-based solutions are integrated into real products.
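The abstract does not spell out how syntactic correctness of suggestions is enforced. Purely as an illustrative sketch, and not JetBrains' actual implementation, one way to post-filter locally generated multi-token candidates is to parse the would-be line with Python's standard ast module; the prefix, candidate strings, and helper names below are made up for the example.

```python
import ast

def is_syntactically_valid(prefix: str, completion: str) -> bool:
    """Return True if appending the candidate completion to the current
    line prefix yields parseable Python code."""
    try:
        ast.parse(prefix + completion)
        return True
    except SyntaxError:
        return False

def filter_candidates(prefix: str, candidates: list[str]) -> list[str]:
    """Keep only candidates that complete the line into valid syntax,
    preserving the model's original ranking order."""
    return [c for c in candidates if is_syntactically_valid(prefix, c)]

# Hypothetical beam-search outputs for an unfinished call expression.
prefix = 'result = config.get('
candidates = ['"timeout")', '"timeout"', '"timeout"))']
print(filter_candidates(prefix, candidates))  # keeps only the balanced candidate
```

In a real IDE setting this check would more plausibly be done with the platform's own incremental parser rather than a full re-parse of the file, but the filtering idea is the same.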
Related papers
- Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z)
- Long Code Arena: a Set of Benchmarks for Long-Context Code Models [75.70507534322336]
Long Code Arena is a suite of six benchmarks for code processing tasks that require project-wide context.
These tasks cover different aspects of code processing: library-based code generation, CI builds repair, project-level code completion, commit message generation, bug localization, and module summarization.
For each task, we provide a manually verified dataset for testing, an evaluation suite, and open-source baseline solutions.
arXiv Detail & Related papers (2024-06-17T14:58:29Z)
- JetTrain: IDE-Native Machine Learning Experiments [4.23507375452691]
JetTrain is an integrated development environment (IDE) tool for launching machine learning (ML) experiments.
A user can write and debug code locally and then seamlessly run it remotely using on-demand hardware.
We argue that this approach can lower the entry barrier for ML training problems and increase experiment throughput.
arXiv Detail & Related papers (2024-02-16T17:53:08Z)
- Context Composing for Full Line Code Completion [0.46040036610482665]
The paper describes our approach to context composing for the Transformer model that is at the core of the feature's implementation.
We share our next steps to improve the feature and emphasize the importance of several research aspects in the area.
arXiv Detail & Related papers (2024-02-14T15:17:37Z)
- InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback [50.725076393314964]
We introduce InterCode, a lightweight, flexible, and easy-to-use framework of interactive coding as a standard reinforcement learning environment.
Our framework is language- and platform-agnostic and uses self-contained Docker environments to provide safe and reproducible execution.
We demonstrate InterCode's viability as a testbed by evaluating multiple state-of-the-art LLMs configured with different prompting strategies.
arXiv Detail & Related papers (2023-06-26T17:59:50Z)
- LongCoder: A Long-Range Pre-trained Language Model for Code Completion [56.813974784131624]
LongCoder employs a sliding window mechanism for self-attention and introduces two types of globally accessible tokens.
Bridge tokens are inserted throughout the input sequence to aggregate local information and facilitate global interaction.
Memory tokens are included to highlight important statements that may be invoked later and need to be memorized (a rough sketch of this sparse attention pattern appears after this list).
arXiv Detail & Related papers (2023-06-26T17:59:24Z)
- All You Need Is Logs: Improving Code Completion by Learning from Anonymous IDE Usage Logs [55.606644084003094]
We propose an approach for collecting completion usage logs from users in an IDE.
We use them to train a machine learning-based model for ranking completion candidates.
Our evaluation shows that a simple ranking model trained on past user behavior logs significantly improves the code completion experience.
arXiv Detail & Related papers (2022-05-21T23:21:26Z)
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach on the code completion task in the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
- Towards Full-line Code Completion with Neural Language Models [25.458883198815393]
We discuss the possibility of directly completing a whole line of code instead of a single token.
Recent neural language models have been adopted as a preferred approach for code completion.
arXiv Detail & Related papers (2020-09-18T03:12:13Z)
- IntelliCode Compose: Code Generation Using Transformer [7.623136583706195]
We introduce IntelliCode Compose, a general-purpose multilingual code completion tool.
It is capable of predicting sequences of code tokens of arbitrary types, generating up to entire lines of syntactically correct code.
IntelliCode Compose is deployed as a cloud-based web service.
arXiv Detail & Related papers (2020-05-16T15:47:53Z)
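The LongCoder entry above describes sliding-window self-attention combined with globally accessible bridge and memory tokens. As a rough illustration of that attention pattern only (not the paper's code), a boolean mask could be assembled as below; the window size and global positions are invented parameters.

```python
import numpy as np

def sparse_attention_mask(seq_len: int, window: int, global_positions: set[int]) -> np.ndarray:
    """Boolean attention mask combining a local sliding window with a few
    globally visible positions (standing in for bridge/memory tokens).
    Illustrative only; not LongCoder's actual implementation."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True          # each token attends to its local window
    for g in global_positions:
        mask[:, g] = True              # every token can attend to a global token
        mask[g, :] = True              # global tokens attend to the whole sequence
    return mask

# Made-up example: 16 tokens, a window of 2, positions 0 and 8 treated as global.
mask = sparse_attention_mask(16, 2, {0, 8})
print(mask.shape, int(mask.sum()))
```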
This list is automatically generated from the titles and abstracts of the papers on this site.