A Transformer-Based Approach for Smart Invocation of Automatic Code Completion
- URL: http://arxiv.org/abs/2405.14753v1
- Date: Thu, 23 May 2024 16:19:32 GMT
- Title: A Transformer-Based Approach for Smart Invocation of Automatic Code Completion
- Authors: Aral de Moor, Arie van Deursen, Maliheh Izadi
- Abstract summary: We develop a machine learning model that can predict when to invoke a code completion tool.
We collect a dataset of 200k developer interactions with our cross-IDE code completion plugin.
Our results indicate that our small-scale transformer model significantly outperforms the baseline.
- Score: 14.34818742116731
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based language models are highly effective for code completion, with much research dedicated to enhancing the content of these completions. Despite their effectiveness, these models come with high operational costs and can be intrusive, especially when they suggest too often and interrupt developers who are concentrating on their work. Current research largely overlooks how these models interact with developers in practice and neglects to address when a developer should receive completion suggestions. To tackle this issue, we developed a machine learning model that can accurately predict when to invoke a code completion tool given the code context and available telemetry data. To do so, we collect a dataset of 200k developer interactions with our cross-IDE code completion plugin and train several invocation filtering models. Our results indicate that our small-scale transformer model significantly outperforms the baseline while maintaining low enough latency. We further explore the search space for integrating additional telemetry data into a pre-trained transformer directly and obtain promising results. To further demonstrate our approach's practical potential, we deployed the model in an online environment with 34 developers and provided real-world insights based on 74k actual invocations.
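To make the abstract's core idea concrete, below is a minimal sketch of an invocation filter: a small pretrained code encoder combined with a classification head over telemetry features, deciding whether to trigger a completion at the current cursor position. The encoder name, the telemetry features, and the decision threshold are illustrative assumptions, not the configuration used in the paper.
```python
# Minimal sketch of an invocation filter: a small code encoder plus a head over
# telemetry features. Encoder choice, feature set, and threshold are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

ENCODER = "microsoft/codebert-base"  # assumption: any small code encoder could be used

class InvocationFilter(nn.Module):
    def __init__(self, encoder_name: str = ENCODER, num_telemetry_features: int = 4):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # Combine the pooled code representation with the telemetry feature vector.
        self.head = nn.Sequential(
            nn.Linear(hidden + num_telemetry_features, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, input_ids, attention_mask, telemetry):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]  # [CLS]-style summary of the code context
        logits = self.head(torch.cat([pooled, telemetry], dim=-1))
        return logits.squeeze(-1)  # raw logit; apply torch.sigmoid for an invocation probability

tokenizer = AutoTokenizer.from_pretrained(ENCODER)
model = InvocationFilter()

code_context = "def parse_config(path):\n    with open(path) as f:\n        "
batch = tokenizer(code_context, return_tensors="pt", truncation=True, max_length=256)
# Hypothetical telemetry: time since last keystroke, time since last completion,
# whether the previous suggestion was accepted, caret column (all normalised).
telemetry = torch.tensor([[0.4, 0.9, 1.0, 0.1]])

with torch.no_grad():
    p_invoke = torch.sigmoid(model(batch["input_ids"], batch["attention_mask"], telemetry))
print("invoke completion" if p_invoke.item() > 0.5 else "stay silent")
```
A filter like this has to run on essentially every keystroke, which is why the abstract stresses keeping the model small and its latency low.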
Related papers
- Improving FIM Code Completions via Context & Curriculum Based Learning [6.779631208983878]
We develop a curriculum dataset by extracting hard-to-complete patterns from code repositories.
We generate context examples using semantic and static analysis tools.
We validate our approach through online A/B testing, demonstrating tangible improvements in Completion Acceptance Rate (CAR) and Completion Persistence (CPR); a sketch of these two metrics follows this entry.
arXiv Detail & Related papers (2024-12-21T11:30:54Z)
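Since the entry above reports gains in CAR and CPR, here is an illustrative computation of both metrics from logged completion events. The definitions below are common readings (acceptance over shown completions; persistence of accepted insertions after a delay), not necessarily the exact definitions used in that paper.
```python
# Illustrative completion metrics; the paper's exact definitions may differ.
#   CAR = accepted completions / shown completions
#   CPR = accepted completions whose inserted text is still present later / accepted completions
from dataclasses import dataclass

@dataclass
class CompletionEvent:
    shown: bool
    accepted: bool
    persisted: bool  # e.g. inserted text still in the file N minutes later (assumed criterion)

def car(events: list[CompletionEvent]) -> float:
    shown = [e for e in events if e.shown]
    return sum(e.accepted for e in shown) / len(shown) if shown else 0.0

def cpr(events: list[CompletionEvent]) -> float:
    accepted = [e for e in events if e.accepted]
    return sum(e.persisted for e in accepted) / len(accepted) if accepted else 0.0

events = [
    CompletionEvent(shown=True, accepted=True, persisted=True),
    CompletionEvent(shown=True, accepted=False, persisted=False),
    CompletionEvent(shown=True, accepted=True, persisted=False),
]
print(f"CAR={car(events):.2f}, CPR={cpr(events):.2f}")  # CAR=0.67, CPR=0.50
```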
- AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials [53.376263056033046]
We propose a scalable data synthesis pipeline that generates high-quality GUI agent trajectories by leveraging web tutorials.
Our method automatically gathers tutorial-like texts from the internet, transforms them into task goals with step-by-step instructions, and employs a visual-language model agent.
A VLM-based evaluator ensures the correctness of the generated trajectories.
arXiv Detail & Related papers (2024-12-12T18:59:27Z)
- DialogAgent: An Auto-engagement Agent for Code Question Answering Data Production [5.030384831047144]
We present DialogAgent, an automated tool for generating synthetic training data that closely mimics real developer interactions.
The tool significantly reduces the reliance on manual data generation, increasing efficiency by 4.8 times compared to traditional methods.
arXiv Detail & Related papers (2024-12-11T03:31:36Z)
- Data-Juicer Sandbox: A Feedback-Driven Suite for Multimodal Data-Model Co-development [67.55944651679864]
We present a new sandbox suite tailored for integrated data-model co-development.
This sandbox provides a feedback-driven experimental platform, enabling cost-effective and guided refinement of both data and models.
arXiv Detail & Related papers (2024-07-16T14:40:07Z)
- Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach [66.51005288743153]
We investigate the legal and ethical issues of current neural code completion models.
We tailor a membership inference approach (termed CodeMI) that was originally crafted for classification tasks.
We evaluate the effectiveness of this adapted approach across a diverse array of neural code completion models.
arXiv Detail & Related papers (2024-04-22T15:54:53Z)
- A Machine Learning Approach Towards SKILL Code Autocompletion [6.586356094533907]
This study is the first to apply transformers to SKILL code autocompletion, aiming to improve the productivity of hardware design engineers.
We propose a novel methodology for creating a high-quality SKILL dataset with both unlabeled and labeled data.
We show that models trained using the proposed methodology outperform baselines in terms of human-judgment score and BLEU score.
arXiv Detail & Related papers (2023-12-04T14:29:28Z)
- Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study [4.438873396405334]
We aim to answer whether making code easier to understand by adding contextual data improves the performance of pre-trained code language models on the task of code completion.
For comments, we find that the models perform better in the presence of multi-line comments.
arXiv Detail & Related papers (2023-04-24T17:09:14Z)
- Masked World Models for Visual Control [90.13638482124567]
We introduce a visual model-based RL framework that decouples visual representation learning and dynamics learning.
We demonstrate that our approach achieves state-of-the-art performance on a variety of visual robotic tasks.
arXiv Detail & Related papers (2022-06-28T18:42:27Z)
- Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time.
The results show that off-the-shelf AutoML tools can provide satisfactory results, but in the presence of concept drift, detection or adaptation techniques must be applied to maintain predictive accuracy over time.
arXiv Detail & Related papers (2021-06-14T11:42:46Z)
- Injecting Knowledge in Data-driven Vehicle Trajectory Predictors [82.91398970736391]
Vehicle trajectory prediction tasks have been commonly tackled from two perspectives: knowledge-driven or data-driven.
In this paper, we propose to learn a "Realistic Residual Block" (RRB) which effectively connects these two perspectives.
Our proposed method outputs realistic predictions by confining the residual range and taking into account its uncertainty.
arXiv Detail & Related papers (2021-03-08T16:03:09Z)
- Sequence Model Design for Code Completion in the Modern IDE [3.4824234779710452]
We propose a novel design for predicting the top-k next tokens that combines static analysis's ability to enumerate all valid keywords and in-scope identifiers with a language model's ability to place a probability distribution over them (a sketch of this combination follows this entry).
Our model mixes character-level input representation with token output to represent out-of-vocabulary (OOV) tokens meaningfully and minimize prediction latency.
arXiv Detail & Related papers (2020-04-10T22:40:49Z)
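The last entry above combines static analysis, which enumerates the tokens that are valid at the cursor, with a language model that places a probability distribution over them. The following minimal sketch shows that combination with a stand-in scoring function; the candidate list, the scorer, and the tokenisation are assumptions, not the paper's actual model.
```python
# Minimal sketch: restrict next-token prediction to candidates a static analyzer
# reports as valid at the cursor, then normalise the scores over that set only.
import math

def lm_scores(prefix: str, candidates: list[str]) -> dict[str, float]:
    # Stand-in for a real language model: a toy heuristic favouring candidates
    # that share characters with the tail of the prefix. Replace with model logits.
    tail = prefix.split()[-1] if prefix.split() else ""
    return {c: sum(ch in c for ch in tail) + 0.1 for c in candidates}

def top_k_completions(prefix: str, valid_tokens: list[str], k: int = 3) -> list[tuple[str, float]]:
    # Score only statically valid tokens, then softmax-normalise over that set.
    scores = lm_scores(prefix, valid_tokens)
    z = sum(math.exp(s) for s in scores.values())
    probs = {tok: math.exp(s) / z for tok, s in scores.items()}
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]

# In-scope identifiers and keywords as a static analyzer might report them (assumed).
valid = ["config", "connect", "return", "count", "close"]
print(top_k_completions("def handler(conn):\n    conn.co", valid))
```
Restricting the distribution to statically valid candidates guarantees that every suggestion is at least a legal keyword or in-scope identifier, regardless of how strong the underlying scorer is.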
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.