AI-assisted Code Authoring at Scale: Fine-tuning, deploying, and mixed methods evaluation
- URL: http://arxiv.org/abs/2305.12050v2
- Date: Fri, 16 Feb 2024 19:52:45 GMT
- Title: AI-assisted Code Authoring at Scale: Fine-tuning, deploying, and mixed methods evaluation
- Authors: Vijayaraghavan Murali, Chandra Maddila, Imad Ahmad, Michael Bolin, Daniel Cheng, Negar Ghorbani, Renuka Fernandez, Nachiappan Nagappan, Peter C. Rigby
- Abstract summary: We present CodeCompose, an AI-assisted code authoring tool developed and deployed internally at Meta. CodeCompose is based on the InCoder LLM, which merges generative capabilities with bi-directionality. In a random sample of 20K source code files, we are able to reproduce hidden lines between 40% and 58% of the time, a 1.4x to 4.1x improvement over a model trained only on public data.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative LLMs have been shown to effectively power AI-based code authoring tools that can suggest entire statements or blocks of code during code authoring. In this paper we present CodeCompose, an AI-assisted code authoring tool developed and deployed internally at Meta. CodeCompose is based on the InCoder LLM, which merges generative capabilities with bi-directionality. We have scaled up CodeCompose to serve tens of thousands of developers at Meta, across 9 programming languages and several coding surfaces, and we present our experience in making design decisions about the model and system architecture for CodeCompose to address the challenges of operating at this scale.
To release an LLM at this scale, we needed to first ensure that it is sufficiently accurate. In a random sample of 20K source code files, depending on the language, we are able to reproduce hidden lines between 40% and 58% of the time, a 1.4x to 4.1x improvement over a model trained only on public data.
We gradually rolled CodeCompose out to developers. At the time of this writing, 16K developers have used it, with 8% of their code coming directly from CodeCompose.
To triangulate our numerical findings, we conduct a thematic analysis of the feedback from 70 developers. We find that 91.5% of the feedback is positive,
with the most common themes being discovering APIs, dealing with boilerplate
code, and accelerating coding. Meta continues to integrate this feedback into
CodeCompose.
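Three short sketches below illustrate, in simplified form, mechanisms the abstract touches on: the bi-directional (fill-in-the-middle) prompting that InCoder-style models use, the hidden-line accuracy evaluation, and the share-of-code telemetry metric. All three are editorial sketches under stated assumptions, not CodeCompose's implementation.

First, fill-in-the-middle prompting: the prompt carries the code both before and after the cursor, so a left-to-right model can condition on both sides while generating the missing span. The sentinel strings are illustrative placeholders, not InCoder's actual special tokens.

    # Sketch: fill-in-the-middle (FIM) prompt assembly for a code-infilling LLM.
    # The sentinels are placeholders; a real model such as InCoder defines its
    # own special tokens for marking the masked span.
    MASK = "<|mask:0|>"    # hypothetical sentinel marking the infill site
    END = "<|endofmask|>"  # hypothetical sentinel ending the generated span

    def build_fim_prompt(prefix: str, suffix: str) -> str:
        # Place the suffix before the final mask so generation sees both sides.
        return prefix + MASK + suffix + MASK

    def extract_infill(generated: str) -> str:
        # Keep only the infilled span, up to the end-of-mask sentinel.
        return generated.split(END, 1)[0]

    prompt = build_fim_prompt("def add(a, b):\n    ", "\n\nprint(add(1, 2))")
    # completion = extract_infill(model.generate(prompt))  # hypothetical call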
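Second, the hidden-line evaluation: hide one line of a real file, ask the model to fill it in from the surrounding context, and count exact matches. The sampling and whitespace normalization below are assumptions, and complete(prefix, suffix) is a hypothetical model call, not the paper's protocol.

    import random

    def exact_match_rate(files, complete, samples_per_file=1):
        # Hide one random non-blank line per sample and count how often the
        # model's completion reproduces it exactly (modulo edge whitespace).
        hits = total = 0
        for text in files:
            lines = text.splitlines()
            candidates = [i for i, l in enumerate(lines) if l.strip()]
            if not candidates:
                continue
            for i in random.sample(candidates, min(samples_per_file, len(candidates))):
                prefix = "\n".join(lines[:i]) + "\n"
                suffix = "\n" + "\n".join(lines[i + 1:])
                prediction = complete(prefix, suffix)  # hypothetical model call
                hits += prediction.strip() == lines[i].strip()
                total += 1
        return hits / total if total else 0.0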
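Third, the share-of-code metric behind the "8% of their code" figure: of the characters a developer keeps, how many arrived through accepted suggestions. The event schema here is invented for illustration.

    def suggestion_share(events):
        # events: dicts like {"kind": "accepted_suggestion" | "typed",
        # "chars": int} -- an assumed telemetry schema, not CodeCompose's.
        suggested = sum(e["chars"] for e in events if e["kind"] == "accepted_suggestion")
        typed = sum(e["chars"] for e in events if e["kind"] == "typed")
        total = suggested + typed
        return suggested / total if total else 0.0

    events = [{"kind": "typed", "chars": 920},
              {"kind": "accepted_suggestion", "chars": 80}]
    print(f"{suggestion_share(events):.0%}")  # -> 8%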
Related papers
- Understanding Code Understandability Improvements in Code Reviews (2024-10-29)
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
- Assessing Consensus of Developers' Views on Code Readability (2024-07-04)
Developers now spend more time reviewing code than writing it, highlighting the importance of Code Readability for code comprehension.
Previous research found that existing Code Readability models were inaccurate in representing developers' notions.
We surveyed 10 Java developers with similar coding experience to evaluate their consensus on Code Readability assessments and related aspects.
- DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories (2024-05-30)
Existing benchmarks are poorly aligned with real-world code repositories.
We propose a new benchmark named DevEval, which has three advances.
DevEval comprises 1,874 testing samples from 117 repositories, covering 10 popular domains.
- CodeCloak: A Method for Evaluating and Mitigating Code Leakage by LLM Code Assistants (2024-04-13)
CodeCloak is a novel deep reinforcement learning agent that manipulates the prompts before sending them to the code assistant service.
CodeCloak aims to achieve the following two contradictory goals: (i) minimizing code leakage, while (ii) preserving relevant and useful suggestions for the developer.
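CodeCloak learns these manipulations with a deep RL agent; as a far simpler stand-in, the sketch below shows the kind of rewrite such an agent might emit: renaming identifiers before the prompt leaves the developer's machine. This is an editorial illustration of the leakage/utility trade-off, not the paper's method.

    import keyword
    import re

    def anonymize_identifiers(code: str) -> str:
        # Crude leakage-reducing rewrite: rename every identifier (except
        # Python keywords) to a generic name before sending the prompt out.
        keep = set(keyword.kwlist)
        mapping = {}
        def rename(match):
            name = match.group(0)
            if name in keep:
                return name
            mapping.setdefault(name, f"v{len(mapping)}")
            return mapping[name]
        return re.sub(r"[A-Za-z_]\w*", rename, code)

    print(anonymize_identifiers("internal_rate = base_price * 0.0725"))
    # -> v0 = v1 * 0.0725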
- Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback (2024-03-25)
We present CoCoGen, a new code generation approach that uses compiler feedback to improve the LLM-generated code.
CoCoGen first leverages static analysis to identify mismatches between the generated code and the project's context.
It then iteratively aligns and fixes the identified errors using information extracted from the code repository.
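The shape of such a loop is easy to sketch. Below, Python's built-in compile() stands in for CoCoGen's project-aware static analysis, and generate() is a hypothetical LLM call.

    def refine_until_compiles(task: str, generate, max_rounds: int = 3) -> str:
        # Generate code, check it, and feed the checker's error back into the
        # next prompt. compile() is a stand-in for repository-aware analysis.
        code = generate(task)                         # hypothetical LLM call
        for _ in range(max_rounds):
            try:
                compile(code, "<generated>", "exec")  # stand-in static check
                return code
            except SyntaxError as err:
                code = generate(task + "\n\n# Previous attempt:\n" + code +
                                f"\n# Fix this error on line {err.lineno}: {err.msg}\n")
        return code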
- OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement (2024-02-22)
We introduce OpenCodeInterpreter, a family of open-source code systems for generating, executing, and iteratively refining code.
Our comprehensive evaluation of OpenCodeInterpreter across key benchmarks such as HumanEval, MBPP, and their enhanced versions from EvalPlus reveals its exceptional performance.
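A toy version of that generate-execute-refine loop is sketched below; the feedback signal is a captured traceback from exec(), and generate() is again a hypothetical model call. OpenCodeInterpreter's real harness sandboxes execution and also folds in human feedback.

    import traceback

    def execute_and_refine(task: str, test: str, generate, max_rounds: int = 3) -> str:
        # Run the generated code plus a test; on failure, feed the traceback
        # back into the prompt. Toy execution-driven refinement.
        code = generate(task)                 # hypothetical LLM call
        for _ in range(max_rounds):
            try:
                exec(code + "\n" + test, {})  # unsafe outside a sandbox
                return code
            except Exception:
                feedback = traceback.format_exc(limit=1)
                code = generate(task + "\n# Failing test output:\n" + feedback)
        return code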
- CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X (2023-03-30)
We introduce CodeGeeX, a multilingual model with 13 billion parameters for code generation.
CodeGeeX is pre-trained on 850 billion tokens of 23 programming languages.
- Tackling Long Code Search with Splitting, Encoding, and Aggregating (2022-08-24)
We propose a new baseline SEA (Split, Encode and Aggregate) for long code search.
It splits long code into code blocks, encodes these blocks into embeddings, and aggregates them to obtain a comprehensive long code representation.
With GraphCodeBERT as the encoder, SEA achieves an overall mean reciprocal ranking score of 0.785, which is 10.1% higher than GraphCodeBERT on the CodeSearchNet benchmark.
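The split-encode-aggregate recipe fits in a few lines. Here encode() stands in for any fixed-length encoder such as GraphCodeBERT, the character-budget splitting is a simplification, and mean pooling is one of several aggregation choices the paper studies.

    import numpy as np

    def sea_embedding(code: str, encode, block_size: int = 256) -> np.ndarray:
        # Split long code into line-aligned blocks of at most block_size
        # characters, encode each block, and mean-pool into one vector.
        blocks, current = [], ""
        for line in code.splitlines(keepends=True):
            if current and len(current) + len(line) > block_size:
                blocks.append(current)
                current = ""
            current += line
        if current:
            blocks.append(current)
        vectors = np.stack([encode(b) for b in blocks])  # encode: str -> 1-D array
        return vectors.mean(axis=0)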
- ReACC: A Retrieval-Augmented Code Completion Framework (2022-03-15)
We propose a retrieval-augmented code completion framework that leverages both lexical copying and references to semantically similar code obtained via retrieval.
We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
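The core idea, prepending similar retrieved code to the completion prompt so the model can copy from it, can be sketched minimally; the token-overlap (Jaccard) ranking below stands in for ReACC's hybrid lexical-plus-semantic retriever.

    def jaccard(a: set, b: set) -> float:
        # Token-overlap similarity between two token sets.
        return len(a & b) / len(a | b) if a | b else 0.0

    def retrieve(query: str, corpus: list, k: int = 2) -> list:
        # Rank corpus snippets by token overlap with the unfinished code.
        q = set(query.split())
        return sorted(corpus, key=lambda s: jaccard(q, set(s.split())),
                      reverse=True)[:k]

    def build_prompt(unfinished: str, corpus: list) -> str:
        # Prepend retrieved snippets so the LM can copy from similar code.
        context = "\n\n".join(retrieve(unfinished, corpus))
        return "# Retrieved similar code:\n" + context + "\n\n" + unfinished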
- IntelliCode Compose: Code Generation Using Transformer (2020-05-16)
We introduce IntelliCode Compose, a general-purpose multilingual code completion tool.
It is capable of predicting sequences of code tokens of arbitrary types, generating up to entire lines of syntactically correct code.
IntelliCode Compose is deployed as a cloud-based web service.