CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file
Context
- URL: http://arxiv.org/abs/2212.10007v2
- Date: Wed, 24 May 2023 06:56:45 GMT
- Title: CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file
Context
- Authors: Yangruibo Ding, Zijian Wang, Wasi Uddin Ahmad, Murali Krishna
Ramanathan, Ramesh Nallapati, Parminder Bhatia, Dan Roth, Bing Xiang
- Abstract summary: We propose a framework that incorporates cross-file context to learn the in-file and cross-file context jointly on top of pretrained code LMs.
CoCoMIC successfully improves the existing code LM with a 33.94% relative increase in exact match and a 28.69% relative increase in identifier matching for code completion when the cross-file context is provided.
- Score: 82.88371379927112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While pre-trained language models (LM) for code have achieved great success
in code completion, they generate code conditioned only on the contents within
the file, i.e., in-file context, but ignore the rich semantics in other files
within the same project, i.e., cross-file context, a critical source of
information that is especially useful in modern modular software development.
Such overlooking constrains code language models' capacity in code completion,
leading to unexpected behaviors such as generating hallucinated class member
functions or function calls with unexpected arguments. In this work, we develop
a cross-file context finder tool, CCFINDER, that effectively locates and
retrieves the most relevant cross-file context. We propose CoCoMIC, a framework
that incorporates cross-file context to learn the in-file and cross-file
context jointly on top of pretrained code LMs. CoCoMIC successfully improves
the existing code LM with a 33.94% relative increase in exact match and a
28.69% relative increase in identifier matching for code completion when the
cross-file context is provided.
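The two headline numbers are relative increases over a baseline, not absolute percentage-point gains. As a minimal sketch of what that means, the Python below computes an exact-match score and the relative increase between two scores. The function names `exact_match` and `relative_increase`, the whitespace trimming, and the toy completions are illustrative assumptions, not the paper's evaluation code or data.

```python
from typing import Sequence


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Percentage of predictions identical to their reference.

    Assumption: surrounding whitespace is trimmed before comparing.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


def relative_increase(baseline: float, improved: float) -> float:
    """Relative increase of `improved` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (improved - baseline) / baseline


# Made-up toy completions, NOT results from the paper.
references = ["return self.total", "client.connect(host, port)", "x += 1", "foo(bar)"]
in_file_only = ["return self.sum", "client.connect(host)", "x += 1", "foo(bar)"]
with_cross_file = ["return self.total", "client.connect(host)", "x += 1", "foo(bar)"]

baseline_em = exact_match(in_file_only, references)      # 2 of 4 match
improved_em = exact_match(with_cross_file, references)   # 3 of 4 match
print(baseline_em, improved_em, relative_increase(baseline_em, improved_em))
```

In this toy case, exact match rises from 50.0 to 75.0, a gain of 25 percentage points but a relative increase of 50%. The reported 33.94% and 28.69% figures are of the second kind.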
Related papers
- Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs [24.00351065427465]
We propose a strategy named Hierarchical Context Pruning (HCP) to construct completion prompts with high informational code content.
The HCP models the code repository at the function level, maintaining the topological dependencies between code files while removing a large amount of irrelevant code content.
arXiv Detail & Related papers (2024-06-26T12:26:16Z)
- Enhancing LLM-Based Coding Tools through Native Integration of IDE-Derived Static Context [41.91246546266515]
We argue that Integrated Development Environments (IDEs) can provide direct, accurate and real-time cross-file information for repository-level code completion.
We propose a framework that leverages IDE native static contexts for cross-context construction and diagnosis results for self-refinement.
arXiv Detail & Related papers (2024-02-06T01:59:41Z)
- SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z)
- CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion [86.01508183157613]
CrossCodeEval is built on a diverse set of real-world, open-sourced, permissively-licensed repositories in four popular programming languages.
We show that CrossCodeEval is extremely challenging when the relevant cross-file context is absent.
We also show that CrossCodeEval can be used to measure the capability of code retrievers.
arXiv Detail & Related papers (2023-10-17T13:18:01Z)
- InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback [50.725076393314964]
We introduce InterCode, a lightweight, flexible, and easy-to-use framework of interactive coding as a standard reinforcement learning environment.
Our framework is language and platform agnostic and uses self-contained Docker environments to provide safe and reproducible execution.
We demonstrate InterCode's viability as a testbed by evaluating multiple state-of-the-art LLMs configured with different prompting strategies.
arXiv Detail & Related papers (2023-06-26T17:59:50Z)
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach on the code completion task in the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
- CodeRetriever: Unimodal and Bimodal Contrastive Learning [128.06072658302165]
We propose the CodeRetriever model, which combines unimodal and bimodal contrastive learning to train function-level code semantic representations.
For unimodal contrastive learning, we design a semantic-guided method to build positive code pairs based on the documentation and function name.
For bimodal contrastive learning, we leverage the documentation and in-line comments of code to build text-code pairs.
arXiv Detail & Related papers (2022-01-26T10:54:30Z)
- Long-Range Modeling of Source Code Files with eWASH: Extended Window Access by Syntax Hierarchy [30.368963500809365]
We introduce an architecture-independent approach for leveraging entire file-level context into a fixed-length window.
We evaluate this approach on code generation tasks and on joint translation of natural language and source code in the Python programming language.
arXiv Detail & Related papers (2021-09-17T23:11:57Z)