Retrieval Augmented Code Generation and Summarization
- URL: http://arxiv.org/abs/2108.11601v1
- Date: Thu, 26 Aug 2021 06:48:13 GMT
- Title: Retrieval Augmented Code Generation and Summarization
- Authors: Md Rizwan Parvez, Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray,
Kai-Wei Chang
- Abstract summary: We propose a retrieval augmented framework, tool, that retrieves relevant code or summaries from a retrieval database.
tool extends the state-of-the-art dense retrieval technique to search for relevant code or summaries.
We conduct experiments and extensive analysis on two benchmark datasets of code generation and summarization in Java and Python.
- Score: 43.823483197436076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Software developers write a lot of source code and documentation during
software development. Intrinsically, developers often recall parts of source
code or code summaries that they had written in the past while implementing
software or documenting them. To mimic developers' code or summary generation
behavior, we propose a retrieval augmented framework, \tool, that retrieves
relevant code or summaries from a retrieval database and provides them as a
supplement to code generation or summarization models. \tool has a couple of
uniqueness. First, it extends the state-of-the-art dense retrieval technique to
search for relevant code or summaries. Second, it can work with retrieval
databases that include unimodal (only code or natural language description) or
bimodal instances (code-description pairs). We conduct experiments and
extensive analysis on two benchmark datasets of code generation and
summarization in Java and Python, and the promising results endorse the
effectiveness of our proposed retrieval augmented framework.
Related papers
- Building A Coding Assistant via the Retrieval-Augmented Language Model [24.654428111628242]
We propose a retrieval-augmeNted language model (CONAN) to build a code assistant by mimicking the knowledge-seeking behaviors of humans during coding.
It consists of a code structure aware retriever (CONAN-R) and a dual-view code representation-based retrieval-augmented generation model (CONAN-G)
arXiv Detail & Related papers (2024-10-21T17:34:39Z) - CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation.
We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks.
We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z) - Generation-Augmented Query Expansion For Code Retrieval [51.20943646688115]
We propose a generation-augmented query expansion framework.
Inspired by the human retrieval process - sketching an answer before searching.
We achieve new state-of-the-art results on the CodeSearchNet benchmark.
arXiv Detail & Related papers (2022-12-20T23:49:37Z) - Enhancing Semantic Code Search with Multimodal Contrastive Learning and
Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z) - ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z) - CodeRetriever: Unimodal and Bimodal Contrastive Learning [128.06072658302165]
We propose the CodeRetriever model, which combines the unimodal and bimodal contrastive learning to train function-level code semantic representations.
For unimodal contrastive learning, we design a semantic-guided method to build positive code pairs based on the documentation and function name.
For bimodal contrastive learning, we leverage the documentation and in-line comments of code to build text-code pairs.
arXiv Detail & Related papers (2022-01-26T10:54:30Z) - AugmentedCode: Examining the Effects of Natural Language Resources in
Code Retrieval Models [5.112140303263898]
We introduce Augmented Code (AugmentedCode) retrieval which takes advantage of existing information within the code.
We showcased the the results of augmented programming language which outperforms on CodeSearchNet and CodeBERT with a Mean Reciprocal Rank (MRR) of 0.73 and 0.96.
arXiv Detail & Related papers (2021-10-16T08:44:48Z) - Leveraging Code Generation to Improve Code Retrieval and Summarization
via Dual Learning [18.354352985591305]
Code summarization generates brief natural language description given a source code snippet, while code retrieval fetches relevant source code given a natural language query.
Recent studies have combined these two tasks to improve their performance.
We propose a novel end-to-end model for the two tasks by introducing an additional code generation task.
arXiv Detail & Related papers (2020-02-24T12:26:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.