CCRep: Learning Code Change Representations via Pre-Trained Code Model and Query Back
- URL: http://arxiv.org/abs/2302.03924v1
- Date: Wed, 8 Feb 2023 07:43:55 GMT
- Title: CCRep: Learning Code Change Representations via Pre-Trained Code Model and Query Back
- Authors: Zhongxin Liu, Zhijie Tang, Xin Xia, Xiaohu Yang
- Abstract summary: This work proposes a novel Code Change Representation learning approach named CCRep.
CCRep learns to encode code changes as feature vectors for diverse downstream tasks.
We apply CCRep to three tasks: commit message generation, patch correctness assessment, and just-in-time defect prediction.
- Score: 8.721077261941236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Representing code changes as numeric feature vectors, i.e., code change
representations, is usually an essential step to automate many software
engineering tasks related to code changes, e.g., commit message generation and
just-in-time defect prediction. Intuitively, the quality of code change
representations is crucial for the effectiveness of automated approaches. Prior
work on code changes usually designs and evaluates code change representation
approaches for a specific task, and little work has investigated code change
encoders that can be used and jointly trained on various tasks. To fill this
gap, this work proposes a novel Code Change Representation learning approach
named CCRep, which can learn to encode code changes as feature vectors for
diverse downstream tasks. Specifically, CCRep regards a code change as the
combination of its before-change and after-change code, leverages a pre-trained
code model to obtain high-quality contextual embeddings of code, and uses a
novel mechanism named query back to extract and encode the changed code
fragments and make them explicitly interact with the whole code change. To
evaluate CCRep and demonstrate its applicability to diverse code-change-related
tasks, we apply it to three tasks: commit message generation, patch correctness
assessment, and just-in-time defect prediction. Experimental results show that
CCRep outperforms the state-of-the-art techniques on each task.
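The abstract describes query back only at a high level. As a rough illustration, the following PyTorch sketch (class name, dimensions, and the pooling step are assumptions, not the authors' implementation) shows one way embeddings of the changed tokens could serve as queries that attend back over the whole code change:

```python
# A minimal, hypothetical sketch of the "query back" idea: contextual
# embeddings of the changed tokens act as attention queries over the whole
# code change. Class name, dimensions, and the pooling step are assumptions,
# not the authors' implementation.
import torch
import torch.nn as nn


class QueryBackSketch(nn.Module):
    def __init__(self, hidden: int = 768, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.proj = nn.Linear(2 * hidden, hidden)

    def forward(self, change_emb: torch.Tensor, changed_mask: torch.Tensor) -> torch.Tensor:
        # change_emb: (B, L, H) embeddings of the concatenated before/after
        #             code from a pre-trained code model (e.g., CodeBERT).
        # changed_mask: (B, L) bool, True at tokens the diff marks as changed.
        # Keep queries only at changed positions; unchanged positions become zero.
        queries = change_emb * changed_mask.unsqueeze(-1)
        # "Query back": changed fragments attend over the whole code change.
        attended, _ = self.attn(queries, change_emb, change_emb)
        # Fuse each fragment query with what it retrieved, then mean-pool the
        # changed positions into one code-change vector.
        fused = self.proj(torch.cat([queries, attended], dim=-1))
        denom = changed_mask.sum(dim=1, keepdim=True).clamp(min=1)
        return (fused * changed_mask.unsqueeze(-1)).sum(dim=1) / denom
```

In CCRep, the resulting change vector would then feed a task-specific head, e.g., a decoder for commit message generation or a classifier for patch correctness assessment and just-in-time defect prediction.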
Related papers
- ChangeGuard: Validating Code Changes via Pairwise Learning-Guided Execution [16.130469984234956]
ChangeGuard is an approach that uses learning-guided execution to compare the runtime behavior of a modified function.
Our results show that the approach identifies semantics-changing code changes with a precision of 77.1% and a recall of 69.5%.
arXiv Detail & Related papers (2024-10-21T15:13:32Z)
- ESALE: Enhancing Code-Summary Alignment Learning for Source Code Summarization [21.886950861445122]
Code summarization aims to automatically generate succinct natural language summaries for given code snippets.
This paper proposes a novel approach to improve code summarization based on summary-focused tasks.
arXiv Detail & Related papers (2024-07-01T03:06:51Z)
- CCBERT: Self-Supervised Code Change Representation Learning [14.097775709587475]
CCBERT is a new Transformer-based pre-trained model that learns a generic representation of code changes from a large-scale dataset of unlabeled code changes.
Our experiments demonstrate that CCBERT significantly outperforms CC2Vec and the state-of-the-art approaches on the downstream tasks by 7.7%--14.0% across different metrics and tasks.
arXiv Detail & Related papers (2023-09-27T08:17:03Z)
- Automated Code Editing with Search-Generate-Modify [24.96672652375192]
This paper proposes a hybrid approach to better synthesize code edits by leveraging the power of code search, generation, and modification.
SARGAM is a novel tool designed to mimic a real developer's code editing behavior.
arXiv Detail & Related papers (2023-06-10T17:11:21Z)
- CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks.
We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning.
In particular, we propose CONCORD, a self-supervised contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart (a generic sketch of this kind of objective appears after this list).
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
- Coeditor: Leveraging Contextual Changes for Multi-round Code Auto-editing [57.776971051512234]
In this work, we explore a multi-round code auto-editing setting, aiming to predict edits to a code region based on recent changes within the same codebase.
Our model, Coeditor, is a fine-tuned language model specifically designed for code editing tasks.
In a simplified single-round, single-edit task, Coeditor significantly outperforms GPT-3.5 and SOTA open-source code completion models.
arXiv Detail & Related papers (2023-05-29T19:57:36Z)
- Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization [73.52943587514386]
Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm.
We propose a novel two-stage framework: (1) Dynamic-Quantization VAE (DQ-VAE), which encodes image regions into variable-length codes based on their information densities for accurate representation.
arXiv Detail & Related papers (2023-05-19T14:56:05Z)
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework that leverages both lexical copying and the retrieval of semantically similar code.
We evaluate our approach on the code completion task in the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
- UniXcoder: Unified Cross-Modal Pre-training for Code Representation [65.6846553962117]
We present UniXcoder, a unified cross-modal pre-trained model for programming languages.
We propose a one-to-one mapping method to transform an AST into a sequence structure that retains all structural information from the tree.
We evaluate UniXcoder on five code-related tasks over nine datasets.
arXiv Detail & Related papers (2022-03-08T04:48:07Z)
- Unsupervised Learning of General-Purpose Embeddings for Code Changes [6.652641137999891]
We propose an approach for obtaining embeddings of code changes during pre-training.
We evaluate them on two different downstream tasks - applying changes to code and commit message generation.
Our model outperforms the model that uses full edit sequences by 5.9 percentage points in accuracy.
arXiv Detail & Related papers (2021-06-03T19:08:53Z)
- Contrastive Code Representation Learning [95.86686147053958]
We show that the popular reconstruction-based BERT model is sensitive to source code edits, even when the edits preserve semantics.
We propose ContraCode: a contrastive pre-training task that learns code functionality, not form.
arXiv Detail & Related papers (2020-07-09T17:59:06Z)
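CONCORD and ContraCode describe their contrastive objectives only in outline. A minimal InfoNCE-style sketch of that family of objectives, assuming each anchor snippet is paired with a semantics-preserving variant (a benign clone) and using the rest of the batch as negatives, could look like:

```python
# A generic InfoNCE-style contrastive loss in the spirit of CONCORD and
# ContraCode: each snippet is pulled toward a semantics-preserving variant
# and pushed away from the other snippets in the batch. Batch construction
# and the temperature value are assumptions, not either paper's exact setup.
import torch
import torch.nn.functional as F


def info_nce(anchors: torch.Tensor, positives: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    # anchors, positives: (B, H) encoder outputs; row i of `positives` is a
    # benign clone of row i of `anchors`; all other rows act as negatives.
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature           # (B, B) cosine similarities
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)     # diagonal entries are positives
```

In the clone-aware setting, the in-batch negatives would additionally include "deviant" snippets that look similar to the anchor but differ semantically.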