CCRep: Learning Code Change Representations via Pre-Trained Code Model and Query Back
- URL: http://arxiv.org/abs/2302.03924v1
- Date: Wed, 8 Feb 2023 07:43:55 GMT
- Title: CCRep: Learning Code Change Representations via Pre-Trained Code Model and Query Back
- Authors: Zhongxin Liu, Zhijie Tang, Xin Xia, Xiaohu Yang
- Abstract summary: This work proposes a novel Code Change Representation learning approach named CCRep.
CCRep learns to encode code changes as feature vectors for diverse downstream tasks.
We apply CCRep to three tasks: commit message generation, patch correctness assessment, and just-in-time defect prediction.
- Score: 8.721077261941236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Representing code changes as numeric feature vectors, i.e., code change
representations, is usually an essential step to automate many software
engineering tasks related to code changes, e.g., commit message generation and
just-in-time defect prediction. Intuitively, the quality of code change
representations is crucial for the effectiveness of automated approaches. Prior
work on code changes usually designs and evaluates code change representation
approaches for a specific task, and little work has investigated code change
encoders that can be used and jointly trained on various tasks. To fill this
gap, this work proposes a novel Code Change Representation learning approach
named CCRep, which can learn to encode code changes as feature vectors for
diverse downstream tasks. Specifically, CCRep regards a code change as the
combination of its before-change and after-change code, leverages a pre-trained
code model to obtain high-quality contextual embeddings of code, and uses a
novel mechanism named query back to extract and encode the changed code
fragments and make them explicitly interact with the whole code change. To
evaluate CCRep and demonstrate its applicability to diverse code-change-related
tasks, we apply it to three tasks: commit message generation, patch correctness
assessment, and just-in-time defect prediction. Experimental results show that
CCRep outperforms the state-of-the-art techniques on each task.
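The abstract describes query back only at a high level. As a rough illustration, the following PyTorch sketch (class name, dimensions, and the pooling step are assumptions, not the authors' implementation) shows one way embeddings of the changed tokens could serve as queries that attend back over the whole code change:

```python
# A minimal, hypothetical sketch of the "query back" idea: contextual
# embeddings of the changed tokens act as attention queries over the whole
# code change. Class name, dimensions, and the pooling step are assumptions,
# not the authors' implementation.
import torch
import torch.nn as nn


class QueryBackSketch(nn.Module):
    def __init__(self, hidden: int = 768, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.proj = nn.Linear(2 * hidden, hidden)

    def forward(self, change_emb: torch.Tensor, changed_mask: torch.Tensor) -> torch.Tensor:
        # change_emb: (B, L, H) embeddings of the concatenated before/after
        #             code from a pre-trained code model (e.g., CodeBERT).
        # changed_mask: (B, L) bool, True at tokens the diff marks as changed.
        # Keep queries only at changed positions; unchanged positions become zero.
        queries = change_emb * changed_mask.unsqueeze(-1)
        # "Query back": changed fragments attend over the whole code change.
        attended, _ = self.attn(queries, change_emb, change_emb)
        # Fuse each fragment query with what it retrieved, then mean-pool the
        # changed positions into one code-change vector.
        fused = self.proj(torch.cat([queries, attended], dim=-1))
        denom = changed_mask.sum(dim=1, keepdim=True).clamp(min=1)
        return (fused * changed_mask.unsqueeze(-1)).sum(dim=1) / denom
```

In CCRep, the resulting change vector would then feed a task-specific head, e.g., a decoder for commit message generation or a classifier for patch correctness assessment and just-in-time defect prediction.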
Related papers
- ChangeGuard: Validating Code Changes via Pairwise Learning-Guided Execution [16.130469984234956]
ChangeGuard is an approach that uses learning-guided execution to compare the runtime behavior of a modified function.
Our results show that the approach identifies semantics-changing code changes with a precision of 77.1% and a recall of 69.5%.
arXiv Detail & Related papers (2024-10-21T15:13:32Z)
- ESALE: Enhancing Code-Summary Alignment Learning for Source Code Summarization [21.886950861445122]
Code summarization aims to automatically generate succinct natural language summaries for given code snippets.
This paper proposes a novel approach to improve code summarization based on summary-focused tasks.
arXiv Detail & Related papers (2024-07-01T03:06:51Z)
- CCBERT: Self-Supervised Code Change Representation Learning [14.097775709587475]
CCBERT is a new Transformer-based pre-trained model that learns a generic representation of code changes from a large-scale dataset of unlabeled code changes.
Our experiments demonstrate that CCBERT significantly outperforms CC2Vec and the state-of-the-art approaches on the downstream tasks by 7.7%--14.0% across different metrics and tasks.
arXiv Detail & Related papers (2023-09-27T08:17:03Z)
- Automated Code Editing with Search-Generate-Modify [24.96672652375192]
This paper proposes a hybrid approach to better synthesize code edits by leveraging the power of code search, generation, and modification.
SARGAM is a novel tool designed to mimic a real developer's code editing behavior.
arXiv Detail & Related papers (2023-06-10T17:11:21Z)
- CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks.
We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning.
In particular, we propose CONCORD, a self-supervised contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart (a generic sketch of this kind of objective appears after this list).
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
- Coeditor: Leveraging Contextual Changes for Multi-round Code Auto-editing [57.776971051512234]
In this work, we explore a multi-round code auto-editing setting, aiming to predict edits to a code region based on recent changes within the same codebase.
Our model, Coeditor, is a fine-tuned language model specifically designed for code editing tasks.
In a simplified single-round, single-edit task, Coeditor significantly outperforms GPT-3.5 and SOTA open-source code completion models.
arXiv Detail & Related papers (2023-05-29T19:57:36Z)
- Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization [73.52943587514386]
Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm.
We propose a novel two-stage framework: (1) Dynamic-Quantization VAE (DQ-VAE), which encodes image regions into variable-length codes based on their information densities for accurate representation.
arXiv Detail & Related papers (2023-05-19T14:56:05Z)
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework that leverages both lexical copying and the retrieval of semantically similar code.
We evaluate our approach on the code completion task in the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
- UniXcoder: Unified Cross-Modal Pre-training for Code Representation [65.6846553962117]
We present UniXcoder, a unified cross-modal pre-trained model for programming languages.
We propose a one-to-one mapping method to transform an AST into a sequence structure that retains all structural information from the tree.
We evaluate UniXcoder on five code-related tasks over nine datasets.
arXiv Detail & Related papers (2022-03-08T04:48:07Z)
- Unsupervised Learning of General-Purpose Embeddings for Code Changes [6.652641137999891]
We propose an approach for obtaining embeddings of code changes during pre-training.
We evaluate them on two different downstream tasks - applying changes to code and commit message generation.
Our model outperforms the model that uses full edit sequences by 5.9 percentage points in accuracy.
arXiv Detail & Related papers (2021-06-03T19:08:53Z)
- Contrastive Code Representation Learning [95.86686147053958]
We show that the popular reconstruction-based BERT model is sensitive to source code edits, even when the edits preserve semantics.
We propose ContraCode: a contrastive pre-training task that learns code functionality, not form.
arXiv Detail & Related papers (2020-07-09T17:59:06Z)
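CONCORD and ContraCode describe their contrastive objectives only in outline. A minimal InfoNCE-style sketch of that family of objectives, assuming each anchor snippet is paired with a semantics-preserving variant (a benign clone) and using the rest of the batch as negatives, could look like:

```python
# A generic InfoNCE-style contrastive loss in the spirit of CONCORD and
# ContraCode: each snippet is pulled toward a semantics-preserving variant
# and pushed away from the other snippets in the batch. Batch construction
# and the temperature value are assumptions, not either paper's exact setup.
import torch
import torch.nn.functional as F


def info_nce(anchors: torch.Tensor, positives: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    # anchors, positives: (B, H) encoder outputs; row i of `positives` is a
    # benign clone of row i of `anchors`; all other rows act as negatives.
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature           # (B, B) cosine similarities
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)     # diagonal entries are positives
```

In the clone-aware setting, the in-batch negatives would additionally include "deviant" snippets that look similar to the anchor but differ semantically.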