Context-Encoded Code Change Representation for Automated Commit Message
Generation
- URL: http://arxiv.org/abs/2306.14418v1
- Date: Mon, 26 Jun 2023 04:48:14 GMT
- Title: Context-Encoded Code Change Representation for Automated Commit Message
Generation
- Authors: Thanh Trong Vu, Thanh-Dat Do, and Hieu Dinh Vo
- Abstract summary: This paper proposes a method to represent code changes by combining the changed code and the unchanged code.
It overcomes the limitations of current representations while improving the performance of 5/6 of state-of-the-art commit message generation methods.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Changes in source code are an inevitable part of software development. They
are the results of indispensable activities such as fixing bugs or improving
functionality. Descriptions for code changes (commit messages) help people
better understand the changes. However, due to a lack of motivation and time
pressure, developers are often reluctant to write high-quality commit messages.
Several methods have therefore been proposed to automate commit message
generation.
However, existing methods remain limited because they represent a change using
only the changed code, or the changed code together with its surrounding
statements.
This paper proposes representing a code change by combining the changed code
with the unchanged code that has program dependences on the changed code. This
representation overcomes the limitations of current representations and
improves the performance of five out of six state-of-the-art commit message
generation methods, by up to 15% in METEOR, 14% in ROUGE-L, and 10% in BLEU-4.
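The core idea, augmenting the changed lines with unchanged lines they depend on, can be sketched as a toy data-dependence slice. The function below is purely illustrative: the names and the regex-based def-use analysis are my assumptions, whereas the paper builds a proper program dependence analysis over the source.

```python
import re

def dependent_context(lines, changed):
    """Toy sketch: collect unchanged lines that a changed line depends on.

    Only simple data dependence is modeled: a line that assigns a
    variable which some changed line later reads. A real implementation
    would use a program dependence graph (data + control dependences).
    """
    ident = re.compile(r"[A-Za-z_]\w*")
    # Variables mentioned anywhere in the changed lines.
    used = set()
    for i in changed:
        used.update(ident.findall(lines[i]))
    context = []
    for i, line in enumerate(lines):
        if i in changed:
            continue
        # Crude "definition" detection: `name = ...` at line start.
        m = re.match(r"\s*([A-Za-z_]\w*)\s*=", line)
        if m and m.group(1) in used:
            context.append(i)
    return context

src = [
    "base = get_base()",    # defines `base`  -> dependency of the change
    "log('start')",         # unrelated, excluded from the representation
    "rate = 0.5",           # defines `rate`  -> dependency of the change
    "total = base + rate",  # the changed line
]
print(dependent_context(src, {3}))  # -> [0, 2]
```

The representation fed to a generation model would then be the changed line plus lines 0 and 2, while the unrelated logging line is dropped.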
Related papers
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
- ChangeGuard: Validating Code Changes via Pairwise Learning-Guided Execution [16.130469984234956]
ChangeGuard is an approach that uses learning-guided execution to compare the runtime behavior of a function before and after a modification.
Our results show that the approach identifies semantics-changing code changes with a precision of 77.1% and a recall of 69.5%.
arXiv Detail & Related papers (2024-10-21T15:13:32Z)
- CodeUpdateArena: Benchmarking Knowledge Editing on API Updates [77.81663273436375]
We present CodeUpdateArena, a benchmark for knowledge editing in the code domain.
An instance in our benchmark consists of a synthetic API function update paired with a program synthesis example.
Our benchmark covers updates of various types to 54 functions from seven diverse Python packages.
arXiv Detail & Related papers (2024-07-08T17:55:04Z)
- Coeditor: Leveraging Contextual Changes for Multi-round Code Auto-editing [57.776971051512234]
In this work, we explore a multi-round code auto-editing setting, aiming to predict edits to a code region based on recent changes within the same codebase.
Our model, Coeditor, is a fine-tuned language model specifically designed for code editing tasks.
In a simplified single-round, single-edit task, Coeditor significantly outperforms GPT-3.5 and SOTA open-source code completion models.
arXiv Detail & Related papers (2023-05-29T19:57:36Z)
- CCRep: Learning Code Change Representations via Pre-Trained Code Model and Query Back [8.721077261941236]
This work proposes a novel Code Change Representation learning approach named CCRep.
CCRep learns to encode code changes as feature vectors for diverse downstream tasks.
We apply CCRep to three tasks: commit message generation, patch correctness assessment, and just-in-time defect prediction.
arXiv Detail & Related papers (2023-02-08T07:43:55Z)
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
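The "lexical copying" half of such a retrieval-augmented setup can be illustrated with a minimal token-set retriever. This is a sketch under my own assumptions (Jaccard similarity over whitespace tokens, the `retrieve` helper is hypothetical); ReACC itself combines sparse lexical retrieval with dense semantic retrieval.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity between two code snippets."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def retrieve(query, corpus, k=1):
    """Return the k corpus snippets most lexically similar to the
    unfinished code; a completion model would then condition on them
    as extra context."""
    return sorted(corpus, key=lambda s: jaccard(query, s), reverse=True)[:k]

corpus = [
    "def read_json ( path ) : return json . load ( open ( path ) )",
    "for i in range ( n ) : total += i",
]
query = "def read_config ( path ) : return json ."
print(retrieve(query, corpus))  # the json-reading snippet wins on overlap
```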
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
- Jointly Learning to Repair Code and Generate Commit Message [78.4177637346384]
We construct a multilingual triple dataset including buggy code, fixed code, and commit messages for this novel task.
To address the error propagation problem of the cascaded method, we propose a joint model that both repairs the code and generates the commit message.
Experimental results show that the enhanced cascaded model with teacher-student method and multitask-learning method achieves the best score on different metrics of automated code repair.
arXiv Detail & Related papers (2021-09-25T07:08:28Z)
- Unsupervised Learning of General-Purpose Embeddings for Code Changes [6.652641137999891]
We propose an approach for obtaining embeddings of code changes during pre-training.
We evaluate them on two different downstream tasks - applying changes to code and commit message generation.
Our model outperforms the model that uses full edit sequences by 5.9 percentage points in accuracy.
arXiv Detail & Related papers (2021-06-03T19:08:53Z)
- CommitBERT: Commit Message Generation Using Pre-Trained Programming Language Model [0.38073142980733]
A commit message is a document that summarizes source code changes in natural language.
We develop a model that automatically writes the commit message.
We release a dataset of 345K code modification and commit message pairs in six programming languages.
arXiv Detail & Related papers (2021-05-29T07:48:28Z)
- CoreGen: Contextualized Code Representation Learning for Commit Message Generation [39.383390029545865]
We propose CoreGen, a novel contextualized code representation learning strategy for commit message generation.
Experiments on the benchmark dataset demonstrate the superior effectiveness of our model over the baseline models with at least 28.18% improvement in terms of BLEU-4 score.
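BLEU-4, the metric reported here and in the headline paper, is the geometric mean of modified 1- to 4-gram precisions times a brevity penalty. A minimal pure-Python sketch follows; the add-one smoothing on the n-gram counts is my assumption, since papers differ in which smoothing variant they use.

```python
import math
from collections import Counter

def bleu4(reference, candidate):
    """Sentence-level BLEU-4 sketch: geometric mean of modified
    1-4 gram precisions times a brevity penalty, with add-one
    smoothing (an assumption; smoothing variants differ in practice)."""
    ref, cand = reference.split(), candidate.split()
    log_prec = 0.0
    for n in range(1, 5):
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        log_prec += math.log((overlap + 1) / (total + 1)) / 4
    # Penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_prec)

print(bleu4("fix null pointer in parser", "fix null pointer in parser"))  # -> 1.0
print(bleu4("fix null pointer in parser", "fix bug"))  # partial, much lower
```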
arXiv Detail & Related papers (2020-07-14T09:43:26Z)
- Contrastive Code Representation Learning [95.86686147053958]
We show that the popular reconstruction-based BERT model is sensitive to source code edits, even when the edits preserve semantics.
We propose ContraCode: a contrastive pre-training task that learns code functionality, not form.
arXiv Detail & Related papers (2020-07-09T17:59:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.