Related papers: Brevity is the Soul of Wit: Condensing Code Changes to Improve Commit Message Generation

URL: http://arxiv.org/abs/2509.15567v1
Date: Fri, 19 Sep 2025 04:04:28 GMT
Title: Brevity is the Soul of Wit: Condensing Code Changes to Improve Commit Message Generation
Authors: Hongyu Kuang, Ning Zhang, Hui Gao, Xin Zhou, Wesley K. G. Assunção, Xiaoxing Ma, Dong Shao, Guoping Rong, He Zhang
Abstract summary: We propose an alternative way to condense code changes before generation. We first condense code changes by using our proposed templates with the help of a tool named ChangeScribe. Our approach can outperform six baselines in terms of BLEU-Norm, METEOR, and ROUGE-L.
Score: 21.625755841132733
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Commit messages are valuable resources for describing why code changes are committed to repositories in version control systems (e.g., Git). They effectively help developers understand code changes and better perform software maintenance tasks. Unfortunately, developers often neglect to write high-quality commit messages in practice. Therefore, a growing body of work is proposed to generate commit messages automatically. These works all demonstrated that how to organize and represent code changes is vital in generating good commit messages, including the use of fine-grained graphs or embeddings to better represent code changes. In this study, we choose an alternative way to condense code changes before generation, i.e., proposing brief yet concise text templates consisting of the following three parts: (1) summarized code changes, (2) elicited comments, and (3) emphasized code identifiers. Specifically, we first condense code changes by using our proposed templates with the help of a heuristic-based tool named ChangeScribe, and then fine-tune CodeLlama-7B on the pairs of our proposed templates and corresponding commit messages. Our proposed templates better utilize pre-trained language models, while being naturally brief and readable to complement generated commit messages for developers. Our evaluation based on a widely used dataset showed that our approach can outperform six baselines in terms of BLEU-Norm, METEOR, and ROUGE-L, with average improvements of 51.7%, 78.7%, and 62.5%, respectively. The ablation study and human evaluation also provide further insights into the effectiveness of our approach.
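
The abstract describes a template with three parts (summarized code changes, elicited comments, emphasized code identifiers) that is paired with the commit message for fine-tuning. A minimal sketch of that idea follows. It is not the authors' implementation: the class `CondensedChange`, the functions `render_template` and `to_training_pair`, the field labels, and the sample values are all hypothetical, and the real template layout and the ChangeScribe output format are not given in this listing.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CondensedChange:
    """Hypothetical container for the three template parts named in the abstract."""

    summary: str                                          # (1) summarized code changes
    comments: list[str] = field(default_factory=list)     # (2) elicited comments
    identifiers: list[str] = field(default_factory=list)  # (3) emphasized code identifiers


def render_template(change: CondensedChange) -> str:
    """Join the three parts into one short text input (the layout is an assumption)."""
    lines = [f"Changes: {change.summary}"]
    if change.comments:
        lines.append("Comments: " + " | ".join(change.comments))
    if change.identifiers:
        lines.append("Identifiers: " + ", ".join(change.identifiers))
    return "\n".join(lines)


def to_training_pair(change: CondensedChange, commit_message: str) -> dict[str, str]:
    """Build one (template, commit message) pair of the kind used for fine-tuning."""
    return {"input": render_template(change), "target": commit_message.strip()}


# Illustrative values only; they do not come from the paper or its dataset.
example = CondensedChange(
    summary="Modify method parse() in class ConfigReader",
    comments=["Handle empty input"],
    identifiers=["ConfigReader", "parse"],
)
pair = to_training_pair(example, " Fix crash on empty config \n")
print(pair["input"])
print(pair["target"])
```

In this sketch, empty parts are skipped so the rendered input stays short, which is the property the abstract emphasizes. Which model consumes the pairs is left out on purpose.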

Related papers

Contextual Code Retrieval for Commit Message Generation: A Preliminary Study [18.46986692375691]
A commit message describes the main code changes in a commit and plays a crucial role in software maintenance. Existing commit message generation approaches typically frame it as a direct mapping which inputs a code diff and produces a brief descriptive sentence as output. We argue that relying solely on the code diff is insufficient, as raw code diff fails to capture the full context needed for generating high-quality commit messages.
arXiv Detail & Related papers (2025-07-23T16:54:57Z)
Commit Messages in the Age of Large Language Models [0.9217021281095906]
We evaluate the performance of OpenAI's ChatGPT for generating commit messages based on code changes. We compare the results obtained with ChatGPT to previous automatic commit message generation methods that have been trained specifically on commit data.
arXiv Detail & Related papers (2024-01-31T06:47:12Z)
From Commit Message Generation to History-Aware Commit Message Completion [49.175498083165884]
We argue that if we could shift the focus from commit message generation to commit message completion, we could significantly improve the quality and the personal nature of the resulting commit messages. Since the existing datasets lack historical data, we collect and share a novel dataset called CommitChronicle, containing 10.7M commits across 20 programming languages. Our results show that in some contexts, commit message completion shows better results than generation, and that while in general GPT-3.5-turbo performs worse, it shows potential for long and detailed messages.
arXiv Detail & Related papers (2023-08-15T09:10:49Z)
Context-Encoded Code Change Representation for Automated Commit Message Generation [0.0]
This paper proposes a method to represent code changes by combining the changed code and the unchanged code. It overcomes the limitations of current representations while improving the performance of 5/6 of state-of-the-art commit message generation methods.
arXiv Detail & Related papers (2023-06-26T04:48:14Z)
Coeditor: Leveraging Contextual Changes for Multi-round Code Auto-editing [57.776971051512234]
In this work, we explore a multi-round code auto-editing setting, aiming to predict edits to a code region based on recent changes within the same codebase. Our model, Coeditor, is a fine-tuned language model specifically designed for code editing tasks. In a simplified single-round, single-edit task, Coeditor significantly outperforms GPT-3.5 and SOTA open-source code completion models.
arXiv Detail & Related papers (2023-05-29T19:57:36Z)
Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers. We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z)
Jointly Learning to Repair Code and Generate Commit Message [78.4177637346384]
We construct a multilingual triple dataset including buggy code, fixed code, and commit messages for this novel task. To deal with the error propagation problem of the cascaded method, the joint model is proposed that can both repair the code and generate the commit message. Experimental results show that the enhanced cascaded model with teacher-student method and multitask-learning method achieves the best score on different metrics of automated code repair.
arXiv Detail & Related papers (2021-09-25T07:08:28Z)
CommitBERT: Commit Message Generation Using Pre-Trained Programming Language Model [0.38073142980733]
Commit message is a document that summarizes source code changes in natural language. We develop a model that automatically writes the commit message. We release 345K datasets consisting of code modification and commit messages in six programming languages.
arXiv Detail & Related papers (2021-05-29T07:48:28Z)
GraphCodeBERT: Pre-training Code Representations with Data Flow [97.00641522327699]
We present GraphCodeBERT, a pre-trained model for programming language that considers the inherent structure of code. We use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables. We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement.
arXiv Detail & Related papers (2020-09-17T15:25:56Z)
CoreGen: Contextualized Code Representation Learning for Commit Message Generation [39.383390029545865]
We propose a novel Contextualized code representation learning strategy for commit message Generation (CoreGen). Experiments on the benchmark dataset demonstrate the superior effectiveness of our model over the baseline models with at least 28.18% improvement in terms of BLEU-4 score.
arXiv Detail & Related papers (2020-07-14T09:43:26Z)
Contrastive Code Representation Learning [95.86686147053958]
We show that the popular reconstruction-based BERT model is sensitive to source code edits, even when the edits preserve semantics. We propose ContraCode: a contrastive pre-training task that learns code functionality, not form.
arXiv Detail & Related papers (2020-07-09T17:59:06Z)