From Commit Message Generation to History-Aware Commit Message
Completion
- URL: http://arxiv.org/abs/2308.07655v1
- Date: Tue, 15 Aug 2023 09:10:49 GMT
- Title: From Commit Message Generation to History-Aware Commit Message
Completion
- Authors: Aleksandra Eliseeva, Yaroslav Sokolov, Egor Bogomolov, Yaroslav
Golubev, Danny Dig, Timofey Bryksin
- Abstract summary: We argue that if we could shift the focus from commit message generation to commit message completion, we could significantly improve the quality and the personal nature of the resulting commit messages.
Since the existing datasets lack historical data, we collect and share a novel dataset called CommitChronicle, containing 10.7M commits across 20 programming languages.
Our results show that in some contexts, commit message completion shows better results than generation, and that while in general GPT-3.5-turbo performs worse, it shows potential for long and detailed messages.
- Score: 49.175498083165884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Commit messages are crucial to software development, allowing developers to
track changes and collaborate effectively. Despite their utility, most commit
messages lack important information since writing high-quality commit messages
is tedious and time-consuming. The active research on commit message generation
(CMG) has not yet led to wide adoption in practice. We argue that if we could
shift the focus from commit message generation to commit message completion and
use previous commit history as additional context, we could significantly
improve the quality and the personal nature of the resulting commit messages.
In this paper, we propose and evaluate both of these novel ideas. Since the
existing datasets lack historical data, we collect and share a novel dataset
called CommitChronicle, containing 10.7M commits across 20 programming
languages. We use this dataset to evaluate the completion setting and the
usefulness of the historical context for state-of-the-art CMG models and
GPT-3.5-turbo. Our results show that in some contexts, commit message
completion shows better results than generation, and that while in general
GPT-3.5-turbo performs worse, it shows potential for long and detailed
messages. As for the history, the results show that historical information
improves the performance of CMG models in the generation task, and the
performance of GPT-3.5-turbo in both generation and completion.
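The difference between the generation and completion settings, and the role of history as extra context, can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's implementation: the `CommitExample` type, the `split_for_completion` and `build_input` helpers, and the `[HISTORY]`, `[DIFF]`, and `[MESSAGE]` tags are all hypothetical names chosen for this example. It assumes that completion means the model sees a prefix of the message and must predict the rest, and that history means earlier messages by the same author.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CommitExample:
    """Hypothetical container for one commit (not the dataset's real schema)."""
    diff: str                                          # code change to describe
    message: str                                       # reference commit message
    history: List[str] = field(default_factory=list)   # earlier messages, oldest first


def split_for_completion(message: str, n_prefix_words: int) -> Tuple[str, str]:
    """Split a reference message into (already-typed prefix, remaining target)."""
    words = message.split()
    prefix = " ".join(words[:n_prefix_words])
    target = " ".join(words[n_prefix_words:])
    return prefix, target


def build_input(example: CommitExample,
                n_prefix_words: int = 0,
                max_history: int = 0) -> Tuple[str, str]:
    """Assemble a model context and the text the model should produce.

    n_prefix_words == 0 corresponds to generation (nothing typed yet);
    n_prefix_words > 0 corresponds to completion.
    max_history == 0 means no historical context is used.
    """
    prefix, target = split_for_completion(example.message, n_prefix_words)
    # Guard against history[-0:], which would return the whole list.
    recent = example.history[-max_history:] if max_history > 0 else []
    parts = ["[HISTORY] " + m for m in recent]   # assumed tag format
    parts.append("[DIFF] " + example.diff)
    parts.append("[MESSAGE] " + prefix)
    return "\n".join(parts), target


example = CommitExample(
    diff="- a\n+ b",
    message="Fix typo in README",
    history=["Add README", "Update docs", "Fix lint"],
)

# Generation without history: the model must write the whole message.
gen_context, gen_target = build_input(example)

# Completion with the two most recent history messages as context.
cmp_context, cmp_target = build_input(example, n_prefix_words=2, max_history=2)
```

In this sketch the two settings differ only in how much of the reference message is moved from the target into the context, which makes it easy to evaluate the same model under both.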
Related papers
- Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings [77.20838441870151]
Commit message generation is a crucial task in software engineering that is challenging to evaluate correctly.
We use an online metric - the number of edits users introduce before committing the generated messages to the VCS - to select metrics for offline experiments.
Our results indicate that edit distance exhibits the highest correlation, whereas commonly used similarity metrics such as BLEU and METEOR demonstrate low correlation.
arXiv Detail & Related papers (2024-10-15T20:32:07Z)
- RAG-Enhanced Commit Message Generation [8.858678357308726]
Commit Message Generation has become a research hotspot.
It is time-consuming to write commit messages manually.
This paper proposes REACT, a REtrieval-Augmented framework for CommiT message generation.
arXiv Detail & Related papers (2024-06-08T16:24:24Z)
- COMET: Generating Commit Messages using Delta Graph Context Representation [2.5899040911480182]
Commit messages explain code changes in a commit and facilitate collaboration among developers.
We propose Comet, a novel approach that captures context of code changes using a graph-based representation.
Tests show Comet outperforms state-of-the-art techniques in terms of bleu-norm and meteor metrics.
arXiv Detail & Related papers (2024-02-02T19:01:52Z)
- Commit Messages in the Age of Large Language Models [0.9217021281095906]
We evaluate the performance of OpenAI's ChatGPT for generating commit messages based on code changes.
We compare the results obtained with ChatGPT to previous automatic commit message generation methods that have been trained specifically on commit data.
arXiv Detail & Related papers (2024-01-31T06:47:12Z)
- Using Large Language Models for Commit Message Generation: A Preliminary Study [5.5784148764236114]
Large language models (LLMs) can be used to generate commit messages automatically and effectively.
In 78% of the 366 samples, the commit messages generated by LLMs were evaluated by humans as the best.
arXiv Detail & Related papers (2024-01-11T14:06:39Z)
- Delving into Commit-Issue Correlation to Enhance Commit Message Generation Models [13.605167159285374]
Commit message generation is a challenging task in automated software engineering.
tool is a novel paradigm that can introduce the correlation between commits and issues into the training phase of models.
The results show that compared with the original models, the performance of tool-enhanced models is significantly improved.
arXiv Detail & Related papers (2023-07-31T20:35:00Z)
- Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation [92.1582872870226]
We propose a new grounded keys-to-text generation task.
The task is to generate a factual description about an entity given a set of guiding keys, and grounding passages.
Inspired by recent QA-based evaluation measures, we propose an automatic metric, MAFE, for factual correctness of generated descriptions.
arXiv Detail & Related papers (2022-12-04T23:59:41Z)
- Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers.
We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z)
- Learning to Transfer Prompts for Text Generation [97.64625999380425]
We propose a novel prompt-based method (PTG) for text generation in a transferable setting.
First, PTG learns a set of source prompts for various source generation tasks and then transfers these prompts as target prompts to perform target generation tasks.
In extensive experiments, PTG yields competitive or better results than fine-tuning methods.
arXiv Detail & Related papers (2022-05-03T14:53:48Z)
- CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities [92.79451009324268]
We present CoAuthor, a dataset designed for revealing GPT-3's capabilities in assisting creative and argumentative writing.
We demonstrate that CoAuthor can address questions about GPT-3's language, ideation, and collaboration capabilities.
We discuss how this work may facilitate a more principled discussion around LMs' promises and pitfalls in relation to interaction design.
arXiv Detail & Related papers (2022-01-18T07:51:57Z)