Related papers: From Commit Message Generation to History-Aware Commit Message Completion

From Commit Message Generation to History-Aware Commit Message Completion

URL: http://arxiv.org/abs/2308.07655v1
Date: Tue, 15 Aug 2023 09:10:49 GMT
Title: From Commit Message Generation to History-Aware Commit Message Completion
Authors: Aleksandra Eliseeva, Yaroslav Sokolov, Egor Bogomolov, Yaroslav Golubev, Danny Dig, Timofey Bryksin
Abstract summary: We argue that if we could shift the focus from commit message generation to commit message completion, we could significantly improve the quality and the personal nature of the resulting commit messages. Since the existing datasets lack historical data, we collect and share a novel dataset called CommitChronicle, containing 10.7M commits across 20 programming languages. Our results show that in some contexts, commit message completion shows better results than generation, and that while in general GPT-3.5-turbo performs worse, it shows potential for long and detailed messages.
Score: 49.175498083165884
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Commit messages are crucial to software development, allowing developers to track changes and collaborate effectively. Despite their utility, most commit messages lack important information since writing high-quality commit messages is tedious and time-consuming. The active research on commit message generation (CMG) has not yet led to wide adoption in practice. We argue that if we could shift the focus from commit message generation to commit message completion and use previous commit history as additional context, we could significantly improve the quality and the personal nature of the resulting commit messages. In this paper, we propose and evaluate both of these novel ideas. Since the existing datasets lack historical data, we collect and share a novel dataset called CommitChronicle, containing 10.7M commits across 20 programming languages. We use this dataset to evaluate the completion setting and the usefulness of the historical context for state-of-the-art CMG models and GPT-3.5-turbo. Our results show that in some contexts, commit message completion shows better results than generation, and that while in general GPT-3.5-turbo performs worse, it shows potential for long and detailed messages. As for the history, the results show that historical information improves the performance of CMG models in the generation task, and the performance of GPT-3.5-turbo in both generation and completion.

Related papers

Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation. We introduce novel methodologies and datasets to overcome these challenges. We propose MhBART, an encoder-decoder model designed to emulate human writing style. We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z)
Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings [77.20838441870151]
Commit message generation is a crucial task in software engineering that is challenging to evaluate correctly. We use an online metric - the number of edits users introduce before committing the generated messages to the VCS - to select metrics for offline experiments. Our results indicate that edit distance exhibits the highest correlation, whereas commonly used similarity metrics such as BLEU and METEOR demonstrate low correlation.
arXiv Detail & Related papers (2024-10-15T20:32:07Z)
RAG-Enhanced Commit Message Generation [8.858678357308726]
Commit Message Generation has become a research hotspot. It is time-consuming to write commit messages manually. This paper proposes REACT, a REtrieval-Augmented framework for CommiT message generation.
arXiv Detail & Related papers (2024-06-08T16:24:24Z)
COMET: Generating Commit Messages using Delta Graph Context Representation [2.5899040911480182]
Commit messages explain code changes in a commit and facilitate collaboration among developers. We propose Comet, a novel approach that captures context of code changes using a graph-based representation. Tests show Comet outperforms state-of-the-art techniques in terms of bleu-norm and meteor metrics.
arXiv Detail & Related papers (2024-02-02T19:01:52Z)
Commit Messages in the Age of Large Language Models [0.9217021281095906]
We evaluate the performance of OpenAI's ChatGPT for generating commit messages based on code changes. We compare the results obtained with ChatGPT to previous automatic commit message generation methods that have been trained specifically on commit data.
arXiv Detail & Related papers (2024-01-31T06:47:12Z)
Using Large Language Models for Commit Message Generation: A Preliminary Study [5.5784148764236114]
Large language models (LLMs) can be used to generate commit messages automatically and effectively. In 78% of the 366 samples, the commit messages generated by LLMs were evaluated by humans as the best.
arXiv Detail & Related papers (2024-01-11T14:06:39Z)
Delving into Commit-Issue Correlation to Enhance Commit Message Generation Models [13.605167159285374]
Commit message generation is a challenging task in automated software engineering. tool is a novel paradigm that can introduce the correlation between commits and issues into the training phase of models. The results show that compared with the original models, the performance of tool-enhanced models is significantly improved.
arXiv Detail & Related papers (2023-07-31T20:35:00Z)
Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation [92.1582872870226]
We propose a new grounded keys-to-text generation task. The task is to generate a factual description about an entity given a set of guiding keys, and grounding passages. Inspired by recent QA-based evaluation measures, we propose an automatic metric, MAFE, for factual correctness of generated descriptions.
arXiv Detail & Related papers (2022-12-04T23:59:41Z)
Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers. We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z)
Learning to Transfer Prompts for Text Generation [97.64625999380425]
We propose a novel prompt-based method (PTG) for text generation in a transferable setting. First, PTG learns a set of source prompts for various source generation tasks and then transfers these prompts as target prompts to perform target generation tasks. In extensive experiments, PTG yields competitive or better results than fine-tuning methods.
arXiv Detail & Related papers (2022-05-03T14:53:48Z)
CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities [92.79451009324268]
We present CoAuthor, a dataset designed for revealing GPT-3's capabilities in assisting creative and argumentative writing. We demonstrate that CoAuthor can address questions about GPT-3's language, ideation, and collaboration capabilities. We discuss how this work may facilitate a more principled discussion around LMs' promises and pitfalls in relation to interaction design.
arXiv Detail & Related papers (2022-01-18T07:51:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.