Related papers: Commit Messages in the Age of Large Language Models

URL: http://arxiv.org/abs/2401.17622v2
Date: Fri, 2 Feb 2024 00:44:32 GMT
Title: Commit Messages in the Age of Large Language Models
Authors: Cristina V. Lopes, Vanessa I. Klotzman, Iris Ma, Iftekar Ahmed
Abstract summary: We evaluate the performance of OpenAI's ChatGPT for generating commit messages based on code changes. We compare the results obtained with ChatGPT to previous automatic commit message generation methods that have been trained specifically on commit data.
Score: 0.9217021281095906
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Commit messages are explanations of changes made to a codebase that are stored in version control systems. They help developers understand the codebase as it evolves. However, writing commit messages can be tedious and inconsistent among developers. To address this issue, researchers have tried using different methods to automatically generate commit messages, including rule-based, retrieval-based, and learning-based approaches. Advances in large language models offer new possibilities for generating commit messages. In this study, we evaluate the performance of OpenAI's ChatGPT for generating commit messages based on code changes. We compare the results obtained with ChatGPT to previous automatic commit message generation methods that have been trained specifically on commit data. Our goal is to assess the extent to which large pre-trained language models can generate commit messages that are both quantitatively and qualitatively acceptable. We found that ChatGPT was able to outperform previous Automatic Commit Message Generation (ACMG) methods by orders of magnitude, and that, generally, the messages it generates are both accurate and of high quality. We also provide insights into, and a categorization of, the cases where it fails.

Related papers

Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation. We introduce novel methodologies and datasets to overcome these challenges. We propose MhBART, an encoder-decoder model designed to emulate human writing style. We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z)
Remote Timing Attacks on Efficient Language Model Inference [63.79839291641793]
We show it is possible to exploit timing differences to mount a timing attack. We show how it is possible to learn the topic of a user's conversation with 90%+ precision. An adversary can leverage a boosting attack to recover PII placed in messages for open source systems.
arXiv Detail & Related papers (2024-10-22T16:51:36Z)
Using Large Language Models for Commit Message Generation: A Preliminary Study [5.5784148764236114]
Large language models (LLMs) can be used to generate commit messages automatically and effectively. In 78% of the 366 samples, the commit messages generated by LLMs were evaluated by humans as the best.
arXiv Detail & Related papers (2024-01-11T14:06:39Z)
From Commit Message Generation to History-Aware Commit Message Completion [49.175498083165884]
We argue that if we could shift the focus from commit message generation to commit message completion, we could significantly improve the quality and the personal nature of the resulting commit messages. Since the existing datasets lack historical data, we collect and share a novel dataset called CommitChronicle, containing 10.7M commits across 20 programming languages. Our results show that in some contexts, commit message completion shows better results than generation, and that while in general GPT-3.5-turbo performs worse, it shows potential for long and detailed messages.
arXiv Detail & Related papers (2023-08-15T09:10:49Z)
Delving into Commit-Issue Correlation to Enhance Commit Message Generation Models [13.605167159285374]
Commit message generation is a challenging task in automated software engineering. The proposed tool is a novel paradigm that introduces the correlation between commits and issues into the training phase of models. The results show that, compared with the original models, the performance of the tool-enhanced models is significantly improved.
arXiv Detail & Related papers (2023-07-31T20:35:00Z)
Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers. We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z)
Learning to Transfer Prompts for Text Generation [97.64625999380425]
We propose a novel prompt-based method (PTG) for text generation in a transferable setting. First, PTG learns a set of source prompts for various source generation tasks and then transfers these prompts as target prompts to perform target generation tasks. In extensive experiments, PTG yields competitive or better results than fine-tuning methods.
arXiv Detail & Related papers (2022-05-03T14:53:48Z)
ECMG: Exemplar-based Commit Message Generation [45.54414179533286]
Commit messages concisely describe the content of code diffs (i.e., code changes) and the intent behind them. The information retrieval-based methods reuse the commit messages of similar code diffs, while the neural-based methods learn the semantic connection between code diffs and commit messages. We propose a novel exemplar-based neural commit message generation model, which treats the similar commit message as an exemplar and leverages it to guide the neural network model to generate an accurate commit message.
arXiv Detail & Related papers (2022-03-05T10:55:15Z)
Jointly Learning to Repair Code and Generate Commit Message [78.4177637346384]
We construct a multilingual triple dataset including buggy code, fixed code, and commit messages for this novel task. To deal with the error propagation problem of the cascaded method, a joint model is proposed that both repairs the code and generates the commit message. Experimental results show that the enhanced cascaded model with teacher-student method and multitask-learning method achieves the best score on different metrics of automated code repair.
arXiv Detail & Related papers (2021-09-25T07:08:28Z)
CommitBERT: Commit Message Generation Using Pre-Trained Programming Language Model [0.38073142980733]
A commit message is a document that summarizes source code changes in natural language. We develop a model that automatically writes the commit message. We release 345K datasets consisting of code modifications and commit messages in six programming languages.
arXiv Detail & Related papers (2021-05-29T07:48:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.