Using Large Language Models for Commit Message Generation: A Preliminary Study
- URL: http://arxiv.org/abs/2401.05926v2
- Date: Sat, 13 Jan 2024 15:14:13 GMT
- Title: Using Large Language Models for Commit Message Generation: A Preliminary Study
- Authors: Linghao Zhang, Jingshu Zhao, Chong Wang, Peng Liang
- Abstract summary: Large language models (LLMs) can be used to generate commit messages automatically and effectively.
In 78% of the 366 samples, the commit messages generated by LLMs were evaluated by humans as the best.
- Score: 5.5784148764236114
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A commit message is a textual description of the code changes in a commit,
which is a key part of the Git version control system (VCS). It captures the
essence of software updating. Therefore, it can help developers understand code
evolution and facilitate efficient collaboration between developers. However,
it is time-consuming and labor-intensive to write good and valuable commit
messages. Some researchers have conducted extensive studies on the automatic
generation of commit messages and proposed several methods for this purpose,
such as generation-based and retrieval-based models. However, few studies have
explored whether large language models (LLMs) can be used to generate commit
messages automatically and effectively. To this end, this paper designed and
conducted a series of experiments to comprehensively evaluate the performance
of popular open-source and closed-source LLMs, i.e., Llama 2 and ChatGPT, on
commit message generation. The results indicate that, in terms of the BLEU and
Rouge-L metrics, LLMs surpass the existing methods on certain indicators but
lag behind on others. In human evaluations, however, LLMs show a distinct
advantage over all these existing methods. Notably, in 78% of the 366 samples,
the commit messages generated by LLMs were rated by human evaluators as the
best. This work not only reveals the promising potential of using LLMs to
generate commit messages, but also exposes the limitations of commonly used
metrics in evaluating the quality of auto-generated commit messages.
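To illustrate the kind of metric the abstract questions, the following is a minimal, self-contained Rouge-L sketch (the LCS-based F-score commonly used in this literature). The two sample commit messages are hypothetical; the example shows how a semantically reasonable message can still score zero when it shares no tokens with the reference, which is one reason such metrics can understate LLM output quality.

```python
def lcs_len(a, b):
    # Classic dynamic-programming longest-common-subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(candidate, reference, beta=1.2):
    # Token-level Rouge-L F-score over whitespace tokens.
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return (1 + beta ** 2) * prec * rec / (rec + beta ** 2 * prec)

# Two hypothetical commit messages describing the same fix with disjoint
# vocabulary: the score is 0.0 despite the messages being equivalent.
ref = "fix null pointer dereference in config parser"
gen = "guard against missing configuration before parsing"
print(rouge_l(gen, ref))  # → 0.0
```

This is a sketch for intuition only; published results typically use a standard Rouge implementation with its own tokenization.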
Related papers
- Bug In the Code Stack: Can LLMs Find Bugs in Large Python Code Stacks [1.3586572110652484]
This study explores the capabilities of Large Language Models (LLMs) in retrieving contextual information from large text documents.
Our benchmark, Bug In The Code Stack (BICS), is designed to assess the ability of LLMs to identify simple syntax bugs within large source code.
Our findings reveal three key insights: (1) code-based environments pose a significantly greater challenge than text-based environments for retrieval tasks, (2) there is a substantial performance disparity among different models, and (3) there is a notable correlation between longer context lengths and performance degradation.
arXiv Detail & Related papers (2024-06-21T17:37:10Z)
- RAG-Enhanced Commit Message Generation [8.858678357308726]
Commit Message Generation has become a research hotspot in automated software engineering.
We propose REACT, a novel REtrieval-Augmented framework for CommiT message generation.
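A retrieval-augmented setup of this kind can be sketched in a few lines: retrieve the most similar past diff from a corpus and include its commit message as an exemplar in the prompt. The corpus, similarity measure, and prompt below are hypothetical toy choices, not the REACT implementation.

```python
def jaccard(a: str, b: str) -> float:
    # Token-set Jaccard similarity; a stand-in for a real retriever.
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# Hypothetical corpus of (diff, commit message) pairs.
corpus = [
    ("+ if user is None : return", "Handle missing user in login flow"),
    ("+ cache.clear() on shutdown", "Clear cache on service shutdown"),
]

def build_prompt(new_diff: str) -> str:
    # Retrieve the nearest past diff and use its message as an exemplar.
    ex_diff, ex_msg = max(corpus, key=lambda p: jaccard(p[0], new_diff))
    return (
        "Example diff:\n" + ex_diff + "\n"
        "Example commit message: " + ex_msg + "\n\n"
        "New diff:\n" + new_diff + "\n"
        "Commit message:"
    )

prompt = build_prompt("+ if token is None : return")
```

The resulting prompt would then be sent to an LLM; in a real system the retriever would index thousands of historical commits rather than a two-item list.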
arXiv Detail & Related papers (2024-06-08T16:24:24Z)
- Commit Messages in the Age of Large Language Models [0.9217021281095906]
We evaluate the performance of OpenAI's ChatGPT for generating commit messages based on code changes.
We compare the results obtained with ChatGPT to previous automatic commit message generation methods that have been trained specifically on commit data.
arXiv Detail & Related papers (2024-01-31T06:47:12Z)
- Which Syntactic Capabilities Are Statistically Learned by Masked Language Models for Code? [51.29970742152668]
We highlight that relying on accuracy-based measurements may lead to an overestimation of models' capabilities.
To address these issues, we introduce SyntaxEval, a technique for evaluating the syntactic capabilities of such models.
arXiv Detail & Related papers (2024-01-03T02:44:02Z)
- ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
- LM-Polygraph: Uncertainty Estimation for Language Models [71.21409522341482]
Uncertainty estimation (UE) methods are one path to safer, more responsible, and more effective use of large language models (LLMs).
We introduce LM-Polygraph, a framework with implementations of a battery of state-of-the-art UE methods for LLMs in text generation tasks, with unified program interfaces in Python.
It introduces an extendable benchmark for consistent evaluation of UE techniques by researchers, and a demo web application that enriches the standard chat dialog with confidence scores.
arXiv Detail & Related papers (2023-11-13T15:08:59Z)
- Large Language Model-Aware In-Context Learning for Code Generation [75.68709482932903]
Large language models (LLMs) have shown impressive in-context learning (ICL) ability in code generation.
We propose a novel learning-based selection approach named LAIL (LLM-Aware In-context Learning) for code generation.
arXiv Detail & Related papers (2023-10-15T06:12:58Z)
- From Commit Message Generation to History-Aware Commit Message Completion [49.175498083165884]
We argue that if we could shift the focus from commit message generation to commit message completion, we could significantly improve the quality and the personal nature of the resulting commit messages.
Since the existing datasets lack historical data, we collect and share a novel dataset called CommitChronicle, containing 10.7M commits across 20 programming languages.
Our results show that in some contexts, commit message completion shows better results than generation, and that while in general GPT-3.5-turbo performs worse, it shows potential for long and detailed messages.
arXiv Detail & Related papers (2023-08-15T09:10:49Z)
- Delving into Commit-Issue Correlation to Enhance Commit Message Generation Models [13.605167159285374]
Commit message generation is a challenging task in automated software engineering.
The proposed tool is a novel paradigm that introduces the correlation between commits and issues into the training phase of models.
The results show that, compared with the original models, the performance of the tool-enhanced models is significantly improved.
arXiv Detail & Related papers (2023-07-31T20:35:00Z)
- LLMDet: A Third Party Large Language Models Generated Text Detection Tool [119.0952092533317]
Text generated by large language models (LLMs) is remarkably close to high-quality human-authored text.
Existing detection tools can only differentiate between machine-generated and human-authored text.
We propose LLMDet, a model-specific, secure, efficient, and extendable detection tool.
arXiv Detail & Related papers (2023-05-24T10:45:16Z)
- On the Evaluation of Commit Message Generation Models: An Experimental Study [33.19314967188712]
Commit messages are natural language descriptions of code changes, which are important for program understanding and maintenance.
Various approaches utilizing generation or retrieval techniques have been proposed to automatically generate commit messages.
This paper conducts a systematic and in-depth analysis of the state-of-the-art models and datasets.
arXiv Detail & Related papers (2021-07-12T12:38:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.