Using Large Language Models for Commit Message Generation: A Preliminary Study
- URL: http://arxiv.org/abs/2401.05926v2
- Date: Sat, 13 Jan 2024 15:14:13 GMT
- Title: Using Large Language Models for Commit Message Generation: A Preliminary Study
- Authors: Linghao Zhang, Jingshu Zhao, Chong Wang, Peng Liang
- Abstract summary: Large language models (LLMs) can be used to generate commit messages automatically and effectively.
In 78% of the 366 samples, the commit messages generated by LLMs were evaluated by humans as the best.
- Score: 5.5784148764236114
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A commit message is a textual description of the code changes in a commit,
which is a key part of the Git version control system (VCS). It captures the
essence of a software update. Therefore, it can help developers understand code
evolution and facilitate efficient collaboration between developers. However,
it is time-consuming and labor-intensive to write good and valuable commit
messages. Some researchers have conducted extensive studies on the automatic
generation of commit messages and proposed several methods for this purpose,
such as generation-based and retrieval-based models. However, few studies have
explored whether large language models (LLMs) can be used to generate commit
messages automatically and effectively. To this end, we designed and
conducted a series of experiments to comprehensively evaluate the performance
of popular open-source and closed-source LLMs, i.e., Llama 2 and ChatGPT, in
commit message generation. The results indicate that, on the BLEU and
Rouge-L metrics, LLMs surpass the existing methods on certain indicators but
lag behind on others. In human evaluations, however, LLMs show a distinct
advantage over all these existing methods. Notably, in 78% of the 366
samples, the commit messages generated by LLMs were evaluated by humans as the
best. This work not only reveals the promising potential of using LLMs to
generate commit messages, but also explores the limitations of commonly used
metrics in evaluating the quality of auto-generated commit messages.
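To make the evaluation setup concrete, here is a minimal sketch of the pipeline the abstract describes: prompt an LLM with a diff, then score the generated message against the developer-written reference with BLEU and Rouge-L. The prompt wording, model choice, and example diff are illustrative assumptions, not the paper's exact setup; the sketch assumes the openai, nltk, and rouge-score Python packages.

```python
# Illustrative sketch, not the paper's exact setup: generate a commit
# message for a diff with an LLM, then score it against the reference
# message using BLEU and Rouge-L.
from openai import OpenAI
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

# Hypothetical example diff and developer-written reference message.
diff = """\
--- a/parser.py
+++ b/parser.py
@@ -10,7 +10,7 @@
-    config = load(path)
+    config = load(path) if path.exists() else {}
"""
reference = "handle missing config file in parser"

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumed model; the paper evaluates ChatGPT and Llama 2
    messages=[{
        "role": "user",
        "content": f"Write a one-line commit message for this diff:\n{diff}",
    }],
)
generated = resp.choices[0].message.content.strip()

# BLEU compares n-gram overlap; smoothing avoids zero scores on short messages.
bleu = sentence_bleu([reference.split()], generated.split(),
                     smoothing_function=SmoothingFunction().method1)
# Rouge-L rewards longest-common-subsequence overlap with the reference.
rouge = rouge_scorer.RougeScorer(["rougeL"]).score(reference, generated)
print(f"BLEU: {bleu:.3f}  Rouge-L F1: {rouge['rougeL'].fmeasure:.3f}")
```

As the abstract notes, such reference-based scores can diverge from human judgments, which is why the paper complements them with a human evaluation.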
Related papers
- Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings [77.20838441870151]
Commit message generation is a crucial task in software engineering that is challenging to evaluate correctly.
We use an online metric - the number of edits users introduce before committing the generated messages to the VCS - to select metrics for offline experiments.
Our results indicate that edit distance exhibits the highest correlation, whereas commonly used similarity metrics such as BLEU and METEOR demonstrate low correlation (a minimal edit-distance sketch appears after this list).
arXiv Detail & Related papers (2024-10-15T20:32:07Z)
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models [89.28591263741973]
We introduce the Hierarchical Long Text Generation Benchmark (HelloBench) to evaluate Large Language Models' performance in generating long text.
Based on Bloom's taxonomy, HelloBench categorizes long text generation tasks into five subtasks: open-ended QA, summarization, chat, text completion, and text generation.
In addition, we propose Hierarchical Long Text Evaluation (HelloEval), a human evaluation method that significantly reduces the time and effort required for human evaluation.
arXiv Detail & Related papers (2024-09-24T15:38:11Z)
- Context Conquers Parameters: Outperforming Proprietary LLM in Commit Message Generation [4.400274233826898]
Open-source Large Language Models can generate commit messages comparable to those produced by OMG, a commit message generation (CMG) approach built on a proprietary LLM.
We propose lOcal MessagE GenerAtor (OMEGA), a CMG approach that uses a 4-bit quantized 8B open-source LLM.
arXiv Detail & Related papers (2024-08-05T14:26:41Z)
- Bug In the Code Stack: Can LLMs Find Bugs in Large Python Code Stacks [1.3586572110652484]
This study explores the capabilities of Large Language Models (LLMs) in retrieving contextual information from large text documents.
Our benchmark, Bug In The Code Stack (BICS), is designed to assess the ability of LLMs to identify simple syntax bugs within large source code.
Our findings reveal three key insights: (1) code-based environments pose a significantly greater challenge than text-based environments for retrieval tasks, (2) there is a substantial performance disparity among different models, and (3) there is a notable correlation between longer context lengths and performance degradation.
arXiv Detail & Related papers (2024-06-21T17:37:10Z)
- RAG-Enhanced Commit Message Generation [8.858678357308726]
Writing commit messages manually is time-consuming, so Commit Message Generation has become a research hotspot.
This paper proposes REACT, a REtrieval-Augmented framework for CommiT message generation.
arXiv Detail & Related papers (2024-06-08T16:24:24Z)
- Commit Messages in the Age of Large Language Models [0.9217021281095906]
We evaluate the performance of OpenAI's ChatGPT for generating commit messages based on code changes.
We compare the results obtained with ChatGPT to previous automatic commit message generation methods that have been trained specifically on commit data.
arXiv Detail & Related papers (2024-01-31T06:47:12Z)
- LM-Polygraph: Uncertainty Estimation for Language Models [71.21409522341482]
Uncertainty estimation (UE) methods are one path to safer, more responsible, and more effective use of large language models (LLMs).
We introduce LM-Polygraph, a framework with implementations of a battery of state-of-the-art UE methods for LLMs in text generation tasks, with unified program interfaces in Python.
It introduces an extendable benchmark for consistent evaluation of UE techniques by researchers, and a demo web application that enriches the standard chat dialog with confidence scores.
arXiv Detail & Related papers (2023-11-13T15:08:59Z)
- Large Language Model-Aware In-Context Learning for Code Generation [75.68709482932903]
Large language models (LLMs) have shown impressive in-context learning (ICL) ability in code generation.
We propose a novel learning-based selection approach named LAIL (LLM-Aware In-context Learning) for code generation.
arXiv Detail & Related papers (2023-10-15T06:12:58Z)
- From Commit Message Generation to History-Aware Commit Message Completion [49.175498083165884]
We argue that if we could shift the focus from commit message generation to commit message completion, we could significantly improve the quality and the personal nature of the resulting commit messages.
Since the existing datasets lack historical data, we collect and share a novel dataset called CommitChronicle, containing 10.7M commits across 20 programming languages.
Our results show that in some contexts, commit message completion shows better results than generation, and that while in general GPT-3.5-turbo performs worse, it shows potential for long and detailed messages.
arXiv Detail & Related papers (2023-08-15T09:10:49Z)
- LLMDet: A Third Party Large Language Models Generated Text Detection Tool [119.0952092533317]
Text generated by large language models (LLMs) is remarkably close to high-quality human-authored text.
Existing detection tools can only differentiate between machine-generated and human-authored text.
We propose LLMDet, a model-specific, secure, efficient, and extendable detection tool.
arXiv Detail & Related papers (2023-05-24T10:45:16Z)
- On the Evaluation of Commit Message Generation Models: An Experimental Study [33.19314967188712]
Commit messages are natural language descriptions of code changes, which are important for program understanding and maintenance.
Various approaches utilizing generation or retrieval techniques have been proposed to automatically generate commit messages.
This paper conducts a systematic and in-depth analysis of the state-of-the-art models and datasets.
arXiv Detail & Related papers (2021-07-12T12:38:02Z)
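As referenced in the first related paper above, the online metric is the number of edits a user makes to a generated message before committing it. A minimal sketch of that edit-distance measurement follows; word-level tokenization is an assumption here, since the summary does not specify the paper's exact granularity.

```python
# Minimal sketch of an edit-distance "online" metric: how much did a user
# change the generated message before committing it? Word-level granularity
# is an assumption; the paper's exact protocol is not given in the summary.
def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]

generated = "fix crash when config file is missing".split()
committed = "fix crash when the config file is absent".split()
print(edit_distance(generated, committed))  # 2: one insertion, one substitution
```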