CommitBERT: Commit Message Generation Using Pre-Trained Programming Language Model
- URL: http://arxiv.org/abs/2105.14242v1
- Date: Sat, 29 May 2021 07:48:28 GMT
- Title: CommitBERT: Commit Message Generation Using Pre-Trained Programming Language Model
- Authors: Tae-Hwan Jung
- Abstract summary: A commit message is a document that summarizes source code changes in natural language.
We develop a model that automatically writes the commit message.
We release a dataset of 345K code modification and commit message pairs in six programming languages.
- Score: 0.38073142980733
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A commit message is a document that summarizes source code changes in natural
language. A good commit message clearly describes the source code changes and thus
enhances collaboration between developers. Our work is therefore to develop a
model that automatically writes the commit message.
To this end, we release a dataset of 345K pairs of code modifications and
commit messages in six programming languages (Python, PHP, Go, Java,
JavaScript, and Ruby). Following the neural machine translation (NMT) paradigm,
we use our dataset to feed the code modification to the encoder input and the
commit message to the decoder input, and we evaluate the generated
commit messages with BLEU-4.
We also propose two training methods to improve the generated commit
messages: (1) preprocessing the input so that the code modification fits
the encoder input, and (2) initializing with weights suited to the code domain
to reduce the gap in contextual representation between programming language (PL)
and natural language (NL).
Training code, dataset, and pre-trained weights are available at
https://github.com/graykode/commit-autosuggestions
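
As a rough illustration of the pipeline described above, the sketch below preprocesses a unified diff into added and deleted segments (method 1), initializes an encoder-decoder with code-domain CodeBERT weights (method 2), and scores a generated message with BLEU-4. The "microsoft/codebert-base" checkpoint name matches the public CodeBERT release, but the <ADD>/<DEL> markers and the preprocess_diff helper are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of the NMT-style setup from the abstract; hedged, not the
# paper's exact code. <ADD>/<DEL> markers and helper names are invented.
from transformers import EncoderDecoderModel, RobertaTokenizer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def preprocess_diff(diff_text: str) -> str:
    """Method (1): flatten a unified diff into added/deleted segments."""
    added, deleted = [], []
    for line in diff_text.splitlines():
        if line.startswith("+") and not line.startswith("+++"):
            added.append(line[1:].strip())
        elif line.startswith("-") and not line.startswith("---"):
            deleted.append(line[1:].strip())
    return "<ADD> " + " ".join(added) + " <DEL> " + " ".join(deleted)

# Method (2): initialize encoder and decoder from a code-domain checkpoint
# (CodeBERT) instead of random or NL-only weights.
tokenizer = RobertaTokenizer.from_pretrained("microsoft/codebert-base")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "microsoft/codebert-base", "microsoft/codebert-base"
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

diff = "--- a/app.py\n+++ b/app.py\n-print('hi')\n+print('hello')\n"
inputs = tokenizer(preprocess_diff(diff), return_tensors="pt", truncation=True)
generated = model.generate(inputs.input_ids, max_length=32)
hypothesis = tokenizer.decode(generated[0], skip_special_tokens=True)

# Evaluate with BLEU-4 (uniform 4-gram weights), as in the paper.
reference = "Change greeting message".split()
score = sentence_bleu(
    [reference], hypothesis.split(),
    weights=(0.25, 0.25, 0.25, 0.25),
    smoothing_function=SmoothingFunction().method1,
)
print(f"BLEU-4: {score:.4f}")
```

Initializing both encoder and decoder from the same code-domain checkpoint leaves only the cross-attention weights to be learned from scratch, which is one way to reduce the PL/NL representation gap the abstract mentions.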
Related papers
- Commit Messages in the Age of Large Language Models [0.9217021281095906]
We evaluate the performance of OpenAI's ChatGPT for generating commit messages based on code changes.
We compare the results obtained with ChatGPT to previous automatic commit message generation methods that have been trained specifically on commit data.
arXiv Detail & Related papers (2024-01-31T06:47:12Z)
- Delving into Commit-Issue Correlation to Enhance Commit Message Generation Models [13.605167159285374]
Commit message generation is a challenging task in automated software engineering.
The proposed tool is a novel paradigm that introduces the correlation between commits and issues into the training phase of models.
The results show that compared with the original models, the performance of tool-enhanced models is significantly improved.
arXiv Detail & Related papers (2023-07-31T20:35:00Z)
- Context-Encoded Code Change Representation for Automated Commit Message Generation [0.0]
This paper proposes a method to represent code changes by combining the changed code and the unchanged code.
It overcomes the limitations of current representations and improves the performance of five of six state-of-the-art commit message generation methods.
arXiv Detail & Related papers (2023-06-26T04:48:14Z)
- Coeditor: Leveraging Contextual Changes for Multi-round Code Auto-editing [57.776971051512234]
In this work, we explore a multi-round code auto-editing setting, aiming to predict edits to a code region based on recent changes within the same codebase.
Our model, Coeditor, is a fine-tuned language model specifically designed for code editing tasks.
In a simplified single-round, single-edit task, Coeditor significantly outperforms GPT-3.5 and SOTA open-source code completion models.
arXiv Detail & Related papers (2023-05-29T19:57:36Z)
- Interactive Code Generation via Test-Driven User-Intent Formalization [60.90035204567797]
Large language models (LLMs) produce code from informal natural language (NL) intent.
It is hard to define a notion of correctness since natural language can be ambiguous and lacks a formal semantics.
We describe a language-agnostic abstract algorithm and a concrete implementation, TiCoder.
arXiv Detail & Related papers (2022-08-11T17:41:08Z)
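
Since the TiCoder entry above centers on pruning candidate programs with user-approved tests, here is a minimal sketch of that loop under stated assumptions: the candidates, suggested tests, and approval callback are invented stand-ins for an actual LLM and an interactive user.

```python
# Sketch of test-driven user-intent formalization: candidate programs are
# pruned by tests that the user approves. All data below is hypothetical.
from typing import Callable, List, Tuple

def prune_by_approved_tests(
    candidates: List[Callable[[int], int]],
    suggested_tests: List[Tuple[int, int]],   # (input, expected output)
    user_approves: Callable[[Tuple[int, int]], bool],
) -> List[Callable[[int], int]]:
    survivors = candidates
    for test in suggested_tests:
        if not user_approves(test):  # user rejects tests that encode the wrong intent
            continue
        x, expected = test
        survivors = [f for f in survivors if f(x) == expected]
    return survivors

# Hypothetical candidates for the intent "double the input".
candidates = [lambda x: x * 2, lambda x: x + 2, lambda x: x ** 2]
tests = [(3, 6), (0, 0)]
kept = prune_by_approved_tests(candidates, tests, user_approves=lambda t: True)
print(len(kept))  # 1: only x * 2 is consistent with all approved tests
```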
- UniXcoder: Unified Cross-Modal Pre-training for Code Representation [65.6846553962117]
We present UniXcoder, a unified cross-modal pre-trained model for programming language.
We propose a one-to-one mapping method to transform an AST into a sequence structure that retains all structural information from the tree.
We evaluate UniXcoder on five code-related tasks over nine datasets.
arXiv Detail & Related papers (2022-03-08T04:48:07Z)
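
The one-to-one AST mapping in the UniXcoder entry above can be pictured as a bracketed depth-first serialization; the sketch below uses Python's ast module and invented <L>/<R> boundary tokens, which are illustrative assumptions rather than UniXcoder's actual special tokens.

```python
# Sketch: one-to-one AST-to-sequence mapping via bracketed DFS traversal.
# The "<L>"/"<R>" boundary tokens are illustrative, not UniXcoder's own.
import ast

def flatten(node: ast.AST) -> list:
    tokens = ["<L>", type(node).__name__]
    for child in ast.iter_child_nodes(node):
        tokens += flatten(child)
    tokens.append("<R>")  # closing token makes the mapping invertible
    return tokens

tree = ast.parse("def add(a, b):\n    return a + b")
print(" ".join(flatten(tree)))
# -> <L> Module <L> FunctionDef <L> arguments <L> arg <R> <L> arg <R> <R> ...
```

Because every subtree is wrapped in a matching <L>...<R> pair, the original tree can be reconstructed from the sequence, which is what makes such a mapping one-to-one.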
- ECMG: Exemplar-based Commit Message Generation [45.54414179533286]
Commit messages concisely describe the content of code diffs (i.e., code changes) and the intent behind them.
The information retrieval-based methods reuse the commit messages of similar code diffs, while the neural-based methods learn the semantic connection between code diffs and commit messages.
We propose a novel exemplar-based neural commit message generation model, which treats a similar commit message as an exemplar and leverages it to guide the neural network model to generate an accurate commit message.
arXiv Detail & Related papers (2022-03-05T10:55:15Z)
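
To make the exemplar idea in the ECMG entry above concrete, the sketch below retrieves the most similar historical diff and reuses its message as an exemplar; TF-IDF cosine similarity and the toy corpus are stand-ins for whatever retriever and data ECMG actually uses.

```python
# Sketch: retrieve the most similar past diff and reuse its commit message
# as an exemplar to condition generation. TF-IDF is a stand-in retriever.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

history = [  # (diff, commit message) pairs; toy data for illustration
    ("- return a\n+ return a + b", "Fix sum to include second operand"),
    ("+ import logging", "Add logging import"),
]
query_diff = "- return x\n+ return x + y"

vec = TfidfVectorizer()
matrix = vec.fit_transform([d for d, _ in history] + [query_diff])
sims = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
exemplar = history[sims.argmax()][1]

# The exemplar would then condition the decoder alongside the query diff,
# e.g. as "<exemplar> ... <diff> ..." in the encoder input (hypothetical format).
print(exemplar)  # -> "Fix sum to include second operand"
```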
- Jointly Learning to Repair Code and Generate Commit Message [78.4177637346384]
We construct a multilingual triple dataset including buggy code, fixed code, and commit messages for this novel task.
To deal with the error propagation problem of the cascaded method, a joint model is proposed that can both repair the code and generate the commit message.
Experimental results show that the enhanced cascaded model with teacher-student method and multitask-learning method achieves the best score on different metrics of automated code repair.
arXiv Detail & Related papers (2021-09-25T07:08:28Z)
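
A deliberately simplified sketch of the joint setup in the entry above: one shared encoder over the buggy code feeds two teacher-forced decoders, one for the fixed code and one for the commit message, trained with a summed loss. The toy GRU architecture and dimensions are assumptions, not the paper's model.

```python
# Sketch: joint code repair + commit message generation. A shared encoder
# avoids the error propagation of a repair-then-describe cascade. Toy model.
import torch
import torch.nn as nn

class JointRepairAndMessage(nn.Module):
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)      # shared encoder
        self.dec_repair = nn.GRU(dim, dim, batch_first=True)   # emits fixed code
        self.dec_message = nn.GRU(dim, dim, batch_first=True)  # emits commit message
        self.out_repair = nn.Linear(dim, vocab)
        self.out_message = nn.Linear(dim, vocab)

    def forward(self, buggy_ids, fixed_in, msg_in):
        _, h = self.encoder(self.embed(buggy_ids))  # shared context of buggy code
        rep, _ = self.dec_repair(self.embed(fixed_in), h)
        msg, _ = self.dec_message(self.embed(msg_in), h)
        return self.out_repair(rep), self.out_message(msg)

model = JointRepairAndMessage()
ce = nn.CrossEntropyLoss()
buggy = torch.randint(0, 1000, (2, 16))  # toy batch of buggy-code token ids
fixed_in, fixed_out = torch.randint(0, 1000, (2, 15)), torch.randint(0, 1000, (2, 15))
msg_in, msg_out = torch.randint(0, 1000, (2, 10)), torch.randint(0, 1000, (2, 10))
rep_logits, msg_logits = model(buggy, fixed_in, msg_in)
# Joint objective: both tasks share the encoder, so message generation is
# conditioned on the buggy code rather than on a possibly wrong fix.
loss = ce(rep_logits.transpose(1, 2), fixed_out) + ce(msg_logits.transpose(1, 2), msg_out)
loss.backward()
```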
- CoreGen: Contextualized Code Representation Learning for Commit Message Generation [39.383390029545865]
We propose a novel contextualized code representation learning strategy for commit message generation (CoreGen).
Experiments on the benchmark dataset demonstrate the superior effectiveness of our model over the baseline models with at least 28.18% improvement in terms of BLEU-4 score.
arXiv Detail & Related papers (2020-07-14T09:43:26Z)
- Contrastive Code Representation Learning [95.86686147053958]
We show that the popular reconstruction-based BERT model is sensitive to source code edits, even when the edits preserve semantics.
We propose ContraCode: a contrastive pre-training task that learns code functionality, not form.
arXiv Detail & Related papers (2020-07-09T17:59:06Z)
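
The ContraCode entry above hinges on a contrastive objective; a minimal InfoNCE-style sketch follows, with random tensors standing in for embeddings of a program and a semantics-preserving transformation of it. The toy setup is an assumption, not ContraCode's compiler-based augmentation pipeline.

```python
# Sketch: InfoNCE-style contrastive loss over code embeddings. Two
# semantics-preserving views of the same program form a positive pair;
# other programs in the batch are negatives. Encoder outputs are faked
# with random tensors; not ContraCode's actual architecture.
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.07):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature   # (batch, batch) pairwise similarities
    labels = torch.arange(z1.size(0))    # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

batch, dim = 8, 128
view_a = torch.randn(batch, dim)  # embeddings of original programs
view_b = torch.randn(batch, dim)  # embeddings of transformed (equivalent) programs
print(info_nce(view_a, view_b))   # loss to minimize during pre-training
```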
- Incorporating External Knowledge through Pre-training for Natural Language to Code Generation [97.97049697457425]
Open-domain code generation aims to generate code in a general-purpose programming language from natural language (NL) intents.
We explore the effectiveness of incorporating two varieties of external knowledge into NL-to-code generation: automatically mined NL-code pairs from the online programming QA forum StackOverflow and programming language API documentation.
Our evaluations show that combining the two sources with data augmentation and retrieval-based data re-sampling improves the current state-of-the-art by up to 2.2% absolute BLEU score on the code generation testbed CoNaLa.
arXiv Detail & Related papers (2020-04-20T01:45:27Z)