Jointly Learning to Repair Code and Generate Commit Message
- URL: http://arxiv.org/abs/2109.12296v1
- Date: Sat, 25 Sep 2021 07:08:28 GMT
- Title: Jointly Learning to Repair Code and Generate Commit Message
- Authors: Jiaqi Bai, Long Zhou, Ambrosio Blanco, Shujie Liu, Furu Wei, Ming
Zhou, Zhoujun Li
- Abstract summary: We construct a multilingual triple dataset including buggy code, fixed code, and commit messages for this novel task.
To deal with the error propagation problem of the cascaded method, we propose a joint model that can both repair the code and generate the commit message.
Experimental results show that the cascaded model enhanced with the teacher-student and multi-task learning methods achieves the best scores on the automated code repair metrics.
- Score: 78.4177637346384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel task of jointly repairing program codes and generating
commit messages. Code repair and commit message generation are two essential
and related tasks for software development. However, existing work usually
performs the two tasks independently. We construct a multilingual triple
dataset including buggy code, fixed code, and commit messages for this novel
task. We provide cascaded models as baselines, enhanced with different training approaches including the teacher-student method, the multi-task method, and the back-translation method. To deal with the error propagation problem of the cascaded method, we propose a joint model that repairs the code and generates the commit message in a unified framework. Experimental results show that the cascaded model enhanced with the teacher-student and multi-task learning methods achieves the best scores on the automated code repair metrics, and that the joint model outperforms the cascaded model on commit message generation.
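The contrast between the two setups is easy to state in code. Below is a minimal sketch of the cascaded and joint decoding flows described in the abstract; the model callables are hypothetical stand-ins, not the paper's actual architecture.

```python
# Minimal sketch of the two decoding flows; the model callables are
# hypothetical stand-ins, not the paper's actual architecture.
from dataclasses import dataclass

@dataclass
class RepairOutput:
    fixed_code: str
    commit_message: str

def cascaded_pipeline(buggy_code, repair_model, message_model):
    # Stage 1 repairs the code; stage 2 conditions on the (possibly wrong)
    # repair, so stage-1 errors propagate into the commit message.
    fixed_code = repair_model(buggy_code)
    commit_message = message_model(buggy_code, fixed_code)
    return RepairOutput(fixed_code, commit_message)

def joint_pipeline(buggy_code, joint_model):
    # One model emits both outputs in a unified framework, avoiding the
    # cascaded error-propagation problem.
    fixed_code, commit_message = joint_model(buggy_code)
    return RepairOutput(fixed_code, commit_message)

# Toy stand-ins so the control flow runs end to end.
repair = lambda code: code.replace("==", "=")
message = lambda buggy, fixed: f"Fix assignment bug: {buggy!r} -> {fixed!r}"
joint = lambda code: (repair(code), "Fix assignment bug (decoded jointly)")

print(cascaded_pipeline("x == 1", repair, message))
print(joint_pipeline("x == 1", joint))
```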
Related papers
- List-aware Reranking-Truncation Joint Model for Search and
Retrieval-augmented Generation [80.12531449946655]
We propose a Reranking-Truncation joint model (GenRT) that can perform the two tasks concurrently.
GenRT integrates reranking and truncation via a generative paradigm based on an encoder-decoder architecture.
Our method achieves SOTA performance on both reranking and truncation tasks for web search and retrieval-augmented LLMs.
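As a rough illustration of the generative formulation, the decoder can emit document indices in relevance order plus a dedicated stop token that doubles as the truncation point; the step function and token names below are assumptions for the sketch, not GenRT's actual interface.

```python
# Hypothetical decode loop: ranking order and truncation position are
# produced by one autoregressive pass instead of two separate models.
def decode_rerank_truncate(step_fn, num_docs, max_len=50):
    ranked, state = [], None
    for _ in range(min(max_len, num_docs + 1)):
        token, state = step_fn(ranked, state)  # next doc id or "<stop>"
        if token == "<stop>":                  # truncation decided jointly
            break
        ranked.append(token)
    return ranked

# Toy step function: rank docs by a fixed score, stop below a threshold.
scores = {0: 0.9, 1: 0.2, 2: 0.7}
def toy_step(ranked, state):
    remaining = [d for d in scores if d not in ranked]
    best = max(remaining, key=scores.get, default=None)
    return ("<stop>", None) if best is None or scores[best] < 0.5 else (best, None)

print(decode_rerank_truncate(toy_step, num_docs=len(scores)))  # [0, 2]
```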
arXiv Detail & Related papers (2024-02-05T06:52:53Z)
- Improving the Learning of Code Review Successive Tasks with Cross-Task Knowledge Distillation [1.0878040851638]
We introduce a novel deep-learning architecture, named DISCOREV, which employs cross-task knowledge distillation to address the successive code review tasks of comment generation and code refinement simultaneously.
We show that our approach generates better review comments, as measured by the BLEU score, as well as more accurate code refinement according to the CodeBLEU score.
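A heavily simplified sketch of the cross-task distillation idea: while fine-tuning on one task, the model's representation of the input is pulled toward the sibling model's representation of the same input. The loss form and weighting below are assumptions, not DISCOREV's exact objective.

```python
import torch
import torch.nn.functional as F

def distilled_refinement_loss(refine_task_loss, refine_hidden,
                              comment_hidden, beta=0.1):
    # Standard task loss plus an alignment term; the comment-generation
    # model acts as a teacher, so its representation is detached.
    align = F.mse_loss(refine_hidden, comment_hidden.detach())
    return refine_task_loss + beta * align

# Toy tensors standing in for a batch of pooled encoder states.
loss = distilled_refinement_loss(torch.tensor(2.3),
                                 torch.randn(4, 256, requires_grad=True),
                                 torch.randn(4, 256))
loss.backward()
```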
arXiv Detail & Related papers (2024-02-03T07:02:22Z)
- Delving into Commit-Issue Correlation to Enhance Commit Message Generation Models [13.605167159285374]
Commit message generation is a challenging task in automated software engineering.
The proposed tool is a novel paradigm that introduces the correlation between commits and issues into the training phase of the models.
The results show that the tool-enhanced models significantly outperform the original models.
arXiv Detail & Related papers (2023-07-31T20:35:00Z)
- Improving Cross-task Generalization of Unified Table-to-text Models with Compositional Task Configurations [63.04466647849211]
Existing methods typically encode task information by prepending a simple dataset name as a prefix to the encoder input.
We propose compositional task configurations, a set of prompts prepended to the encoder to improve cross-task generalization.
We show this not only allows the model to better learn shared knowledge across different tasks at training, but also allows us to control the model by composing new configurations.
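As a rough sketch of the difference: instead of an opaque dataset tag, the encoder input is prefixed with a composition of configuration slots, and a new task can be covered by composing known slots in a new combination. The slot names here are illustrative, not the paper's exact set.

```python
def compose_task_configuration(task_type, input_type, output_type):
    # Each slot is a reusable prompt fragment; a new task can be handled
    # by composing known fragments in a combination unseen at training.
    return f"[task: {task_type}] [input: {input_type}] [output: {output_type}]"

dataset_style_prefix = "wikitablequestions:"        # baseline: an opaque tag
compositional_prefix = compose_task_configuration(
    "question answering", "table", "short answer")  # proposed: composable slots
print(compositional_prefix)
```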
arXiv Detail & Related papers (2022-12-17T02:20:14Z)
- Coder Reviewer Reranking for Code Generation [56.80381384717]
We propose Coder-Reviewer reranking as a method for sampling diverse programs from a code language model and reranking with model likelihood.
Experimental results show that Coder-Reviewer reranking leads to consistent and significant improvement over reranking with the Coder model only.
Coder-Reviewer reranking is easy to implement by prompting, can generalize to different programming languages, and works well with off-the-shelf hyperparameters.
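Concretely, the reranking score can be read as a sum of two log-likelihoods: the Coder's probability of the program given the prompt and the Reviewer's probability of recovering the prompt from the program. A minimal sketch with made-up log-probabilities:

```python
def coder_reviewer_score(log_p_program_given_prompt, log_p_prompt_given_program):
    # Sum of log-likelihoods, i.e. the product p(y|x) * p(x|y) in probability
    # space; both terms come from the same LM prompted in opposite directions.
    return log_p_program_given_prompt + log_p_prompt_given_program

def rerank(candidates):
    # candidates: (program, log p(program|prompt), log p(prompt|program))
    return sorted(candidates, key=lambda c: coder_reviewer_score(c[1], c[2]),
                  reverse=True)

samples = [("def add(a, b): return a + b", -2.1, -1.3),
           ("def add(a, b): return a - b", -1.9, -4.0)]
print(rerank(samples)[0][0])  # the candidate the Reviewer can best explain wins
```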
arXiv Detail & Related papers (2022-11-29T18:56:33Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
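A minimal PyTorch sketch of the idea for a single linear layer: a frozen shared weight plus a gated task-specific residual, so layers whose gate stays near zero remain fully shared. The soft gate is a simplification of the paper's formulation.

```python
import torch
import torch.nn as nn

class TaskAdaptiveLinear(nn.Module):
    def __init__(self, base: nn.Linear):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # shared weights stay frozen
            p.requires_grad_(False)
        # Task-specific residual and a per-layer gate; gates that stay near
        # zero leave the layer fully shared across tasks.
        self.delta = nn.Parameter(torch.zeros_like(base.weight))
        self.gate = nn.Parameter(torch.tensor(0.0))

    def forward(self, x):
        g = torch.sigmoid(self.gate)          # soft relaxation of a 0/1 choice
        weight = self.base.weight + g * self.delta
        return nn.functional.linear(x, weight, self.base.bias)

layer = TaskAdaptiveLinear(nn.Linear(16, 16))
out = layer(torch.randn(2, 16))
```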
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
- Conditioned Text Generation with Transfer for Closed-Domain Dialogue Systems [65.48663492703557]
We show how to optimally train and control the generation of intent-specific sentences using a conditional variational autoencoder.
We introduce a new protocol called query transfer that allows us to leverage a large unlabelled dataset.
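A compact sketch of the conditioning mechanism: a variational autoencoder whose decoder sees both the latent code and an intent embedding, so that fixing the intent while sampling the latent yields intent-specific sentences. All dimensions and module choices below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class IntentCVAE(nn.Module):
    # Encoder infers a latent z from the sentence; the decoder is conditioned
    # on both z and the intent embedding, which is what gives control.
    def __init__(self, vocab=1000, emb=64, hid=128, z=32, n_intents=8):
        super().__init__()
        self.tok = nn.Embedding(vocab, emb)
        self.intent = nn.Embedding(n_intents, emb)
        self.enc = nn.GRU(emb, hid, batch_first=True)
        self.mu = nn.Linear(hid + emb, z)
        self.logvar = nn.Linear(hid + emb, z)
        self.init = nn.Linear(z + emb, hid)
        self.dec = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, tokens, intent_id):
        x, c = self.tok(tokens), self.intent(intent_id)
        _, h = self.enc(x)
        hc = torch.cat([h[-1], c], dim=-1)
        mu, logvar = self.mu(hc), self.logvar(hc)
        zs = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        h0 = torch.tanh(self.init(torch.cat([zs, c], dim=-1))).unsqueeze(0)
        y, _ = self.dec(x, h0)
        return self.out(y), mu, logvar  # logits plus the terms for the KL loss

logits, mu, logvar = IntentCVAE()(torch.randint(0, 1000, (2, 7)), torch.tensor([3, 5]))
```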
arXiv Detail & Related papers (2020-11-03T14:06:10Z)
- CoreGen: Contextualized Code Representation Learning for Commit Message Generation [39.383390029545865]
We propose CoreGen, a novel contextualized code representation learning strategy for commit message generation.
Experiments on the benchmark dataset demonstrate the superior effectiveness of our model over the baseline models with at least 28.18% improvement in terms of BLEU-4 score.
arXiv Detail & Related papers (2020-07-14T09:43:26Z)
- Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning [18.354352985591305]
Code summarization generates a brief natural language description for a given source code snippet, while code retrieval fetches relevant source code given a natural language query.
Recent studies have combined these two tasks to improve their performance.
We propose a novel end-to-end model for the two tasks by introducing an additional code generation task.
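The dual-learning coupling can be sketched as a probabilistic-duality regularizer: for a paired code snippet c and summary s, the two factorizations of the joint probability should agree, log p(c) + log p(s|c) ≈ log p(s) + log p(c|s). How each term is estimated (e.g. marginals from language models, conditionals from the two task models) is an assumption here.

```python
def duality_gap(log_p_code, log_p_sum_given_code, log_p_sum, log_p_code_given_sum):
    # Squared violation of log p(c) + log p(s|c) = log p(s) + log p(c|s),
    # added to the task losses as a regularizer during joint training.
    return (log_p_code + log_p_sum_given_code - log_p_sum - log_p_code_given_sum) ** 2

penalty = duality_gap(-12.0, -3.5, -8.0, -7.2)
print(penalty)  # ~0.09 for these toy log-probabilities
```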
arXiv Detail & Related papers (2020-02-24T12:26:11Z)