Automatic Pull Request Description Generation Using LLMs: A T5 Model Approach
- URL: http://arxiv.org/abs/2408.00921v1
- Date: Thu, 1 Aug 2024 21:22:16 GMT
- Title: Automatic Pull Request Description Generation Using LLMs: A T5 Model Approach
- Authors: Md Nazmus Sakib, Md Athikul Islam, Md Mashrur Arifin
- Abstract summary: We propose an automated method for generating PR descriptions based on commit messages and source code comments.
We fine-tuned a pre-trained T5 model using a dataset containing 33,466 PRs.
Our findings reveal that the T5 model significantly outperforms LexRank.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Developers create pull request (PR) descriptions to provide an overview of their changes and explain the motivations behind them. These descriptions help reviewers and fellow developers quickly understand the updates. Despite their importance, some developers omit these descriptions. To tackle this problem, we propose an automated method for generating PR descriptions based on commit messages and source code comments. This method frames the task as a text summarization problem, for which we utilized the T5 text-to-text transfer model. We fine-tuned a pre-trained T5 model using a dataset containing 33,466 PRs. The model's effectiveness was assessed using ROUGE metrics, which are recognized for their strong alignment with human evaluations. Our findings reveal that the T5 model significantly outperforms LexRank, which served as our baseline for comparison.
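The abstract frames PR description generation as a text summarization problem over commit messages and source code comments. The sketch below shows what such a T5 fine-tuning setup could look like using the Hugging Face Transformers library; it is a minimal sketch under assumptions — the checkpoint size, data file name, field names, and hyperparameters are illustrative, not the paper's reported configuration.

```python
# Minimal sketch: fine-tuning a pre-trained T5 checkpoint to generate PR
# descriptions from concatenated commit messages and code comments.
# Dataset path, field names, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

checkpoint = "t5-base"  # assumed checkpoint size
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Hypothetical JSON-lines file with "source" (commit messages + comments)
# and "target" (the reference PR description) fields.
raw = load_dataset("json", data_files="pr_dataset.jsonl", split="train")

def preprocess(batch):
    # T5 is a text-to-text model, so the task is expressed with a task prefix.
    inputs = ["summarize: " + s for s in batch["source"]]
    enc = tokenizer(inputs, max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["target"], max_length=128, truncation=True)
    enc["labels"] = labels["input_ids"]
    return enc

tokenized = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="t5-pr-descriptions",
    learning_rate=3e-4,              # assumed value
    per_device_train_batch_size=8,   # assumed value
    num_train_epochs=3,              # assumed value
    predict_with_generate=True,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

Generated descriptions could then be scored with ROUGE (e.g. via the `rouge-score` package) against the reference PR descriptions, alongside an extractive LexRank baseline, mirroring the evaluation described in the abstract.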
Related papers
- Evaluating the Impact of Data Cleaning on the Quality of Generated Pull Request Descriptions [2.2134505920972547]
Pull Requests (PRs) are central to collaborative coding. Many PRs are incomplete, uninformative, or have out-of-context content. This study examines the prevalence of "noisy" PRs and evaluates their impact on description generation models.
arXiv Detail & Related papers (2025-05-02T08:58:42Z) - Take It Easy: Label-Adaptive Self-Rationalization for Fact Verification and Explanation Generation [15.94564349084642]
The self-rationalization method is typically used in natural language inference tasks.
We fine-tune a model to learn veracity prediction with annotated labels.
We generate synthetic explanations from three large language models.
arXiv Detail & Related papers (2024-10-05T02:19:49Z) - Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z) - Compositional Generalization for Data-to-Text Generation [86.79706513098104]
We propose a novel model that addresses compositional generalization by clustering predicates into groups.
Our model generates text in a sentence-by-sentence manner, relying on one cluster of predicates at a time.
It significantly outperforms T5 baselines across all evaluation metrics.
arXiv Detail & Related papers (2023-12-05T13:23:15Z) - Revisiting Relation Extraction in the era of Large Language Models [24.33660998599006]
Relation extraction (RE) is the core NLP task of inferring semantic relationships between entities from text.
Recent work has instead treated the problem as a sequence-to-sequence task, linearizing relations between entities as target strings to be generated conditioned on the input.
Here we push the limits of this approach, using larger language models (GPT-3 and Flan-T5 large) than considered in prior work and evaluating their performance on standard RE tasks under varying levels of supervision.
arXiv Detail & Related papers (2023-05-08T19:19:07Z) - ExaRanker: Explanation-Augmented Neural Ranker [67.4894325619275]
In this work, we show that neural rankers also benefit from explanations.
We use LLMs such as GPT-3.5 to augment retrieval datasets with explanations.
Our model, dubbed ExaRanker and fine-tuned on a few thousand examples with synthetic explanations, performs on par with models fine-tuned on 3x more examples without explanations.
arXiv Detail & Related papers (2023-01-25T11:03:04Z) - Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation [92.1582872870226]
We propose a new grounded keys-to-text generation task.
The task is to generate a factual description of an entity given a set of guiding keys and grounding passages.
Inspired by recent QA-based evaluation measures, we propose an automatic metric, MAFE, for factual correctness of generated descriptions.
arXiv Detail & Related papers (2022-12-04T23:59:41Z) - Evaluation of Transfer Learning for Polish with a Text-to-Text Model [54.81823151748415]
We introduce a new benchmark for assessing the quality of text-to-text models for Polish.
The benchmark consists of diverse tasks and datasets: the KLEJ benchmark adapted for the text-to-text format, en-pl translation, summarization, and question answering.
We present plT5 - a general-purpose text-to-text model for Polish that can be fine-tuned on various Natural Language Processing (NLP) tasks with a single training objective.
arXiv Detail & Related papers (2022-05-18T09:17:14Z) - Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills [32.55545292360155]
We propose to leverage semi-structured tables and automatically generate question-paragraph pairs at scale.
We add a pre-training step over this synthetic data, which includes examples that require 16 different reasoning skills.
We show that our model, PReasM, substantially outperforms T5, a popular pre-trained encoder-decoder model.
arXiv Detail & Related papers (2021-07-15T11:37:14Z) - Civil Rephrases Of Toxic Texts With Self-Supervised Transformers [4.615338063719135]
This work focuses on models that can help suggest rephrasings of toxic comments in a more civil manner.
Inspired by recent progress in unpaired sequence-to-sequence tasks, a self-supervised learning model called CAE-T5 is introduced.
arXiv Detail & Related papers (2021-02-01T15:27:52Z) - Leveraging ParsBERT and Pretrained mT5 for Persian Abstractive Text Summarization [1.0742675209112622]
This paper introduces a novel dataset named pn-summary for Persian abstractive text summarization.
The models employed in this paper are mT5 and an encoder-decoder version of the ParsBERT model.
arXiv Detail & Related papers (2020-12-21T09:35:52Z) - mT5: A massively multilingual pre-trained text-to-text transformer [60.0210636815514]
"Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on English-language NLP tasks.
We introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages.
arXiv Detail & Related papers (2020-10-22T17:58:14Z)
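As a small illustration of the mT5 entry above, a released mT5 checkpoint can be loaded through the Hugging Face Transformers library. This is a minimal sketch: the checkpoint size and the example input are illustrative, and the public mT5 checkpoints are pre-trained only, so they normally need downstream fine-tuning before producing useful output.

```python
# Minimal sketch: loading a pre-trained mT5 checkpoint for multilingual
# text-to-text generation. Checkpoint size and prompt are illustrative.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# The public mT5 checkpoints are pre-trained only (no supervised tasks),
# so meaningful output generally requires fine-tuning on a downstream task.
inputs = tokenizer("summarize: Ein kurzer Beispieltext.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```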