A study on the impact of pre-trained model on Just-In-Time defect
prediction
- URL: http://arxiv.org/abs/2309.02317v2
- Date: Thu, 23 Nov 2023 06:19:01 GMT
- Title: A study on the impact of pre-trained model on Just-In-Time defect
prediction
- Authors: Yuxiang Guo, Xiaopeng Gao, Zhenyu Zhang, W.K.Chan and Bo Jiang
- Abstract summary: We build six models: RoBERTaJIT, CodeBERTJIT, BARTJIT, PLBARTJIT, GPT2JIT, and CodeGPTJIT, each with a distinct pre-trained model as its backbone.
We investigate the performance of the models when using Commit code and Commit message as inputs, as well as the relationship between training efficiency and model distribution.
- Score: 10.205110163570502
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous researchers conducting Just-In-Time (JIT) defect prediction tasks
have primarily focused on the performance of individual pre-trained models,
without exploring the relationship between different pre-trained models as
backbones. In this study, we build six models: RoBERTaJIT, CodeBERTJIT,
BARTJIT, PLBARTJIT, GPT2JIT, and CodeGPTJIT, each with a distinct pre-trained
model as its backbone. We systematically explore the differences and
connections between these models. Specifically, we investigate the performance
of the models when using Commit code and Commit message as inputs, as well as
the relationship between training efficiency and model distribution among these
six models. Additionally, we conduct an ablation experiment to explore the
sensitivity of each model to inputs. Furthermore, we investigate how the models
perform in zero-shot and few-shot scenarios. Our findings indicate that each
model shows improvements regardless of which backbone it is built on, and that
models whose backbones have similar pre-training consume much closer amounts of
training resources. We also observe that Commit code plays a significant role
in defect detection, and different pre-trained models demonstrate better defect
detection ability with a balanced dataset under few-shot scenarios. These
results provide new insights for optimizing JIT defect prediction tasks using
pre-trained models and highlight the factors that require more attention when
constructing such models. Additionally, CodeGPTJIT and GPT2JIT achieved better
performance than DeepJIT and CC2Vec on the two datasets, respectively, with
only 2,000 training samples. These findings emphasize the effectiveness of
transformer-based pre-trained models in JIT defect prediction tasks, especially
in scenarios with limited training data.
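As a rough illustration of the architecture described above (a pre-trained transformer backbone encoding the Commit message and Commit code, followed by a defect classification head), the sketch below uses Hugging Face Transformers. The backbone name, pooling strategy, and head sizes are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class JITDefectModel(nn.Module):
    """Sketch of a JIT defect predictor built on a pre-trained backbone.

    A single encoder (e.g. CodeBERT) embeds the Commit message and the
    Commit code separately; a small head classifies the commit as clean
    or defect-inducing. All hyperparameters here are illustrative.
    """

    def __init__(self, backbone="microsoft/codebert-base", hidden=256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone)
        dim = self.encoder.config.hidden_size
        self.head = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(), nn.Linear(hidden, 2)
        )

    def forward(self, msg_inputs, code_inputs):
        # First-token ([CLS]) representation as a pooled summary of each input.
        msg_vec = self.encoder(**msg_inputs).last_hidden_state[:, 0]
        code_vec = self.encoder(**code_inputs).last_hidden_state[:, 0]
        return self.head(torch.cat([msg_vec, code_vec], dim=-1))


if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
    model = JITDefectModel()
    msg = tok(["fix null check in parser"], return_tensors="pt", truncation=True)
    code = tok(["if (node != null) { parse(node); }"], return_tensors="pt",
               truncation=True)
    logits = model(msg, code)                 # shape (1, 2)
    print(logits.softmax(dim=-1))             # [P(clean), P(defect-inducing)]
```

Swapping the backbone identifier (e.g. roberta-base or facebook/bart-base) gives variants in the spirit of the six models studied, although decoder-style backbones such as GPT-2 would need a different pooling strategy than the [CLS] token used here.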
Related papers
- Data Shapley in One Training Run [88.59484417202454]
Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts.
Existing approaches require re-training models on different data subsets, which is computationally intensive.
This paper introduces In-Run Data Shapley, which addresses these limitations by offering scalable data attribution for a target model of interest.
arXiv Detail & Related papers (2024-06-16T17:09:24Z)
- An Empirical Study on JIT Defect Prediction Based on BERT-style Model [5.098350174933033]
We study the impact of fine-tuning settings on BERT-style pre-trained models for JIT defect prediction.
Our findings reveal the crucial role of the first encoder layer in the BERT-style model.
We combine these findings to derive a cost-effective fine-tuning method based on LoRA.
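The LoRA-based recipe in the entry above is not spelled out here; the sketch below shows one generic way to attach LoRA adapters to a BERT-style classifier with the peft library. The rank, target modules, and dropout are illustrative assumptions, not that paper's settings.

```python
# Generic LoRA sketch with the peft library; values are illustrative,
# not the configuration used in the paper above.
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=2
)

lora_cfg = LoraConfig(
    r=8,                                # low-rank update dimension
    lora_alpha=16,
    target_modules=["query", "value"],  # attention projections in RoBERTa-style encoders
    lora_dropout=0.1,
    task_type="SEQ_CLS",
)
model = get_peft_model(base, lora_cfg)

# Only the adapter matrices and the classification head are trainable,
# which is what makes LoRA fine-tuning comparatively cheap.
model.print_trainable_parameters()
```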
arXiv Detail & Related papers (2024-03-17T09:41:55Z)
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Reusing Pretrained Models by Multi-linear Operators for Efficient Training [65.64075958382034]
Training large models from scratch usually costs a substantial amount of resources.
Recent studies such as bert2BERT and LiGO have reused small pretrained models to initialize a large model.
We propose a method that linearly correlates each weight of the target model to all the weights of the pretrained model.
arXiv Detail & Related papers (2023-10-16T06:16:47Z)
- Alleviating the Effect of Data Imbalance on Adversarial Training [26.36714114672729]
We study adversarial training on datasets that obey the long-tailed distribution.
We propose a new adversarial training framework -- Re-balancing Adversarial Training (REAT).
arXiv Detail & Related papers (2023-07-14T07:01:48Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models, but the training data behind individual fine-tuned models is often unavailable.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- An Empirical Study of Deep Learning Models for Vulnerability Detection [4.243592852049963]
We surveyed and reproduced 9 state-of-the-art deep learning models on 2 widely used vulnerability detection datasets.
We investigated model capabilities, training data, and model interpretation.
Our findings can help better understand model results, provide guidance on preparing training data, and improve the robustness of the models.
arXiv Detail & Related papers (2022-12-15T19:49:34Z)
- FairIF: Boosting Fairness in Deep Learning via Influence Functions with Validation Set Sensitive Attributes [51.02407217197623]
We propose a two-stage training algorithm named FAIRIF.
It minimizes the loss over a reweighted data set, where the sample weights are computed using influence functions and the sensitive attributes of a validation set.
We show that FAIRIF yields models with better fairness-utility trade-offs against various types of bias.
arXiv Detail & Related papers (2022-01-15T05:14:48Z)
- No One Representation to Rule Them All: Overlapping Features of Training Methods [12.58238785151714]
High-performing models tend to make similar predictions regardless of training methodology.
Recent work has made very different training techniques, such as large-scale contrastive learning, yield competitively-high accuracy.
We show that these models specialize in subdomains of the data, leading to higher ensemble performance.
arXiv Detail & Related papers (2021-10-20T21:29:49Z)
- Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon relates to exploration, and how some models that score lower on standard benchmarks perform on par with the best-performing models when trained on the same training data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z)
- Deep Ensembles for Low-Data Transfer Learning [21.578470914935938]
We study different ways of creating ensembles from pre-trained models.
We show that the nature of pre-training itself is a performant source of diversity.
We propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset.
arXiv Detail & Related papers (2020-10-14T07:59:00Z)