Related papers: An Empirical Study on JIT Defect Prediction Based on BERT-style Model

An Empirical Study on JIT Defect Prediction Based on BERT-style Model

URL: http://arxiv.org/abs/2403.11158v1
Date: Sun, 17 Mar 2024 09:41:55 GMT
Title: An Empirical Study on JIT Defect Prediction Based on BERT-style Model
Authors: Yuxiang Guo, Xiaopeng Gao, Bo Jiang,
Abstract summary: We study the impact of settings of the finetuning process on BERT-style pre-trained model for JIT defect prediction. Our findings reveal the crucial role of the first encoder layer in the BERT-style model. We combine these findings to find a cost-effective fine-tuning method based on LoRA.
Score: 5.098350174933033
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Previous works on Just-In-Time (JIT) defect prediction tasks have primarily applied pre-trained models directly, neglecting the configurations of their fine-tuning process. In this study, we perform a systematic empirical study to understand the impact of the settings of the fine-tuning process on BERT-style pre-trained model for JIT defect prediction. Specifically, we explore the impact of different parameter freezing settings, parameter initialization settings, and optimizer strategies on the performance of BERT-style models for JIT defect prediction. Our findings reveal the crucial role of the first encoder layer in the BERT-style model and the project sensitivity to parameter initialization settings. Another notable finding is that the addition of a weight decay strategy in the Adam optimizer can slightly improve model performance. Additionally, we compare performance using different feature extractors (FCN, CNN, LSTM, transformer) and find that a simple network can achieve great performance. These results offer new insights for fine-tuning pre-trained models for JIT defect prediction. We combine these findings to find a cost-effective fine-tuning method based on LoRA, which achieve a comparable performance with only one-third memory consumption than original fine-tuning process.

Related papers

Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving the model performance.<n>This paper addresses the question of how to optimally combine the model's predictions and the provided labels.<n>Our main contribution is the derivation of the Bayes optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z)
Can Pre-training Indicators Reliably Predict Fine-tuning Outcomes of LLMs? [32.04523360747506]
We construct a dataset using 50 1B parameter LLM variants with systematically varied pre-training configurations. We introduce novel unsupervised and supervised proxy metrics derived from pre-training that successfully reduce the relative performance prediction error rate by over 50%.
arXiv Detail & Related papers (2025-04-16T21:19:09Z)
SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models. We propose a novel model fine-tuning method to make full use of these ineffective parameters. Our method enhances the generative capabilities of pre-trained models in downstream applications.
arXiv Detail & Related papers (2024-09-10T16:44:47Z)
Forecast-PEFT: Parameter-Efficient Fine-Tuning for Pre-trained Motion Forecasting Models [68.23649978697027]
Forecast-PEFT is a fine-tuning strategy that freezes the majority of the model's parameters, focusing adjustments on newly introduced prompts and adapters. Our experiments show that Forecast-PEFT outperforms traditional full fine-tuning methods in motion prediction tasks. Forecast-FT further improves prediction performance, evidencing up to a 9.6% enhancement over conventional baseline methods.
arXiv Detail & Related papers (2024-07-28T19:18:59Z)
Fairer and More Accurate Tabular Models Through NAS [14.147928131445852]
We propose using multi-objective Neural Architecture Search (NAS) and Hyperparameter Optimization (HPO) in the first application to the very challenging domain of tabular data. We show that models optimized solely for accuracy with NAS often fail to inherently address fairness concerns. We produce architectures that consistently dominate state-of-the-art bias mitigation methods either in fairness, accuracy or both.
arXiv Detail & Related papers (2023-10-18T17:56:24Z)
A study on the impact of pre-trained model on Just-In-Time defect prediction [10.205110163570502]
We build six models: RoBERTaJIT, CodeBERTJIT, BARTJIT, PLBARTJIT, GPT2JIT, and CodeGPTJIT, each with a distinct pre-trained model as its backbone. We investigate the performance of the models when using Commit code and Commit message as inputs, as well as the relationship between training efficiency and model distribution.
arXiv Detail & Related papers (2023-09-05T15:34:22Z)
E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive. We propose an Effective and Efficient Visual Prompt Tuning (E2VPT) approach for large-scale transformer-based model adaptation. Our approach outperforms several state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2023-07-25T19:03:21Z)
Prototypical Fine-tuning: Towards Robust Performance Under Varying Data Sizes [47.880781811936345]
We propose a novel framework for fine-tuning pretrained language models (LM) Our prototypical fine-tuning approach can automatically adjust the model capacity according to the number of data points and the model's inherent attributes.
arXiv Detail & Related papers (2022-11-24T14:38:08Z)
Design Amortization for Bayesian Optimal Experimental Design [70.13948372218849]
We build off of successful variational approaches, which optimize a parameterized variational model with respect to bounds on the expected information gain (EIG) We present a novel neural architecture that allows experimenters to optimize a single variational model that can estimate the EIG for potentially infinitely many designs.
arXiv Detail & Related papers (2022-10-07T02:12:34Z)
Language model compression with weighted low-rank factorization [73.61874728240568]
We introduce Fisher information to weigh the importance of parameters affecting the model prediction. We find that our resulting task accuracy is much closer to the original model's performance. Our method can directly compress a task-specific model while achieving better performance than other compact model strategies.
arXiv Detail & Related papers (2022-06-30T21:57:07Z)
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time [69.7693300927423]
We show that averaging the weights of multiple models fine-tuned with different hyper parameter configurations improves accuracy and robustness. We show that the model soup approach extends to multiple image classification and natural language processing tasks.
arXiv Detail & Related papers (2022-03-10T17:03:49Z)
Adapting by Pruning: A Case Study on BERT [9.963251767416967]
We propose a novel model adaptation paradigm, adapting by pruning, which prunes neural connections in the pre-trained model to optimise the performance on the target task. We formulate adapting-by-pruning as an optimisation problem with a differentiable loss and propose an efficient algorithm to prune the model. Results suggest that our method can prune up to 50% weights in BERT while yielding similar performance compared to the fine-tuned full model.
arXiv Detail & Related papers (2021-05-07T15:51:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.