Pre-trained Model-based Actionable Warning Identification: A Feasibility Study
- URL: http://arxiv.org/abs/2403.02716v1
- Date: Tue, 5 Mar 2024 07:15:07 GMT
- Title: Pre-trained Model-based Actionable Warning Identification: A Feasibility Study
- Authors: Xiuting Ge, Chunrong Fang, Quanjun Zhang, Daoyuan Wu, Bowen Yu, Qirui Zheng, An Guo, Shangwei Lin, Zhihong Zhao, Yang Liu, Zhenyu Chen
- Abstract summary: Actionable Warning Identification (AWI) plays a pivotal role in improving the usability of static code analyzers.
Currently, Machine Learning (ML)-based AWI approaches, which mainly learn an AWI classifier from labeled warnings, are notably common.
This paper explores the feasibility of applying various Pre-Trained Models (PTMs) for AWI.
- Score: 21.231852710115863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Actionable Warning Identification (AWI) plays a pivotal role in improving the usability of static code analyzers. Currently, Machine Learning (ML)-based AWI approaches, which mainly learn an AWI classifier from labeled warnings, are notably common. However, these approaches still suffer from restricted performance due to their direct reliance on a limited number of labeled warnings to develop a classifier. Very recently, Pre-Trained Models (PTMs), which have been trained on billions of text/code tokens and have demonstrated substantial success on various code-related tasks, could potentially circumvent the above problem. Nevertheless, the performance of PTMs on AWI has not been systematically investigated, leaving a gap in understanding their pros and cons. In this paper, we are the first to explore the feasibility of applying various PTMs to AWI. Through an extensive evaluation on 10K+ SpotBugs warnings from 10 large-scale, open-source projects, we observe that all studied PTMs are consistently 9.85%~21.12% better than the state-of-the-art ML-based AWI approaches. Besides, we investigate the impact of the three primary stages (i.e., data preprocessing, model training, and model prediction) in the typical PTM-based AWI workflow. Further, we identify the reasons why current PTMs still underperform on AWI. Based on our findings, we provide several practical guidelines for enhancing PTM-based AWI in future work.
Related papers
- PTSBench: A Comprehensive Post-Training Sparsity Benchmark Towards Algorithms and Models [39.56594737760323]
PTSBench is the first comprehensive post-training sparsity benchmark towards algorithms and models.
We benchmark 10+ PTS general-pluggable fine-grained techniques on 3 typical tasks using over 40 off-the-shelf model architectures.
Our PTSBench can provide (1) new observations for a better understanding of the PTS algorithms, (2) in-depth and comprehensive evaluations for the sparsification ability of models, and (3) a well-structured and easy-integrate open-source framework.
arXiv Detail & Related papers (2024-12-10T07:49:07Z)
- Continual Learning with Pre-Trained Models: A Survey [61.97613090666247]
Continual Learning (CL) aims to overcome the catastrophic forgetting of previously acquired knowledge when learning new tasks.
This paper presents a comprehensive survey of the latest advancements in PTM-based CL.
arXiv Detail & Related papers (2024-01-29T18:27:52Z)
- Machine Learning for Actionable Warning Identification: A Comprehensive Survey [19.18364564227752]
Actionable Warning Identification (AWI) plays a crucial role in improving the usability of static code analyzers.
Recent advances have proposed incorporating Machine Learning (ML) techniques into AWI.
This paper systematically reviews the state-of-the-art ML-based AWI approaches.
arXiv Detail & Related papers (2023-12-01T03:38:21Z)
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a lightweight black-box tuning method (NMTune) that affines the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
- A Study on Differentiable Logic and LLMs for EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition 2023 [23.323548254515494]
We present our findings from a study conducted on the EPIC-KITCHENS-100 Unsupervised Domain Adaptation task for Action Recognition.
Our research focuses on the innovative application of a differentiable logic loss during training to leverage the co-occurrence relations between verbs and nouns.
Our final submission (entitled 'NS-LLM') achieved first place in terms of top-1 action recognition accuracy.
arXiv Detail & Related papers (2023-07-13T05:54:05Z)
- Cross-Modal Fine-Tuning: Align then Refine [83.37294254884446]
ORCA is a cross-modal fine-tuning framework that extends the applicability of a single large-scale pretrained model to diverse modalities.
We show that ORCA obtains state-of-the-art results on 3 benchmarks containing over 60 datasets from 12 modalities.
arXiv Detail & Related papers (2023-02-11T16:32:28Z)
- A Survey on Programmatic Weak Supervision [74.13976343129966]
We give a brief introduction to the PWS learning paradigm and review representative approaches for each stage of the PWS learning workflow.
We identify several critical challenges that remain underexplored in the area to hopefully inspire future directions in the field.
arXiv Detail & Related papers (2022-02-11T04:05:38Z)
- Pre-Trained Models: Past, Present and Future [126.21572378910746]
Large-scale pre-trained models (PTMs) have recently achieved great success and become a milestone in the field of artificial intelligence (AI).
By storing knowledge in huge numbers of parameters and fine-tuning on specific tasks, PTMs allow the rich knowledge implicitly encoded in those parameters to benefit a variety of downstream tasks.
It is now the consensus of the AI community to adopt PTMs as the backbone for downstream tasks rather than learning models from scratch.
arXiv Detail & Related papers (2021-06-14T02:40:32Z)
- Foreseeing the Benefits of Incidental Supervision [83.08441990812636]
This paper studies whether we can, in a single framework, quantify the benefits of various types of incidental signals for a given target task without going through experiments.
We propose a unified PAC-Bayesian motivated informativeness measure, PABI, that characterizes the uncertainty reduction provided by incidental supervision signals.
arXiv Detail & Related papers (2020-06-09T20:59:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.