Machine Learning for Actionable Warning Identification: A Comprehensive Survey
- URL: http://arxiv.org/abs/2312.00324v2
- Date: Sun, 06 Oct 2024 08:27:32 GMT
- Title: Machine Learning for Actionable Warning Identification: A Comprehensive Survey
- Authors: Xiuting Ge, Chunrong Fang, Xuanye Li, Weisong Sun, Daoyuan Wu, Juan Zhai, Shangwei Lin, Zhihong Zhao, Yang Liu, Zhenyu Chen
- Abstract summary: Actionable Warning Identification (AWI) plays a crucial role in improving the usability of static code analyzers.
With recent advances in Machine Learning (ML), various approaches have been proposed to incorporate ML techniques into AWI.
This paper systematically reviews the state-of-the-art ML-based AWI approaches.
- Score: 19.18364564227752
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Actionable Warning Identification (AWI) plays a crucial role in improving the usability of static code analyzers. With recent advances in Machine Learning (ML), various approaches have been proposed to incorporate ML techniques into AWI. These ML-based AWI approaches, benefiting from ML's strong ability to learn subtle and previously unseen patterns from historical data, have demonstrated superior performance. However, a comprehensive overview of these approaches is missing, which could hinder researchers/practitioners from understanding the current process and discovering potential for future improvement in the ML-based AWI community. In this paper, we systematically review the state-of-the-art ML-based AWI approaches. First, we employ a meticulous survey methodology and gather 51 primary studies from 2000/01/01 to 2023/09/01. Then, we outline the typical ML-based AWI workflow, including warning dataset preparation, preprocessing, AWI model construction, and evaluation stages. In such a workflow, we categorize ML-based AWI approaches based on the warning output format. Besides, we analyze the techniques used in each stage, along with their strengths, weaknesses, and distribution. Finally, we provide practical research directions for future ML-based AWI approaches, focusing on aspects like data improvement (e.g., enhancing the warning labeling strategy) and model exploration (e.g., exploring large language models for AWI).
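The four-stage workflow the abstract outlines (warning dataset preparation, preprocessing, AWI model construction, and evaluation) can be sketched as a minimal binary warning classifier. This is a hypothetical illustration using scikit-learn; the warning texts, TF-IDF features, and logistic-regression model are assumptions for the sketch, not techniques prescribed by the survey:

```python
# Minimal sketch of a typical ML-based AWI pipeline:
# (1) dataset preparation, (2) preprocessing, (3) model construction,
# (4) evaluation. All warning texts and labels below are toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# (1) Warning dataset preparation: labeled warnings
#     (1 = actionable, 0 = unactionable).
warnings_text = [
    "null dereference of p in foo.c:42",
    "unused variable tmp in bar.c:7",
    "possible buffer overflow in baz.c:13",
    "dead store to x in qux.c:99",
] * 25
labels = [1, 0, 1, 0] * 25

# (2) Preprocessing + (3) AWI model construction: TF-IDF features
#     feeding a linear classifier, one common binary-output formulation.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())

X_train, X_test, y_train, y_test = train_test_split(
    warnings_text, labels, test_size=0.25, random_state=0)
model.fit(X_train, y_train)

# (4) Evaluation on held-out warnings.
acc = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {acc:.2f}")
```

Real AWI datasets would replace the toy warning strings with features mined from the analyzer report and the flagged code, but the stage boundaries stay the same.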
Related papers
- A Comprehensive Survey of Machine Unlearning Techniques for Large Language Models [36.601209595620446]
This study investigates machine unlearning techniques within the context of large language models (LLMs).
LLM unlearning offers a principled approach to removing the influence of undesirable data from LLMs.
Despite growing research interest, there is no comprehensive survey that systematically organizes existing work and distills key insights.
arXiv Detail & Related papers (2025-02-22T12:46:14Z)
- Machine Learning for Missing Value Imputation [0.0]
The main objective of this article is to conduct a comprehensive and rigorous review, as well as analysis, of the state-of-the-art machine learning applications in Missing Value Imputation.
More than 100 articles published between 2014 and 2023 are critically reviewed, considering the methods and findings.
The latest literature is examined to scrutinize the trends in MVI methods and their evaluation.
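One concrete instance of the ML-based MVI methods such a survey reviews is scikit-learn's `IterativeImputer`, which fills missing entries by iteratively regressing each feature on the others. The toy matrix below is illustrative only (column 1 is roughly twice column 0), not data from any of the reviewed studies:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Toy matrix with missing entries; column 1 is roughly 2x column 0.
X = np.array([
    [1.0, 2.1],
    [2.0, 3.9],
    [3.0, np.nan],
    [4.0, 8.2],
    [np.nan, 6.0],
])

# Model-based imputation: each column with missing values is predicted
# from the other columns, iterating until the estimates stabilize.
imputer = IterativeImputer(random_state=0)
X_filled = imputer.fit_transform(X)
```

Because the imputer exploits the inter-column relationship, the filled value for row 2 lands near 6 rather than at the column mean, which is the advantage ML-based MVI claims over simple mean imputation.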
arXiv Detail & Related papers (2024-10-10T18:56:49Z)
- Detecting Training Data of Large Language Models via Expectation Maximization [62.28028046993391]
Membership inference attacks (MIAs) aim to determine whether a specific instance was part of a target model's training data.
Applying MIAs to large language models (LLMs) presents unique challenges due to the massive scale of pre-training data and the ambiguous nature of membership.
We introduce EM-MIA, a novel MIA method for LLMs that iteratively refines membership scores and prefix scores via an expectation-maximization algorithm.
arXiv Detail & Related papers (2024-10-10T03:31:16Z)
- Task-Agnostic Machine-Learning-Assisted Inference [0.0]
We introduce a novel statistical framework named PSPS for task-agnostic ML-assisted inference.
PSPS provides a post-prediction inference solution that can be easily plugged into almost any established data analysis routines.
arXiv Detail & Related papers (2024-05-30T13:19:49Z)
- Pre-trained Model-based Actionable Warning Identification: A Feasibility Study [21.231852710115863]
Actionable Warning Identification (AWI) plays a pivotal role in improving the usability of static code analyzers.
Currently, Machine Learning (ML)-based AWI approaches, which mainly learn an AWI classifier from labeled warnings, are notably common.
This paper explores the feasibility of applying various Pre-Trained Models (PTMs) for AWI.
arXiv Detail & Related papers (2024-03-05T07:15:07Z)
- LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on the roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z)
- C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z)
- Evaluating and Explaining Large Language Models for Code Using Syntactic Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
arXiv Detail & Related papers (2023-08-07T18:50:57Z)
- Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
arXiv Detail & Related papers (2023-01-27T18:59:01Z)
- Practical Machine Learning Safety: A Survey and Primer [81.73857913779534]
Open-world deployment of Machine Learning algorithms in safety-critical applications such as autonomous vehicles needs to address a variety of ML vulnerabilities.
It covers new models and training techniques that reduce generalization error, achieve domain adaptation, and detect outlier examples and adversarial attacks.
Our organization maps state-of-the-art ML techniques to safety strategies in order to enhance the dependability of the ML algorithm from different aspects.
arXiv Detail & Related papers (2021-06-09T05:56:42Z)
- A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to learn patterns from big data efficiently, with comparable performance.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.