Boosting Static Resource Leak Detection via LLM-based Resource-Oriented Intention Inference
- URL: http://arxiv.org/abs/2311.04448v4
- Date: Thu, 12 Dec 2024 12:57:59 GMT
- Title: Boosting Static Resource Leak Detection via LLM-based Resource-Oriented Intention Inference
- Authors: Chong Wang, Jianan Liu, Xin Peng, Yang Liu, Yiling Lou
- Abstract summary: Existing static detection techniques rely on mechanical matching of predefined resource acquisition/release APIs and null-checking conditions to find unreleased resources.
We propose InferROI, a novel approach that directly infers resource-oriented intentions (acquisition, release, and reachability validation) in code.
We evaluate the effectiveness of InferROI in both resource-oriented intention inference and resource leak detection.
- Score: 14.783216988363804
- Abstract: Resource leaks, caused by resources not being released after acquisition, often lead to performance issues and system crashes. Existing static detection techniques rely on mechanical matching of predefined resource acquisition/release APIs and null-checking conditions to find unreleased resources, suffering from both (1) false negatives caused by the incompleteness of predefined resource acquisition/release APIs and (2) false positives caused by the incompleteness of resource reachability validation identification. To overcome these challenges, we propose InferROI, a novel approach that leverages the code comprehension capability of large language models (LLMs) to directly infer resource-oriented intentions (acquisition, release, and reachability validation) in code. InferROI first prompts the LLM to infer the intentions involved in a given code snippet, and then applies a two-stage static analysis to check control-flow paths for resource leaks based on the inferred intentions. We evaluate the effectiveness of InferROI in both resource-oriented intention inference and resource leak detection. Experimental results on the DroidLeaks and JLeaks datasets demonstrate that InferROI achieves promising bug detection rates (59.3% and 62.5%) and false alarm rates (18.6% and 19.5%). Compared to three industrial static detectors, InferROI detects 14~45 and 149~485 more bugs in DroidLeaks and JLeaks, respectively. When applied to real-world open-source projects, InferROI identifies 29 previously unknown resource leak bugs (verified by the authors), 7 of which have been confirmed by developers. In addition, the results of an ablation study underscore the importance of combining LLM-based inference with static analysis.
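The abstract's pipeline can be pictured with a short sketch. The following is a minimal illustration only, not the authors' implementation: infer_intentions is a hypothetical stand-in for the LLM prompting step (it returns a canned answer so the sketch runs), and the leak check walks a hand-built toy control-flow graph. InferROI's actual two-stage static analysis, including reachability validations such as null checks, is considerably more involved.

```python
# Minimal sketch of the two-stage idea: (1) infer resource-oriented
# intentions for a snippet (stubbed; InferROI prompts an LLM here), then
# (2) check whether some control-flow path reaches the exit with an
# acquired resource that was never released.
from dataclasses import dataclass

@dataclass
class Intention:
    kind: str  # "acquire", "release", or "validate"
    line: int  # 1-based line carrying the intention

def infer_intentions(snippet: str) -> list[Intention]:
    """Hypothetical stand-in for the LLM call; returns a canned answer
    matching the toy CFG below."""
    return [Intention("acquire", 1), Intention("release", 4)]

def has_leak(cfg: dict[int, list[int]], intentions: list[Intention],
             exit_line: int) -> bool:
    """True if some path from an acquisition reaches exit_line without
    passing a release (a deliberately simplistic path check)."""
    releases = {i.line for i in intentions if i.kind == "release"}
    acquires = {i.line for i in intentions if i.kind == "acquire"}

    def unreleased_path(node: int, seen: frozenset[int]) -> bool:
        if node in releases:
            return False   # resource freed along this path
        if node == exit_line:
            return True    # exited while the resource is still held
        return any(unreleased_path(succ, seen | {node})
                   for succ in cfg.get(node, []) if succ not in seen)

    return any(unreleased_path(a, frozenset()) for a in acquires)

# Toy CFG: line 1 acquires; the branch at line 2 releases only via line 4.
cfg = {1: [2], 2: [3, 4], 3: [5], 4: [5]}
print(has_leak(cfg, infer_intentions("..."), exit_line=5))  # True (1->2->3->5)
```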
Related papers
- Provenance: A Light-weight Fact-checker for Retrieval Augmented LLM Generation Output [49.893971654861424]
We present a light-weight approach for detecting nonfactual outputs from retrieval-augmented generation (RAG).
We compute a factuality score that can be thresholded to yield a binary decision.
Our experiments show high area under the ROC curve (AUC) across a wide range of relevant open source datasets.
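The scoring mechanism is not detailed in the summary above; as a hedged illustration only, thresholding a factuality score into a binary decision might look like the sketch below, where score_factuality is a hypothetical placeholder (a real system would use, e.g., an NLI model over retrieved evidence):

```python
# Hedged sketch: threshold a factuality score into a binary decision.
# `score_factuality` is a hypothetical placeholder, not the paper's model.
def score_factuality(evidence: str, answer: str) -> float:
    """Crude stand-in returning a score in [0, 1]: word overlap between
    the retrieved evidence and the generated answer."""
    overlap = set(evidence.lower().split()) & set(answer.lower().split())
    return len(overlap) / max(len(answer.split()), 1)

def is_factual(evidence: str, answer: str, threshold: float = 0.5) -> bool:
    return score_factuality(evidence, answer) >= threshold

print(is_factual("Paris is the capital of France.",
                 "The capital of France is Paris."))  # True
```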
arXiv Detail & Related papers (2024-11-01T20:44:59Z)
- On the Effectiveness of LLMs for Manual Test Verifications [1.920300814128832]
This study aims to explore the use of Large Language Models (LLMs) to produce verifications for manual tests.
Open-source models Mistral-7B and Phi-3-mini-4k demonstrated effectiveness and consistency comparable to closed-source models.
There were also concerns about AI hallucinations, where verifications significantly deviated from expectations.
arXiv Detail & Related papers (2024-09-19T02:03:04Z)
- Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z)
- Open-Source Drift Detection Tools in Action: Insights from Two Use Cases [0.0]
D3Bench examines the capabilities of Evidently AI, NannyML, and Alibi-Detect, leveraging real-world data from two smart building use cases.
We consider a comprehensive set of non-functional criteria, such as the integrability with ML pipelines, the adaptability to diverse data types, user-friendliness, computational efficiency, and resource demands.
Our findings reveal that Evidently AI stands out for its general data drift detection, whereas NannyML excels at pinpointing the precise timing of shifts and evaluating their consequent effects on predictive accuracy.
arXiv Detail & Related papers (2024-04-29T13:13:10Z)
- Collaborative Knowledge Infusion for Low-resource Stance Detection [83.88515573352795]
Target-related knowledge is often needed to assist stance detection models.
We propose a collaborative knowledge infusion approach for low-resource stance detection tasks.
arXiv Detail & Related papers (2024-03-28T08:32:14Z)
- A Little Leak Will Sink a Great Ship: Survey of Transparency for Large Language Models from Start to Finish [47.3916421056009]
Large Language Models (LLMs) are trained on massive web-crawled corpora.
LLMs produce leaked information in most cases, despite such data being scarce in their training sets.
A self-detection method showed superior performance compared to existing detection methods.
arXiv Detail & Related papers (2024-03-24T13:21:58Z)
- A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection [63.56136319976554]
Large Language Models (LLMs) generate hallucinations, which can cause significant damage when deployed for mission-critical tasks.
We propose a self-check approach based on reverse validation to detect factual errors automatically in a zero-resource fashion.
We empirically evaluate our method and existing zero-resource detection methods on two datasets.
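The summary leaves the mechanics of reverse validation open; one plausible reading (an assumption, not necessarily the paper's exact procedure) is to reconstruct a question from the generated passage, have the model answer it independently, and flag disagreement. In the sketch below, ask_model is a hypothetical LLM interface with canned replies so the example runs:

```python
# Hedged sketch of a zero-resource self-check in the spirit of reverse
# validation; `ask_model` is a hypothetical placeholder for an LLM call.
def ask_model(prompt: str) -> str:
    """Canned replies keep the sketch runnable without a real model."""
    canned = {
        "Write a question whose answer is the key claim of: "
        "'Mount Everest is 8,849 m tall.'": "How tall is Mount Everest?",
        "How tall is Mount Everest?": "8,849 m",
    }
    return canned.get(prompt, "unknown")

def reverse_validate(passage: str) -> bool:
    """Regenerate a question from the passage, answer it independently,
    and report whether the answers agree (a crude consistency test)."""
    question = ask_model("Write a question whose answer is the key claim of: "
                         f"'{passage}'")
    independent_answer = ask_model(question)
    return independent_answer in passage

print(reverse_validate("Mount Everest is 8,849 m tall."))  # True: consistent
```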
arXiv Detail & Related papers (2023-10-10T10:14:59Z)
- Inference of Resource Management Specifications [2.8975089867684436]
A resource leak occurs when a program fails to free some finite resource after it is no longer needed.
Recent work proposed an approach to prevent resource leaks based on checking resource management specifications.
This paper presents a novel technique to automatically infer a resource management specification for a program.
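For concreteness, here is a minimal example of the kind of leak such specifications guard against (shown in Python for consistency with the other sketches, although the cited work targets Java-style resources): a file handle acquired but released on only one path.

```python
# Minimal illustration of a resource leak: the handle escapes unclosed
# on the early-return path.
def leaky_read(path: str) -> str | None:
    f = open(path)      # resource acquired
    data = f.read()
    if not data:
        return None     # early return: f is never closed -> leak
    f.close()           # released only on this path
    return data

def safe_read(path: str) -> str | None:
    with open(path) as f:   # context manager releases on every path
        data = f.read()
    return data or None
```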
arXiv Detail & Related papers (2023-06-21T00:42:42Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
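As the abstract sketches it, ATC reduces to two small steps; the code below is a hedged reconstruction on synthetic confidences, not the paper's reference implementation:

```python
# Hedged sketch of Average Thresholded Confidence (ATC): fit a threshold t
# on labeled source data so that the fraction of examples with confidence
# above t matches source accuracy, then estimate target accuracy as the
# fraction of unlabeled target examples above t.
import numpy as np

def atc_fit(source_conf: np.ndarray, source_correct: np.ndarray) -> float:
    """Choose t whose confident fraction best matches source accuracy."""
    acc = source_correct.mean()
    candidates = np.sort(source_conf)
    fracs = np.array([(source_conf > t).mean() for t in candidates])
    return float(candidates[np.abs(fracs - acc).argmin()])

def atc_predict(target_conf: np.ndarray, t: float) -> float:
    """Estimated target accuracy: fraction of unlabeled examples above t."""
    return float((target_conf > t).mean())

rng = np.random.default_rng(0)
src_conf = rng.uniform(0.4, 1.0, 1000)
src_correct = (rng.random(1000) < src_conf).astype(float)  # synthetic labels
t = atc_fit(src_conf, src_correct)
tgt_conf = rng.uniform(0.3, 1.0, 1000)                     # shifted target
print(f"threshold={t:.3f}, predicted target acc={atc_predict(tgt_conf, t):.3f}")
```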