Revisiting Third-Party Library Detection: A Ground Truth Dataset and Its Implications Across Security Tasks
- URL: http://arxiv.org/abs/2509.04091v2
- Date: Fri, 05 Sep 2025 10:34:25 GMT
- Title: Revisiting Third-Party Library Detection: A Ground Truth Dataset and Its Implications Across Security Tasks
- Authors: Jintao Gu, Haolang Lu, Guoshun Nan, Yihan Lin, Kun Wang, Yuchun Guo, Yigui Cao, Yang Liu,
- Abstract summary: We present the first large-scale empirical study of ten state-of-the-art TPL detection techniques across over 6,000 apps.<n>Our evaluation exposes tool fragility to R8-era transformations, weak version discrimination, inaccurate correspondence of candidate libraries, difficulty in generalizing similarity thresholds, and prohibitive runtime/memory overheads at scale.
- Score: 12.328841506102963
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate detection of third-party libraries (TPLs) is fundamental to Android security, supporting vulnerability tracking, malware detection, and supply chain auditing. Despite many proposed tools, their real-world effectiveness remains unclear. We present the first large-scale empirical study of ten state-of-the-art TPL detection techniques across over 6,000 apps, enabled by a new ground truth dataset with precise version-level annotations for both remote and local dependencies. Our evaluation exposes tool fragility to R8-era transformations, weak version discrimination, inaccurate correspondence of candidate libraries, difficulty in generalizing similarity thresholds, and prohibitive runtime/memory overheads at scale. Beyond tool assessment, we further analyze how TPLs shape downstream tasks, including vulnerability analysis, malware detection, secret leakage assessment, and LLM-based evaluation. From this perspective, our study provides concrete insights into how TPL characteristics affect these tasks and informs future improvements in security analysis.
Related papers
- PaperAudit-Bench: Benchmarking Error Detection in Research Papers for Critical Automated Peer Review [54.141490756509306]
We introduce PaperAudit-Bench, which consists of two components: PaperAudit-Dataset, an error dataset, and PaperAudit-Review, an automated review framework.<n>Experiments on PaperAudit-Bench reveal large variability in error detectability across models and detection depths.<n>We show that the dataset supports training lightweight LLM detectors via SFT and RL, enabling effective error detection at reduced computational cost.
arXiv Detail & Related papers (2026-01-07T04:26:12Z) - DATABench: Evaluating Dataset Auditing in Deep Learning from an Adversarial Perspective [70.77570343385928]
We introduce a novel taxonomy, classifying existing methods based on their reliance on internal features (IF) (inherent to the data) versus external features (EF) (artificially introduced for auditing)<n>We formulate two primary attack types: evasion attacks, designed to conceal the use of a dataset, and forgery attacks, intending to falsely implicate an unused dataset.<n>Building on the understanding of existing methods and attack objectives, we further propose systematic attack strategies: decoupling, removal, and detection for evasion; adversarial example-based methods for forgery.<n>Our benchmark, DATABench, comprises 17 evasion attacks, 5 forgery attacks, and 9
arXiv Detail & Related papers (2025-07-08T03:07:15Z) - Context-Enhanced Vulnerability Detection Based on Large Language Model [17.922081397554155]
We propose a context-enhanced vulnerability detection approach that combines program analysis with large language models.<n>Specifically, we use program analysis to extract contextual information at various levels of abstraction, thereby filtering out irrelevant noise.<n>Our goal is to strike a balance between providing sufficient detail to accurately capture vulnerabilities and minimizing unnecessary complexity.
arXiv Detail & Related papers (2025-04-23T16:54:16Z) - CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection [2.5228276786940182]
This paper introduces CASTLE, a benchmarking framework for evaluating the vulnerability detection capabilities of different methods.<n>We assess 13 static analysis tools, 10 LLMs, and 2 formal verification tools using a hand-crafted dataset of 250 micro-benchmark programs covering 25 common CWEs.
arXiv Detail & Related papers (2025-03-12T14:30:05Z) - LLM-Safety Evaluations Lack Robustness [58.334290876531036]
We argue that current safety alignment research efforts for large language models are hindered by many intertwined sources of noise.<n>We propose a set of guidelines for reducing noise and bias in evaluations of future attack and defense papers.
arXiv Detail & Related papers (2025-03-04T12:55:07Z) - LLMs in Software Security: A Survey of Vulnerability Detection Techniques and Insights [12.424610893030353]
Large Language Models (LLMs) are emerging as transformative tools for software vulnerability detection.<n>This paper provides a detailed survey of LLMs in vulnerability detection.<n>We address challenges such as cross-language vulnerability detection, multimodal data integration, and repository-level analysis.
arXiv Detail & Related papers (2025-02-10T21:33:38Z) - SeCodePLT: A Unified Platform for Evaluating the Security of Code GenAI [58.29510889419971]
Existing benchmarks for evaluating the security risks and capabilities of code-generating large language models (LLMs) face several key limitations.<n>We introduce a general and scalable benchmark construction framework that begins with manually validated, high-quality seed examples and expands them via targeted mutations.<n>Applying this framework to Python, C/C++, and Java, we build SeCodePLT, a dataset of more than 5.9k samples spanning 44 CWE-based risk categories and three security capabilities.
arXiv Detail & Related papers (2024-10-14T21:17:22Z) - Outside the Comfort Zone: Analysing LLM Capabilities in Software Vulnerability Detection [9.652886240532741]
This paper thoroughly analyses large language models' capabilities in detecting vulnerabilities within source code.
We evaluate the performance of six open-source models that are specifically trained for vulnerability detection against six general-purpose LLMs.
arXiv Detail & Related papers (2024-08-29T10:00:57Z) - Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z) - Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities [12.82645410161464]
We evaluate the effectiveness of 16 pre-trained Large Language Models on 5,000 code samples from five diverse security datasets.
Overall, LLMs show modest effectiveness in detecting vulnerabilities, obtaining an average accuracy of 62.8% and F1 score of 0.71 across datasets.
We find that advanced prompting strategies that involve step-by-step analysis significantly improve performance of LLMs on real-world datasets in terms of F1 score (by upto 0.18 on average)
arXiv Detail & Related papers (2023-11-16T13:17:20Z) - DeepfakeBench: A Comprehensive Benchmark of Deepfake Detection [55.70982767084996]
A critical yet frequently overlooked challenge in the field of deepfake detection is the lack of a standardized, unified, comprehensive benchmark.
We present the first comprehensive benchmark for deepfake detection, called DeepfakeBench, which offers three key contributions.
DeepfakeBench contains 15 state-of-the-art detection methods, 9CL datasets, a series of deepfake detection evaluation protocols and analysis tools, as well as comprehensive evaluations.
arXiv Detail & Related papers (2023-07-04T01:34:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.