Version-level Third-Party Library Detection in Android Applications via Class Structural Similarity
- URL: http://arxiv.org/abs/2504.13547v1
- Date: Fri, 18 Apr 2025 08:24:32 GMT
- Title: Version-level Third-Party Library Detection in Android Applications via Class Structural Similarity
- Authors: Bolin Zhou, Jingzheng Wu, Xiang Ling, Tianyue Luo, Jingkun Zhang,
- Abstract summary: We propose SAD, a TPL detection tool with high version-level detection performance. SAD achieves F1 scores of 97.64% and 84.82% for library-level and version-level detection on obfuscated apps.
- Score: 3.8381968290928596
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Android applications (apps) integrate reusable and well-tested third-party libraries (TPLs) to enhance functionality and shorten development cycles. However, recent research reveals that TPLs have become the largest attack surface for Android apps, where the use of insecure TPLs can compromise both developer and user interests. To mitigate such threats, researchers have proposed various tools to detect TPLs used by apps, supporting further security analyses such as vulnerable TPL identification. Although existing tools achieve notable library-level TPL detection performance in the presence of obfuscation, they struggle with version-level TPL detection due to a lack of sensitivity to differences between versions. This limitation results in a high version-level false positive rate, significantly increasing the manual workload for security analysts. To resolve this issue, we propose SAD, a TPL detection tool with high version-level detection performance. SAD generates a candidate app class list for each TPL class based on the features of nodes in class dependency graphs (CDGs). It then identifies the unique corresponding app class for each TPL class by performing class matching based on the similarity of their class summaries. Finally, SAD identifies TPL versions by evaluating the structural similarity of the sub-graph formed by matched classes within the CDGs of the TPL and the app. Extensive evaluation on three datasets demonstrates the effectiveness of SAD and its components. SAD achieves F1 scores of 97.64% and 84.82% for library-level and version-level detection on obfuscated apps, respectively, surpassing existing state-of-the-art tools. The number of version-level false positives reported by the best existing tool is 1.61 times that of SAD. We further evaluate the degree to which TPLs identified by detection tools correspond to actual TPL classes.
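The three stages named in the abstract (candidate generation from CDG node features, class matching by class-summary similarity, and version scoring by sub-graph structural similarity) can be illustrated with a toy sketch. This is not SAD's implementation: the abstract does not specify the node features, the class summary, or the similarity measures, so the sketch assumes out-degree as the node feature, a set of method descriptors as the class summary, Jaccard similarity with a greedy one-to-one assignment for matching, and the fraction of preserved TPL dependency edges as the version score. All function names and the toy data are hypothetical.

```python
def out_degrees(cdg):
    """Out-degree of every class in a CDG given as {class: set of dependencies}."""
    nodes = set(cdg)
    for deps in cdg.values():
        nodes |= set(deps)
    return {n: len(cdg.get(n, ())) for n in nodes}


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed stand-in for summary similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def match_classes(tpl_cdg, app_cdg, tpl_summary, app_summary, threshold=0.5):
    """Return a one-to-one mapping {tpl_class: app_class}."""
    tpl_feat, app_feat = out_degrees(tpl_cdg), out_degrees(app_cdg)
    pairs = []
    for t, t_deg in tpl_feat.items():
        for a, a_deg in app_feat.items():
            if t_deg != a_deg:
                continue  # stage 1: keep only candidates with the same node feature
            sim = jaccard(tpl_summary.get(t, set()), app_summary.get(a, set()))
            if sim >= threshold:
                pairs.append((sim, t, a))
    # Stage 2: greedy assignment, most similar pairs first, each class used once.
    pairs.sort(key=lambda p: (-p[0], p[1], p[2]))
    matched, used = {}, set()
    for _, t, a in pairs:
        if t not in matched and a not in used:
            matched[t] = a
            used.add(a)
    return matched


def structural_similarity(tpl_cdg, app_cdg, matched):
    """Stage 3: fraction of TPL dependency edges preserved between matched app classes."""
    edges = [(s, d) for s, deps in tpl_cdg.items() for d in deps]
    if not edges:
        return 0.0
    kept = sum(
        1 for s, d in edges
        if s in matched and d in matched
        and matched[d] in app_cdg.get(matched[s], ())
    )
    return kept / len(edges)


# Toy data: two hypothetical versions of one library and an obfuscated app.
TPL_V1 = {"A": {"B", "C"}, "B": {"C"}}
TPL_V2 = {"A": {"B"}, "B": {"C"}}
TPL_SUMMARY = {"A": {"()V", "(I)I"}, "B": {"(Ljava/lang/String;)V"}, "C": {"()I"}}
APP = {"Main": {"a"}, "a": {"b", "c"}, "b": {"c"}}
APP_SUMMARY = {
    "Main": {"([Ljava/lang/String;)V"},
    "a": {"()V", "(I)I"},
    "b": {"(Ljava/lang/String;)V"},
    "c": {"()I"},
}

scores = {}
for version, cdg in {"1.0": TPL_V1, "2.0": TPL_V2}.items():
    matched = match_classes(cdg, APP, TPL_SUMMARY, APP_SUMMARY)
    scores[version] = structural_similarity(cdg, APP, matched)
print(scores)
```

In this toy case the two versions share identical class summaries and differ only in one dependency edge, so the edge-level score is what separates them. That is the kind of version sensitivity the abstract attributes to the structural-similarity stage.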
Related papers
- BinCoFer: Three-Stage Purification for Effective C/C++ Binary Third-Party Library Detection [3.406168883492101]
Third-party libraries (TPLs) are becoming increasingly popular to achieve efficient and concise software development.
Unregulated use of TPLs will introduce legal and security issues in software development.
BinCoFer is a tool designed for detecting TPLs reused in binary programs.
arXiv Detail & Related papers (2025-04-28T07:57:42Z)
- Test Wars: A Comparative Study of SBST, Symbolic Execution, and LLM-Based Approaches to Unit Test Generation [11.037212298533069]
Large Language Models (LLMs) have opened up new opportunities to generate tests automatically.
This paper studies automatic test generation approaches based on three tools: EvoSuite for SBST, Kex for symbolic execution, and TestSpark for LLM-based test generation.
Our results show that while LLM-based test generation is promising, it falls behind traditional methods in terms of coverage.
arXiv Detail & Related papers (2025-01-17T13:48:32Z)
- Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection [68.26282316080558]
Current open-world detectors can recognize a broader range of vocabularies, despite being trained on limited categories.
We introduce Prova, a prototype classifier for vast-vocabulary object detection.
arXiv Detail & Related papers (2024-12-23T18:57:43Z)
- Enhancing Security in Third-Party Library Reuse -- Comprehensive Detection of 1-day Vulnerability through Code Patch Analysis [8.897599530972638]
Third-party libraries (TPLs) can introduce vulnerabilities (known as 1-day vulnerabilities) because of the low maintenance of TPLs.
VULTURE aims at identifying 1-day vulnerabilities that arise from the reuse of vulnerable TPLs.
VULTURE successfully identified 175 vulnerabilities from 178 reused TPLs.
arXiv Detail & Related papers (2024-11-29T12:02:28Z)
- Evaluating and Improving the Robustness of Security Attack Detectors Generated by LLMs [6.936401700600395]
Large Language Models (LLMs) are increasingly used in software development to generate functions, such as attack detectors, that implement security requirements.
This is most likely due to the LLM lacking knowledge about some existing attacks and to the generated code not being evaluated in real usage scenarios.
We propose a novel approach integrating Retrieval Augmented Generation (RAG) and Self-Ranking into the LLM pipeline.
arXiv Detail & Related papers (2024-11-27T10:48:37Z)
- Collaborative Feature-Logits Contrastive Learning for Open-Set Semi-Supervised Object Detection [75.02249869573994]
In open-set scenarios, the unlabeled dataset contains both in-distribution (ID) classes and out-of-distribution (OOD) classes.
Applying semi-supervised detectors in such settings can lead to misclassifying OOD classes as ID classes.
We propose a simple yet effective method, termed Collaborative Feature-Logits Detector (CFL-Detector).
arXiv Detail & Related papers (2024-11-20T02:57:35Z)
- DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios [38.952481877244644]
We present a new benchmark, DetectRL, highlighting that even state-of-the-art (SOTA) detection techniques still underperformed in this task.
Using popular large language models (LLMs), we generated data that better aligns with real-world applications.
We analyzed the potential impact of writing styles, model types, attack methods, text lengths, and real-world human writing factors on different types of detectors.
arXiv Detail & Related papers (2024-10-31T09:01:25Z)
- LLMDet: A Third Party Large Language Models Generated Text Detection Tool [119.0952092533317]
Text generated by large language models (LLMs) is remarkably close to high-quality human-authored text.
Existing detection tools can only differentiate between machine-generated and human-authored text.
We propose LLMDet, a model-specific, secure, efficient, and extendable detection tool.
arXiv Detail & Related papers (2023-05-24T10:45:16Z)
- LibAM: An Area Matching Framework for Detecting Third-party Libraries in Binaries [28.877355564114904]
Third-party libraries (TPLs) are utilized by developers to expedite the software development process and incorporate external functionalities.
Insecure TPL reuse can lead to significant security risks.
We introduce LibAM, a novel Area Matching framework that connects isolated functions into function areas on the Function Call Graph.
arXiv Detail & Related papers (2023-05-06T12:26:56Z)
- Zero-Shot Temporal Action Detection via Vision-Language Prompting [134.26292288193298]
We propose a novel zero-Shot Temporal Action detection model via Vision-LanguagE prompting (STALE).
Our model significantly outperforms state-of-the-art alternatives.
Our model also yields superior results on supervised TAD over recent strong competitors.
arXiv Detail & Related papers (2022-07-17T13:59:46Z)
- Incremental-DETR: Incremental Few-Shot Object Detection via Self-Supervised Learning [60.64535309016623]
We propose Incremental-DETR, which performs incremental few-shot object detection via fine-tuning and self-supervised learning on the DETR object detector.
To alleviate severe over-fitting with few novel class data, we first fine-tune the class-specific components of DETR with self-supervision.
We further introduce an incremental few-shot fine-tuning strategy with knowledge distillation on the class-specific components of DETR to encourage the network to detect novel classes without catastrophic forgetting.
arXiv Detail & Related papers (2022-05-09T05:08:08Z)
- Stance Detection Benchmark: How Robust Is Your Stance Detection? [65.91772010586605]
Stance Detection (StD) aims to detect an author's stance towards a certain topic or claim.
We introduce a StD benchmark that learns from ten StD datasets of various domains in a multi-dataset learning setting.
Within this benchmark setup, we are able to present new state-of-the-art results on five of the datasets.
arXiv Detail & Related papers (2020-01-06T13:37:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.