Agentic Discovery and Validation of Android App Vulnerabilities
- URL: http://arxiv.org/abs/2508.21579v1
- Date: Fri, 29 Aug 2025 12:32:35 GMT
- Title: Agentic Discovery and Validation of Android App Vulnerabilities
- Authors: Ziyue Wang, Liyi Zhou,
- Abstract summary: Existing Android vulnerability detection tools overwhelm teams with thousands of low-signal warnings. Analysts spend days triaging these results, creating a bottleneck in the security pipeline. We introduce A2, a system that mirrors how security experts analyze and validate Android vulnerabilities.
- Score: 8.298163888812233
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Existing Android vulnerability detection tools overwhelm teams with thousands of low-signal warnings yet uncover few true positives. Analysts spend days triaging these results, creating a bottleneck in the security pipeline. Meanwhile, genuinely exploitable vulnerabilities often slip through, leaving opportunities open to malicious counterparts. We introduce A2, a system that mirrors how security experts analyze and validate Android vulnerabilities through two complementary phases: (i) Agentic Vulnerability Discovery, which reasons about application security by combining semantic understanding with traditional security tools; and (ii) Agentic Vulnerability Validation, which systematically validates vulnerabilities across Android's multi-modal attack surface: UI interactions, inter-component communication, file system operations, and cryptographic computations. On the Ghera benchmark (n=60), A2 achieves 78.3% coverage, surpassing state-of-the-art analyzers (e.g., APKHunt 30.0%). Rather than overwhelming analysts with thousands of warnings, A2 distills results into 82 speculative vulnerability findings, including 47 Ghera cases and 28 additional true positives. Crucially, A2 then generates working Proof-of-Concepts (PoCs) for 51 of these speculative findings, transforming them into validated vulnerability findings that provide direct, self-confirming evidence of exploitability. In real-world evaluation on 169 production APKs, A2 uncovers 104 true-positive zero-day vulnerabilities. Among these, 57 (54.8%) are self-validated with automatically generated PoCs, including a medium-severity vulnerability in a widely used application with over 10 million installs.
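The two complementary phases in the abstract can be sketched as a minimal discover-then-validate loop. This is an illustrative sketch only, not A2's actual implementation: the names (`Finding`, `discover`, `validate`), the placeholder heuristic, and the example PoC string are all invented for illustration.

```python
# Hypothetical sketch of a two-phase pipeline in the spirit of A2:
# phase (i) produces *speculative* findings, phase (ii) attempts to
# upgrade each one into a *validated* finding with a working PoC.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Finding:
    app: str
    kind: str                 # e.g. "intent-redirect", "weak-crypto"
    evidence: str             # why the discovery phase flagged it
    poc: Optional[str] = None # filled in only if validation succeeds

def discover(app: str) -> list:
    """Phase (i): combine semantic reasoning with traditional tool
    output to produce speculative findings (placeholder heuristic)."""
    tool_warnings = [("intent-redirect", "exported activity forwards extras")]
    return [Finding(app, kind, ev) for kind, ev in tool_warnings]

def validate(finding: Finding) -> Finding:
    """Phase (ii): exercise the relevant attack surface (UI, ICC,
    filesystem, crypto) to confirm exploitability. Here we simply
    simulate a successful PoC for the ICC surface."""
    if finding.kind == "intent-redirect":
        finding.poc = "adb shell am start -n <pkg>/<activity> --es url http://attacker"
    return finding

findings = [validate(f) for f in discover("demo.apk")]
validated = [f for f in findings if f.poc is not None]
print(len(findings), len(validated))
```

The key design point the sketch mirrors is that validation is a separate, evidence-producing step: a finding only graduates from speculative to validated when a concrete PoC is attached.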
Related papers
- Okara: Detection and Attribution of TLS Man-in-the-Middle Vulnerabilities in Android Apps with Foundation Models [3.9807330903947378]
Transport Layer Security (TLS) is fundamental to secure online communication. Man-in-the-Middle (MitM) attacks remain a pervasive threat in Android apps. We present Okara, a framework that automates the detection and attribution of MitM vulnerabilities.
arXiv Detail & Related papers (2026-01-30T09:49:09Z) - ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack [52.17935054046577]
We present ReasAlign, a model-level solution to improve safety alignment against indirect prompt injection attacks. ReasAlign incorporates structured reasoning steps to analyze user queries, detect conflicting instructions, and preserve the continuity of the user's intended tasks.
arXiv Detail & Related papers (2026-01-15T08:23:38Z) - AI Security Beyond Core Domains: Resume Screening as a Case Study of Adversarial Vulnerabilities in Specialized LLM Applications [71.27518152526686]
Large Language Models (LLMs) excel at text comprehension and generation, making them ideal for automated tasks like code review and content moderation. LLMs can be manipulated by "adversarial instructions" hidden in input data, such as resumes or code, causing them to deviate from their intended task. This paper introduces a benchmark to assess this vulnerability in resume screening, revealing attack success rates exceeding 80% for certain attack types.
arXiv Detail & Related papers (2025-12-23T08:42:09Z) - Jailbreak Mimicry: Automated Discovery of Narrative-Based Jailbreaks for Large Language Models [0.0]
Large language models (LLMs) remain vulnerable to sophisticated prompt engineering attacks. We introduce Jailbreak Mimicry, a systematic methodology for training compact attacker models to automatically generate narrative-based jailbreak prompts. Our approach transforms adversarial prompt discovery from manual craftsmanship into a reproducible scientific process.
arXiv Detail & Related papers (2025-10-24T23:53:16Z) - VulAgent: Hypothesis-Validation based Multi-Agent Vulnerability Detection [55.957275374847484]
VulAgent is a multi-agent vulnerability detection framework based on hypothesis validation. It implements a semantics-sensitive, multi-view detection pipeline, with each view aligned to a specific analysis perspective. On average, VulAgent improves overall accuracy by 6.6%, increases the correct identification rate of vulnerable-fixed code pairs by up to 450%, and reduces the false positive rate by about 36%.
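The hypothesis-validation pattern described above can be sketched compactly: several analysis "views" each propose vulnerability hypotheses, and a separate validation step accepts only those it can confirm, which is what suppresses false positives. This is a hedged sketch, not VulAgent's code; the view functions, the validator, and the string heuristics are all invented stand-ins.

```python
# Hypothetical multi-view hypothesis-validation pipeline: each view is a
# cheap analyzer that proposes hypotheses; validate() re-examines the code
# from the hypothesis's perspective (a real system might use an LLM agent).
from typing import Callable, List

Code = str
Hypothesis = str

def taint_view(code: Code) -> List[Hypothesis]:
    # Toy taint perspective: string concatenation flowing into execute()
    return ["sql-injection"] if "execute(" in code and "+" in code else []

def memory_view(code: Code) -> List[Hypothesis]:
    # Toy memory-safety perspective
    return ["buffer-overflow"] if "strcpy(" in code else []

def validate(code: Code, hyp: Hypothesis) -> bool:
    # Placeholder validator: only confirm hypotheses it can substantiate.
    return hyp == "sql-injection"

def detect(code: Code, views: List[Callable[[Code], List[Hypothesis]]]) -> List[Hypothesis]:
    hypotheses = {h for view in views for h in view(code)}
    return sorted(h for h in hypotheses if validate(code, h))

snippet = 'cursor.execute("SELECT * FROM t WHERE id=" + user_id)'
print(detect(snippet, [taint_view, memory_view]))
```

Separating proposal from confirmation is the core idea: unconfirmed hypotheses are dropped rather than surfaced as warnings.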
arXiv Detail & Related papers (2025-09-15T02:25:38Z) - From Attack Descriptions to Vulnerabilities: A Sentence Transformer-Based Approach [0.39134914399411086]
This paper evaluates 14 state-of-the-art sentence transformers for automatically identifying vulnerabilities from textual descriptions of attacks. On average, 56% of the vulnerabilities identified by the MMPNet model are also represented in the CVE repository in conjunction with an attack. A manual inspection of the results revealed 275 predicted links that were not documented in the MITRE repositories.
arXiv Detail & Related papers (2025-09-02T08:27:36Z) - OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety [58.201189860217724]
We introduce OpenAgentSafety, a comprehensive framework for evaluating agent behavior across eight critical risk categories. Unlike prior work, our framework evaluates agents that interact with real tools, including web browsers, code execution environments, file systems, bash shells, and messaging platforms. It combines rule-based analysis with LLM-as-judge assessments to detect both overt and subtle unsafe behaviors.
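The combination of rule-based analysis with judge assessments can be sketched as a two-tier check: cheap rules catch overt violations first, and a pluggable judge handles the subtler cases. This is a minimal sketch under stated assumptions, not OpenAgentSafety's actual code; the rule patterns and the `judge` callable are hypothetical.

```python
# Hypothetical two-tier safety check: regex rules for overt violations,
# then a judge callable (stubbed here) for everything the rules miss.
import re
from typing import Callable, Tuple

RULES = [
    (re.compile(r"rm\s+-rf\s+/"), "destructive shell command"),
    (re.compile(r"curl\s+.*\|\s*sh"), "piping remote script to shell"),
]

def assess(action: str, judge: Callable[[str], bool]) -> Tuple[bool, str]:
    """Return (is_unsafe, reason). Rules run first; the judge decides the rest."""
    for pattern, reason in RULES:
        if pattern.search(action):
            return True, reason
    return (True, "judge flagged") if judge(action) else (False, "ok")

# Stub judge that never flags anything; a real system would call an LLM here.
always_safe_judge = lambda action: False

print(assess("rm -rf / --no-preserve-root", always_safe_judge))
print(assess("ls -la", always_safe_judge))
```

The ordering matters for cost: deterministic rules are free and auditable, so the (expensive, noisier) judge only sees actions the rules could not decide.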
arXiv Detail & Related papers (2025-07-08T16:18:54Z) - PoCGen: Generating Proof-of-Concept Exploits for Vulnerabilities in Npm Packages [13.877936187495555]
We present PoCGen, a novel approach to autonomously generate and validate PoC exploits for vulnerabilities in npm packages. PoCGen successfully generates exploits for 77% of the vulnerabilities in the SecBench.js dataset.
arXiv Detail & Related papers (2025-06-05T12:37:33Z) - BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems [62.17474934536671]
We introduce the first framework to capture offensive and defensive cyber-capabilities in evolving real-world systems. To capture the vulnerability lifecycle, we define three task types: Detect (detecting a new vulnerability), Exploit (exploiting a specific vulnerability), and Patch (patching a specific vulnerability). We evaluate 8 agents: Claude Code, OpenAI Codex CLI with o3-high and o4-mini, and custom agents with o3-high, GPT-4.1, Gemini 2.5 Pro Preview, Claude 3.7 Sonnet Thinking, and DeepSeek-R1.
arXiv Detail & Related papers (2025-05-21T07:44:52Z) - T2V-OptJail: Discrete Prompt Optimization for Text-to-Video Jailbreak Attacks [67.91652526657599]
We formalize the T2V jailbreak attack as a discrete optimization problem and propose a joint objective-based optimization framework, called T2V-OptJail. We conduct large-scale experiments on several T2V models, covering both open-source models and real commercial closed-source models. The proposed method improves attack success rate by 11.4% and 10.0% over the existing state-of-the-art method.
arXiv Detail & Related papers (2025-05-10T16:04:52Z) - Does the Vulnerability Threaten Our Projects? Automated Vulnerable API Detection for Third-Party Libraries [11.012017507408078]
We propose VAScanner, which can effectively identify vulnerable root methods causing vulnerabilities in TPLs.
VAScanner eliminates 5.78% of false positives and 2.16% of false negatives owing to the proposed sifting and augmentation mechanisms.
In a large-scale analysis of 3,147 projects using vulnerable TPLs, we find only 21.51% of projects were threatened by vulnerable APIs.
arXiv Detail & Related papers (2024-09-04T14:31:16Z) - Static Detection of Filesystem Vulnerabilities in Android Systems [18.472695251551176]
We present PathSentinel, which overcomes the limitations of previous techniques by combining static program analysis and access control policy analysis.
By unifying program and access control policy analysis, PathSentinel identifies attack surfaces accurately and prunes many impractical attacks.
To streamline vulnerability validation, PathSentinel leverages large language models (LLMs) to generate targeted exploit code.
arXiv Detail & Related papers (2024-07-15T23:10:52Z) - Static Application Security Testing (SAST) Tools for Smart Contracts: How Far Are We? [14.974832502863526]
In recent years, the importance of smart contract security has been heightened by the increasing number of attacks against them.
To address this issue, a multitude of static application security testing (SAST) tools have been proposed for detecting vulnerabilities in smart contracts.
In this paper, we propose an up-to-date and fine-grained taxonomy that includes 45 unique vulnerability types for smart contracts.
arXiv Detail & Related papers (2024-04-28T13:40:18Z) - Malicious Agent Detection for Robust Multi-Agent Collaborative Perception [52.261231738242266]
Multi-agent collaborative (MAC) perception is more vulnerable to adversarial attacks than single-agent perception.
We propose Malicious Agent Detection (MADE), a reactive defense specific to MAC perception.
We conduct comprehensive evaluations on a benchmark 3D dataset V2X-sim and a real-road dataset DAIR-V2X.
arXiv Detail & Related papers (2023-10-18T11:36:42Z) - Certifiers Make Neural Networks Vulnerable to Availability Attacks [70.69104148250614]
We show for the first time that fallback strategies can be deliberately triggered by an adversary.
In addition to naturally occurring abstains for some inputs and perturbations, the adversary can use training-time attacks to deliberately trigger the fallback.
We design two novel availability attacks, which show the practical relevance of these threats.
arXiv Detail & Related papers (2021-08-25T15:49:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.