Okara: Detection and Attribution of TLS Man-in-the-Middle Vulnerabilities in Android Apps with Foundation Models
- URL: http://arxiv.org/abs/2601.22770v1
- Date: Fri, 30 Jan 2026 09:49:09 GMT
- Title: Okara: Detection and Attribution of TLS Man-in-the-Middle Vulnerabilities in Android Apps with Foundation Models
- Authors: Haoyun Yang, Ronghong Huang, Yong Fang, Beizeng Zhang, Junpu Guo, Zhanyu Wu, Xianghang Mi
- Abstract summary: Transport Layer Security (TLS) is fundamental to secure online communication, yet Man-in-the-Middle (MitM) attacks remain a pervasive threat in Android apps. We present Okara, a framework that automates the detection and attribution of TLS MitM vulnerabilities.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transport Layer Security (TLS) is fundamental to secure online communication, yet vulnerabilities in certificate validation that enable Man-in-the-Middle (MitM) attacks remain a pervasive threat in Android apps. Existing detection tools are hampered by low-coverage UI interaction, costly instrumentation, and a lack of scalable root-cause analysis. We present Okara, a framework that leverages foundation models to automate the detection and deep attribution of TLS MitM Vulnerabilities (TMVs). Okara's detection component, TMV-Hunter, employs foundation model-driven GUI agents to achieve high-coverage app interaction, enabling efficient vulnerability discovery at scale. Deploying TMV-Hunter on 37,349 apps from Google Play and a third-party store revealed 8,374 (22.42%) vulnerable apps. Our measurement shows these vulnerabilities are widespread across all popularity levels, affect critical functionalities like authentication and code delivery, and are highly persistent with a median vulnerable lifespan of over 1,300 days. Okara's attribution component, TMV-ORCA, combines dynamic instrumentation with a novel LLM-based classifier to locate and categorize vulnerable code according to a comprehensive new taxonomy. This analysis attributes 41% of vulnerabilities to third-party libraries and identifies recurring insecure patterns, such as empty trust managers and flawed hostname verification. We have initiated a large-scale responsible disclosure effort and will release our tools and datasets to support further research and mitigation.
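The abstract names two recurring insecure patterns that TMV-ORCA attributes: empty trust managers and flawed hostname verification. As a minimal Java sketch of what such code looks like in an app (class and field names are illustrative, not taken from the paper):

```java
import java.security.cert.X509Certificate;
import javax.net.ssl.*;

public class InsecureTlsPatterns {
    // Pattern 1: an "empty" trust manager. The no-op checks accept any
    // certificate chain, disabling server authentication entirely.
    static final X509TrustManager EMPTY_TRUST_MANAGER = new X509TrustManager() {
        @Override public void checkClientTrusted(X509Certificate[] chain, String authType) {}
        @Override public void checkServerTrusted(X509Certificate[] chain, String authType) {}
        @Override public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
    };

    // Pattern 2: flawed hostname verification. A certificate that is valid
    // for some host is accepted for this host, whatever it is.
    static final HostnameVerifier ACCEPT_ALL_HOSTS = (hostname, session) -> true;

    public static void main(String[] args) throws Exception {
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, new TrustManager[]{EMPTY_TRUST_MANAGER}, new java.security.SecureRandom());
        // Installing these as process-wide defaults makes every
        // HttpsURLConnection in the app interceptable by an active attacker.
        HttpsURLConnection.setDefaultSSLSocketFactory(ctx.getSocketFactory());
        HttpsURLConnection.setDefaultHostnameVerifier(ACCEPT_ALL_HOSTS);
    }
}
```

Either pattern alone defeats TLS authentication: the empty trust manager accepts attacker-issued certificates, and the accept-all verifier accepts valid certificates issued for other hosts.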
Related papers
- Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs
We present JustAsk, a framework that autonomously discovers effective extraction strategies through interaction alone. It formulates extraction as an online exploration problem, using Upper Confidence Bound-based strategy selection and a hierarchical skill space spanning atomic probes and high-level orchestration. Our results expose system prompts as a critical yet largely unprotected attack surface in modern agent systems.
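The abstract mentions Upper Confidence Bound-based strategy selection without further detail. Below is a generic UCB1 sketch of how an agent might trade off exploring and exploiting a fixed set of strategies; the class name and reward convention are assumptions for illustration, not JustAsk's implementation.

```java
// Generic UCB1 bandit over a fixed set of strategies; a sketch of the kind
// of selection the JustAsk abstract describes, not its actual code.
public class Ucb1Selector {
    private final String[] strategies;
    private final int[] pulls;       // how often each strategy was tried
    private final double[] rewards;  // cumulative reward per strategy
    private int totalPulls;

    public Ucb1Selector(String... strategies) {
        this.strategies = strategies;
        this.pulls = new int[strategies.length];
        this.rewards = new double[strategies.length];
    }

    // Pick the arm maximizing mean reward plus an exploration bonus.
    public int select() {
        for (int i = 0; i < strategies.length; i++) {
            if (pulls[i] == 0) return i; // try every strategy once first
        }
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < strategies.length; i++) {
            double score = rewards[i] / pulls[i]
                         + Math.sqrt(2.0 * Math.log(totalPulls) / pulls[i]);
            if (score > bestScore) { bestScore = score; best = i; }
        }
        return best;
    }

    // Record the outcome, e.g. 1.0 if a probe leaked system-prompt text.
    public void update(int arm, double reward) {
        pulls[arm]++;
        totalPulls++;
        rewards[arm] += reward;
    }
}
```

In JustAsk's setting, each arm would presumably be an extraction strategy and the reward a signal that prompt text leaked.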
arXiv Detail & Related papers (2026-01-29T03:53:25Z)
- VulnLLM-R: Specialized Reasoning LLM with Agent Scaffold for Vulnerability Detection
VulnLLM-R is the first specialized reasoning LLM for vulnerability detection. We train a reasoning model with seven billion parameters and show that VulnLLM-R is more effective and efficient than SOTA static analysis tools.
arXiv Detail & Related papers (2025-12-08T13:06:23Z)
- The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search
Large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs. We introduce the Correlated Knowledge Attack Agent (CKA-Agent), a dynamic framework that reframes jailbreaking as an adaptive, tree-structured exploration of the target model's knowledge base.
arXiv Detail & Related papers (2025-12-01T07:05:23Z)
- OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows
Computer-using agents powered by Vision-Language Models (VLMs) have demonstrated human-like capabilities in operating digital environments like mobile platforms. We propose OS-Sentinel, a novel hybrid safety detection framework that combines a Formal Verifier for detecting explicit system-level violations with a Contextual Judge for assessing contextual risks and agent actions.
arXiv Detail & Related papers (2025-10-28T13:22:39Z)
- ParaVul: A Parallel Large Language Model and Retrieval-Augmented Framework for Smart Contract Vulnerability Detection
ParaVul is a retrieval-augmented framework to improve the reliability and accuracy of smart contract vulnerability detection. We develop Sparse Low-Rank Adaptation (SLoRA) for LLM fine-tuning. We construct a vulnerability contract dataset and develop a hybrid Retrieval-Augmented Generation (RAG) system.
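The RAG system is described only at this level of detail. The sketch below shows the generic retrieval-augmented pattern it builds on, retrieving the known-vulnerable snippets most similar to the contract under audit and prepending them to the prompt; the Doc record, cosine scoring, and prompt layout are stand-ins, not ParaVul's actual design.

```java
import java.util.*;

// Generic retrieval-augmented prompt assembly; embeddings are assumed to be
// computed elsewhere by some external model.
public class RagLookup {
    record Doc(String text, double[] embedding) {}

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) { dot += a[i]*b[i]; na += a[i]*a[i]; nb += b[i]*b[i]; }
        return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-12);
    }

    // Retrieve the k corpus entries most similar to the query embedding and
    // prepend them as context for the LLM prompt.
    static String buildPrompt(String contract, double[] queryEmb, List<Doc> corpus, int k) {
        List<Doc> ranked = new ArrayList<>(corpus);
        ranked.sort(Comparator.comparingDouble((Doc d) -> cosine(queryEmb, d.embedding)).reversed());
        StringBuilder prompt = new StringBuilder("Known vulnerable patterns:\n");
        for (Doc d : ranked.subList(0, Math.min(k, ranked.size()))) {
            prompt.append("- ").append(d.text).append('\n');
        }
        return prompt.append("\nContract to audit:\n").append(contract).toString();
    }
}
```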
arXiv Detail & Related papers (2025-10-20T03:23:41Z)
- LLM Agents for Automated Web Vulnerability Reproduction: Are We There Yet?
Large language model (LLM) agents have demonstrated remarkable capabilities in software engineering and cybersecurity tasks. Recent advances suggest promising potential, but challenges remain in applying LLM agents to real-world web vulnerability reproduction scenarios. This paper presents the first comprehensive evaluation of state-of-the-art LLM agents for automated web vulnerability reproduction.
arXiv Detail & Related papers (2025-10-16T14:04:46Z)
- VulnRepairEval: An Exploit-Based Evaluation Framework for Assessing Large Language Model Vulnerability Repair Capabilities
VulnRepairEval is an evaluation framework anchored in functional Proof-of-Concept exploits. Our framework delivers a comprehensive, containerized evaluation pipeline that enables reproducible differential assessment.
arXiv Detail & Related papers (2025-09-03T14:06:10Z)
- Mind the Gap: Time-of-Check to Time-of-Use Vulnerabilities in LLM-Enabled Agents
Large Language Model (LLM)-enabled agents are rapidly emerging across a wide range of applications. This work presents the first study of time-of-check to time-of-use (TOCTOU) vulnerabilities in LLM-enabled agents. We introduce TOCTOU-Bench, a benchmark with 66 realistic user tasks designed to evaluate this class of vulnerabilities.
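The paper studies TOCTOU gaps between an agent's safety check and its later action; the classic file-system form of the same bug class makes the pattern concrete. A minimal Java sketch (method names are hypothetical):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.*;

public class ToctouExample {
    // Vulnerable: the check and the use are separate steps, leaving a window
    // in which another party can swap `path` for a symlink to a secret file.
    static String readIfAllowed(Path path) throws IOException {
        if (Files.isReadable(path) && !Files.isSymbolicLink(path)) { // time of check
            // ... attacker replaces the file here ...
            return Files.readString(path);                           // time of use
        }
        throw new IOException("access denied: " + path);
    }

    // Safer: open exactly once and let the open itself enforce the property,
    // so the handle that was validated is the handle that is used.
    static String readSafely(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path, LinkOption.NOFOLLOW_LINKS)) {
            return new String(in.readAllBytes());
        }
    }
}
```

The safer variant works because NOFOLLOW_LINKS makes the open itself fail on a symlink, collapsing check and use into one atomic operation.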
arXiv Detail & Related papers (2025-08-23T22:41:49Z)
- OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety
We introduce OpenAgentSafety, a comprehensive framework for evaluating agent behavior across eight critical risk categories. Unlike prior work, our framework evaluates agents that interact with real tools, including web browsers, code execution environments, file systems, bash shells, and messaging platforms. It combines rule-based analysis with LLM-as-judge assessments to detect both overt and subtle unsafe behaviors.
arXiv Detail & Related papers (2025-07-08T16:18:54Z)
- ATAG: AI-Agent Application Threat Assessment with Attack Graphs
This paper introduces AI-agent application Threat assessment with Attack Graphs (ATAG), a novel framework designed to systematically analyze the security risks associated with AI-agent applications. It facilitates proactive identification and mitigation of AI-agent threats in multi-agent applications.
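Attack graphs in general model attacker capabilities as nodes and attack steps as edges, so threat assessment reduces to reachability queries. A toy Java sketch of that idea (the node labels and API are invented here, not ATAG's schema):

```java
import java.util.*;

// Minimal attack graph: nodes are attacker capabilities, directed edges are
// attack steps; canReach asks whether an initial foothold leads to a target.
public class AttackGraph {
    private final Map<String, List<String>> steps = new HashMap<>();

    public void addStep(String from, String to) {
        steps.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    public boolean canReach(String start, String target) {
        Deque<String> frontier = new ArrayDeque<>(List.of(start));
        Set<String> seen = new HashSet<>();
        while (!frontier.isEmpty()) {
            String cap = frontier.pop();
            if (cap.equals(target)) return true;
            if (seen.add(cap)) frontier.addAll(steps.getOrDefault(cap, List.of()));
        }
        return false;
    }

    public static void main(String[] args) {
        AttackGraph g = new AttackGraph();
        g.addStep("prompt-injection", "tool-invocation");
        g.addStep("tool-invocation", "data-exfiltration");
        System.out.println(g.canReach("prompt-injection", "data-exfiltration")); // true
    }
}
```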
arXiv Detail & Related papers (2025-06-03T13:25:40Z)
- CyberGym: Evaluating AI Agents' Real-World Cybersecurity Capabilities at Scale
We introduce CyberGym, a large-scale benchmark featuring 1,507 real-world vulnerabilities across 188 software projects. We show that CyberGym leads to the discovery of 35 zero-day vulnerabilities and 17 historically incomplete patches. These results underscore that CyberGym is not only a robust benchmark for measuring AI's progress in cybersecurity but also a platform for creating direct, real-world security impact.
arXiv Detail & Related papers (2025-06-03T07:35:14Z)
- Exposing the Ghost in the Transformer: Abnormal Detection for Large Language Models via Hidden State Forensics
The deployment of Large Language Models (LLMs) in critical applications has introduced severe reliability and security risks. These vulnerabilities have been weaponized by malicious actors, leading to unauthorized access, widespread misinformation, and compromised system integrity. We introduce a novel approach to detecting abnormal behaviors in LLMs via hidden state forensics.
arXiv Detail & Related papers (2025-04-01T05:58:14Z)
- CryptoFormalEval: Integrating LLMs and Formal Verification for Automated Cryptographic Protocol Vulnerability Detection
We introduce a benchmark to assess the ability of Large Language Models to autonomously identify vulnerabilities in new cryptographic protocols.
We created a dataset of novel, flawed communication protocols and designed a method to automatically verify the vulnerabilities found by the AI agents.
arXiv Detail & Related papers (2024-11-20T14:16:55Z)
- Static Detection of Filesystem Vulnerabilities in Android Systems
We present PathSentinel, which overcomes the limitations of previous techniques by combining static program analysis and access control policy analysis.
By unifying program and access control policy analysis, PathSentinel identifies attack surfaces accurately and prunes many impractical attacks.
To streamline vulnerability validation, PathSentinel leverages large language models (LLMs) to generate targeted exploit code.
arXiv Detail & Related papers (2024-07-15T23:10:52Z)