OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows
- URL: http://arxiv.org/abs/2510.24411v1
- Date: Tue, 28 Oct 2025 13:22:39 GMT
- Title: OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows
- Authors: Qiushi Sun, Mukai Li, Zhoumianze Liu, Zhihui Xie, Fangzhi Xu, Zhangyue Yin, Kanzhi Cheng, Zehao Li, Zichen Ding, Qi Liu, Zhiyong Wu, Zhuosheng Zhang, Ben Kao, Lingpeng Kong,
- Abstract summary: Computer-using agents powered by Vision-Language Models (VLMs) have demonstrated human-like capabilities in operating digital environments like mobile platforms.<n>We propose OS-Sentinel, a novel hybrid safety detection framework that combines a Formal Verifier for detecting explicit system-level violations with a Contextual Judge for assessing contextual risks and agent actions.
- Score: 77.95511352806261
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computer-using agents powered by Vision-Language Models (VLMs) have demonstrated human-like capabilities in operating digital environments like mobile platforms. While these agents hold great promise for advancing digital automation, their potential for unsafe operations, such as system compromise and privacy leakage, is raising significant concerns. Detecting these safety concerns across the vast and complex operational space of mobile environments presents a formidable challenge that remains critically underexplored. To establish a foundation for mobile agent safety research, we introduce MobileRisk-Live, a dynamic sandbox environment accompanied by a safety detection benchmark comprising realistic trajectories with fine-grained annotations. Built upon this, we propose OS-Sentinel, a novel hybrid safety detection framework that synergistically combines a Formal Verifier for detecting explicit system-level violations with a VLM-based Contextual Judge for assessing contextual risks and agent actions. Experiments show that OS-Sentinel achieves 10%-30% improvements over existing approaches across multiple metrics. Further analysis provides critical insights that foster the development of safer and more reliable autonomous mobile agents.
Related papers
- Enhancing Reliability in LLM-Integrated Robotic Systems: A Unified Approach to Security and Safety [6.3301898351857515]
Integrating large language models into robotic systems has revolutionised embodied artificial intelligence.<n>We propose a unified framework that mitigates prompt injection attacks while enforcing operational safety.<n>Our approach combines prompt assembling, state management, and safety validation, evaluated using both performance and security metrics.
arXiv Detail & Related papers (2025-09-02T10:14:28Z) - OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety [58.201189860217724]
We introduce OpenAgentSafety, a comprehensive framework for evaluating agent behavior across eight critical risk categories.<n>Unlike prior work, our framework evaluates agents that interact with real tools, including web browsers, code execution environments, file systems, bash shells, and messaging platforms.<n>It combines rule-based analysis with LLM-as-judge assessments to detect both overt and subtle unsafe behaviors.
arXiv Detail & Related papers (2025-07-08T16:18:54Z) - SafeMobile: Chain-level Jailbreak Detection and Automated Evaluation for Multimodal Mobile Agents [58.21223208538351]
This work explores the security issues surrounding mobile multimodal agents.<n>It attempts to construct a risk discrimination mechanism by incorporating behavioral sequence information.<n>It also designs an automated assisted assessment scheme based on a large language model.
arXiv Detail & Related papers (2025-07-01T15:10:00Z) - AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions [64.85086226439954]
We present SAFE, a benchmark for assessing the safety of embodied VLM agents on hazardous instructions.<n> SAFE comprises three components: SAFE-THOR, SAFE-VERSE, and SAFE-DIAGNOSE.<n>We uncover systematic failures in translating hazard recognition into safe planning and execution.
arXiv Detail & Related papers (2025-06-17T16:37:35Z) - SafeAgent: Safeguarding LLM Agents via an Automated Risk Simulator [77.86600052899156]
Large Language Model (LLM)-based agents are increasingly deployed in real-world applications.<n>We propose AutoSafe, the first framework that systematically enhances agent safety through fully automated synthetic data generation.<n>We show that AutoSafe boosts safety scores by 45% on average and achieves a 28.91% improvement on real-world tasks.
arXiv Detail & Related papers (2025-05-23T10:56:06Z) - Designing Control Barrier Function via Probabilistic Enumeration for Safe Reinforcement Learning Navigation [55.02966123945644]
We propose a hierarchical control framework leveraging neural network verification techniques to design control barrier functions (CBFs) and policy correction mechanisms.<n>Our approach relies on probabilistic enumeration to identify unsafe regions of operation, which are then used to construct a safe CBF-based control layer.<n>These experiments demonstrate the ability of the proposed solution to correct unsafe actions while preserving efficient navigation behavior.
arXiv Detail & Related papers (2025-04-30T13:47:25Z) - Exposing the Ghost in the Transformer: Abnormal Detection for Large Language Models via Hidden State Forensics [5.384257830522198]
Large Language Models (LLMs) in critical applications have introduced severe reliability and security risks.<n>These vulnerabilities have been weaponized by malicious actors, leading to unauthorized access, widespread misinformation, and compromised system integrity.<n>We introduce a novel approach to detecting abnormal behaviors in LLMs via hidden state forensics.
arXiv Detail & Related papers (2025-04-01T05:58:14Z) - VMGuard: Reputation-Based Incentive Mechanism for Poisoning Attack Detection in Vehicular Metaverse [52.57251742991769]
vehicular Metaverse guard (VMGuard) protects vehicular Metaverse systems from data poisoning attacks.<n>VMGuard implements a reputation-based incentive mechanism to assess the trustworthiness of participating SIoT devices.<n>Our system ensures that reliable SIoT devices, previously missclassified, are not barred from participating in future rounds of the market.
arXiv Detail & Related papers (2024-12-05T17:08:20Z) - MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control [20.796190000442053]
We introduce MobileSafetyBench, a benchmark designed to evaluate the safety of mobile device-control agents.<n>We develop a diverse set of tasks involving interactions with various mobile applications, including messaging and banking applications.<n>Our experiments demonstrate that baseline agents, based on state-of-the-art LLMs, often fail to effectively prevent harm while performing the tasks.
arXiv Detail & Related papers (2024-10-23T02:51:43Z) - SafeEmbodAI: a Safety Framework for Mobile Robots in Embodied AI Systems [5.055705635181593]
Embodied AI systems, including AI-powered robots that autonomously interact with the physical world, stand to be significantly advanced.
Improper safety management can lead to failures in complex environments and make the system vulnerable to malicious command injections.
We propose textitSafeEmbodAI, a safety framework for integrating mobile robots into embodied AI systems.
arXiv Detail & Related papers (2024-09-03T05:56:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.