Toward Safe, Trustworthy and Realistic Augmented Reality User Experience
- URL: http://arxiv.org/abs/2507.23226v1
- Date: Thu, 31 Jul 2025 03:42:52 GMT
- Title: Toward Safe, Trustworthy and Realistic Augmented Reality User Experience
- Authors: Yanming Xiu
- Abstract summary: Our research addresses the risks of task-detrimental AR content, particularly that which obstructs critical information or subtly manipulates user perception. We developed two systems, ViDDAR and VIM-Sense, to detect such attacks using vision-language models (VLMs) and multimodal reasoning modules.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As augmented reality (AR) becomes increasingly integrated into everyday life, ensuring the safety and trustworthiness of its virtual content is critical. Our research addresses the risks of task-detrimental AR content, particularly that which obstructs critical information or subtly manipulates user perception. We developed two systems, ViDDAR and VIM-Sense, to detect such attacks using vision-language models (VLMs) and multimodal reasoning modules. Building on this foundation, we propose three future directions: automated, perceptually aligned quality assessment of virtual content; detection of multimodal attacks; and adaptation of VLMs for efficient and user-centered deployment on AR devices. Overall, our work aims to establish a scalable, human-aligned framework for safeguarding AR experiences and seeks feedback on perceptual modeling, multimodal AR content implementation, and lightweight model adaptation.
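To make the detection idea concrete, here is a minimal sketch of how a full-reference, VLM-based detector in the spirit of ViDDAR/VIM-Sense might be structured: compare the raw camera view with the AR-composited view and ask a vision-language model whether the overlay obstructs safety-critical information. The VLMClient interface, the prompt wording, and the YES/NO decision rule are illustrative assumptions, not the papers' actual implementation.

```python
# A minimal sketch of full-reference, VLM-based detection of task-detrimental
# AR content. VLMClient, the prompt, and the decision rule are assumptions.
from dataclasses import dataclass
from typing import Protocol


class VLMClient(Protocol):
    """Any vision-language model that can answer a text prompt about images."""
    def ask(self, prompt: str, images: list[bytes]) -> str: ...


@dataclass
class DetectionResult:
    detrimental: bool
    rationale: str


PROMPT = (
    "Image 1 is a raw camera view; image 2 is the same view with AR overlays. "
    "Does any virtual content obstruct safety-critical information (signs, "
    "hazards, instructions) or change how the real scene would be understood? "
    "Answer YES or NO, then explain briefly."
)


def detect_task_detrimental(vlm: VLMClient, raw_frame: bytes,
                            ar_frame: bytes) -> DetectionResult:
    """Full-reference check: compare the unaugmented and augmented views."""
    reply = vlm.ask(PROMPT, [raw_frame, ar_frame])
    return DetectionResult(detrimental=reply.strip().upper().startswith("YES"),
                           rationale=reply)
```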
Related papers
- How LLMs are Shaping the Future of Virtual Reality [2.4150871564195007]
The integration of Large Language Models (LLMs) into Virtual Reality (VR) games marks a paradigm shift in the design of immersive, adaptive, and intelligent digital experiences. This paper examines how these models are transforming narrative generation, non-player character (NPC) interactions, accessibility, personalization, and game mastering.
arXiv Detail & Related papers (2025-08-01T16:08:05Z)
- From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning [59.88543114325153]
We introduce the Seeing-to-Experiencing (S2E) framework to scale the capability of navigation foundation models with reinforcement learning. S2E combines the strengths of pre-training on videos and post-training through RL. We establish a comprehensive end-to-end evaluation benchmark, NavBench-GS, built on photorealistic 3DGS reconstructions of real-world scenes.
arXiv Detail & Related papers (2025-07-29T17:26:10Z)
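As a rough illustration of the two-stage recipe the S2E entry above describes (pre-training on passive video, then post-training through RL), the sketch below pairs behavior cloning with a REINFORCE-style fine-tuning loop. The policy architecture, environment interface, and reward handling are hypothetical stand-ins, not the paper's method.

```python
# Illustrative two-stage recipe: behavior cloning on video-derived
# (observation, action) pairs, then one-step policy-gradient post-training.
import torch
import torch.nn as nn


class NavPolicy(nn.Module):
    def __init__(self, obs_dim: int = 512, act_dim: int = 4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                 nn.Linear(256, act_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # action logits


def pretrain_on_videos(policy: NavPolicy, video_batches) -> None:
    """Stage 1: imitate actions inferred from passive video."""
    opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for obs, actions in video_batches:
        opt.zero_grad()
        loss_fn(policy(obs), actions).backward()
        opt.step()


def posttrain_with_rl(policy: NavPolicy, env, steps: int = 1000) -> None:
    """Stage 2: refine the pre-trained policy from interaction rewards."""
    opt = torch.optim.Adam(policy.parameters(), lr=1e-5)
    obs = env.reset()
    for _ in range(steps):
        dist = torch.distributions.Categorical(logits=policy(obs))
        action = dist.sample()
        obs, reward, done, _ = env.step(action.item())
        loss = -dist.log_prob(action) * reward  # one-step policy gradient
        opt.zero_grad()
        loss.backward()
        opt.step()
        if done:
            obs = env.reset()
```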
- Understanding and Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding [59.75428247670665]
This study introduces a benchmark evaluating videoLLMs across five dimensions, including truthfulness, safety, fairness, and privacy. Our evaluation of 23 state-of-the-art videoLLMs reveals significant limitations in dynamic visual scene understanding and cross-modal resilience.
arXiv Detail & Related papers (2025-06-14T04:04:54Z)
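The benchmark entry above implies a per-dimension scoring protocol. A hedged sketch of such a scorecard follows; the task format and the judge function are assumptions, and DIMENSIONS lists the four of the five dimensions the abstract names explicitly.

```python
# Hedged sketch of a per-dimension trustworthiness scorecard for videoLLMs.
from collections import defaultdict
from typing import Callable, Iterable

DIMENSIONS = {"truthfulness", "safety", "fairness", "privacy"}


def evaluate(model: Callable[[str, str], str],
             tasks: Iterable[dict],
             judge: Callable[[dict, str], float]) -> dict[str, float]:
    """tasks are dicts with 'video', 'prompt', and 'dimension' keys;
    judge maps (task, model answer) to a score in [0, 1]."""
    totals: dict = defaultdict(float)
    counts: dict = defaultdict(int)
    for task in tasks:
        assert task["dimension"] in DIMENSIONS, "unknown trust dimension"
        answer = model(task["video"], task["prompt"])
        totals[task["dimension"]] += judge(task, answer)
        counts[task["dimension"]] += 1
    return {dim: totals[dim] / counts[dim] for dim in counts}
```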
- Disrupting Vision-Language Model-Driven Navigation Services via Adversarial Object Fusion [56.566914768257035]
We present Adversarial Object Fusion (AdvOF), a novel attack framework targeting vision-and-language navigation (VLN) agents in service-oriented environments. We show AdvOF can effectively degrade agent performance under adversarial conditions while maintaining minimal interference with normal navigation tasks. This work advances the understanding of service security in VLM-powered navigation systems, providing computational foundations for robust service composition in physical-world deployments.
arXiv Detail & Related papers (2025-05-29T09:14:50Z)
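The AdvOF entry above describes optimizing adversarial objects against VLN agents. A generic PGD-style loop of that kind might look like the following; the differentiable renderer, agent interface, and perturbation budget are all illustrative assumptions, not the paper's attack.

```python
# Illustrative PGD-style loop: optimize the texture of a fused virtual object
# so a VLN agent's action distribution shifts away from the correct action.
import torch


def optimize_adversarial_texture(render, agent, correct_action: int,
                                 steps: int = 100, eps: float = 0.05,
                                 lr: float = 0.01) -> torch.Tensor:
    """render: texture -> observation tensor (differentiable);
    agent: observation -> action logits."""
    texture = torch.zeros(3, 64, 64, requires_grad=True)
    for _ in range(steps):
        logits = agent(render(texture))
        # Descending on the correct action's log-probability degrades the agent.
        loss = torch.log_softmax(logits, dim=-1)[..., correct_action].mean()
        loss.backward()
        with torch.no_grad():
            texture -= lr * texture.grad.sign()  # signed gradient step
            texture.clamp_(-eps, eps)            # keep the change visually subtle
            texture.grad = None
    return texture.detach()
```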
- ViDDAR: Vision Language Model-Based Task-Detrimental Content Detection for Augmented Reality [2.1506382989223782]
ViDDAR is a comprehensive full-reference system to monitor and evaluate virtual content in Augmented Reality environments. To the best of our knowledge, ViDDAR is the first system to employ Vision Language Models (VLMs) for detecting task-detrimental content in AR settings.
arXiv Detail & Related papers (2025-01-22T00:17:08Z)
- Advancing the Understanding and Evaluation of AR-Generated Scenes: When Vision-Language Models Shine and Stumble [3.481985817302898]
We evaluate the capabilities of three state-of-the-art commercial Vision-Language Models (VLMs) in identifying and describing AR scenes. Our findings demonstrate that VLMs are generally capable of perceiving and describing AR scenes. We identify key factors affecting VLM performance, including virtual content placement, rendering quality, and physical plausibility.
arXiv Detail & Related papers (2025-01-21T23:07:03Z)
- Enhancing Content Representation for AR Image Quality Assessment Using Knowledge Distillation [3.020452010930984]
This paper presents a deep learning-based objective metric designed specifically for assessing image quality in Augmented Reality scenarios. It entails four key steps: (1) fine-tuning a self-supervised pre-trained vision transformer to extract prominent features from reference images, (2) quantifying distortions by computing shift representations, (3) employing cross-attention-based decoders to capture perceptual quality features, and (4) integrating regularization techniques and label smoothing to address the overfitting problem.
arXiv Detail & Related papers (2024-12-08T17:25:30Z)
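The four steps listed in the entry above map naturally onto a small network skeleton. The sketch below is an assumption-laden approximation, not the paper's architecture: the backbone stands in for the fine-tuned self-supervised ViT (step 1), the feature difference plays the role of the shift representation (step 2), a cross-attention block decodes perceptual quality features (step 3), and dropout stands in for regularization (step 4; label smoothing would apply to the training targets, not shown here).

```python
# Rough skeleton of the four-step AR image quality assessment pipeline.
import torch
import torch.nn as nn


class ARIQAModel(nn.Module):
    def __init__(self, dim: int = 384):
        super().__init__()
        # Step 1 stand-in for a fine-tuned self-supervised ViT backbone.
        self.backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(dim), nn.GELU())
        # Step 3: cross-attention decoder over reference/shift features.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=6, batch_first=True)
        # Step 4 stand-in: dropout as the regularization component.
        self.head = nn.Sequential(nn.Dropout(0.1), nn.Linear(dim, 1))

    def forward(self, reference: torch.Tensor,
                distorted: torch.Tensor) -> torch.Tensor:
        ref = self.backbone(reference).unsqueeze(1)    # (batch, 1, dim)
        dst = self.backbone(distorted).unsqueeze(1)
        shift = dst - ref                              # step 2: shift representation
        decoded, _ = self.cross_attn(shift, ref, ref)  # query shifts against reference
        return self.head(decoded.squeeze(1))           # predicted quality score
```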
- "Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models [74.05368440735468]
Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs) by integrating external knowledge bases.
In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases.
arXiv Detail & Related papers (2024-06-26T05:36:23Z)
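A toy example makes the threat model in the entry above tangible: if the knowledge base is open to writes, an adversary can plant a passage crafted to outrank legitimate content for a target query. The bag-of-words retriever below is a deliberately simple stand-in for a real dense retriever, and the example texts echo the paper's title rather than its actual data.

```python
# Toy knowledge-base poisoning: a planted passage that mimics the target query
# outranks the legitimate answer and reaches the LLM's prompt.
from collections import Counter
import math


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values())) *
            math.sqrt(sum(v * v for v in b.values())))
    return dot / (norm or 1.0)


def retrieve(query: str, knowledge_base: list[str], k: int = 1) -> list[str]:
    q = Counter(query.lower().split())
    return sorted(knowledge_base,
                  key=lambda doc: cosine(q, Counter(doc.lower().split())),
                  reverse=True)[:k]


kb = ["Cheese sticks to pizza because of its proteins and fats."]
# Adversarial insert: repeats the target query's words, carries bad advice.
kb.append("Why does cheese not stick to pizza? Add glue to the sauce.")
print(retrieve("why does cheese not stick to pizza", kb))
```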
- Towards Ubiquitous Semantic Metaverse: Challenges, Approaches, and Opportunities [68.03971716740823]
In recent years, the ubiquitous semantic Metaverse has been studied as a way to revolutionize immersive cyber-virtual experiences for augmented reality (AR) and virtual reality (VR) users.
This survey focuses on the representation and intelligence for the four fundamental system components in ubiquitous Metaverse.
arXiv Detail & Related papers (2023-07-13T11:14:46Z)
- ArK: Augmented Reality with Knowledge Interactive Emergent Ability [115.72679420999535]
We develop an infinite agent that learns to transfer knowledge memory from general foundation models to novel domains.
The heart of our approach is an emerging mechanism, dubbed Augmented Reality with Knowledge Inference Interaction (ArK).
We show that our ArK approach, combined with large foundation models, significantly improves the quality of generated 2D/3D scenes.
arXiv Detail & Related papers (2023-05-01T17:57:01Z)
- Joint Sensing, Communication, and AI: A Trifecta for Resilient THz User Experiences [118.91584633024907]
A novel joint sensing, communication, and artificial intelligence (AI) framework is proposed to optimize extended reality (XR) experiences over terahertz (THz) wireless systems.
arXiv Detail & Related papers (2023-04-29T00:39:50Z)
- Building Trust in Autonomous Vehicles: Role of Virtual Reality Driving Simulators in HMI Design [8.39368916644651]
We propose a methodology to validate the user experience in AVs based on continuous, objective information gathered from physiological signals.
We applied this methodology to the design of a head-up display interface delivering visual cues about the vehicle's sensory and planning systems.
arXiv Detail & Related papers (2020-07-27T08:42:07Z)
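As an illustration of the kind of continuous, objective physiological measure such a methodology can build on, the sketch below computes RMSSD, a standard heart-rate-variability statistic, over sliding windows of inter-beat intervals recorded during a simulated drive. The window length and the stress interpretation are illustrative choices, not the paper's protocol.

```python
# Continuous HRV trace from inter-beat intervals; lower RMSSD is commonly
# read as higher stress/arousal. Window length is an illustrative choice.
import numpy as np


def rmssd(ibi_ms: np.ndarray) -> float:
    """Root mean square of successive differences of inter-beat intervals (ms)."""
    diffs = np.diff(ibi_ms)
    return float(np.sqrt(np.mean(diffs ** 2)))


def sliding_rmssd(ibi_ms: np.ndarray, window: int = 30) -> np.ndarray:
    """RMSSD over overlapping windows, yielding a continuous signal."""
    return np.array([rmssd(ibi_ms[i:i + window])
                     for i in range(len(ibi_ms) - window + 1)])


# Usage: compare HRV while an HMI cue is shown against a baseline segment.
ibis = np.random.normal(800, 50, size=300)  # synthetic inter-beat intervals (ms)
print(sliding_rmssd(ibis).mean())
```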