When Generative AI Meets Extended Reality: Enabling Scalable and Natural Interactions
- URL: http://arxiv.org/abs/2601.15308v1
- Date: Tue, 13 Jan 2026 15:21:08 GMT
- Title: When Generative AI Meets Extended Reality: Enabling Scalable and Natural Interactions
- Authors: Mingyu Zhu, Jiangong Chen, Bin Li,
- Abstract summary: Generative AI (GenAI) enables intuitive, language-driven interaction and automates content generation. This paper explores the integration of XR and GenAI through three concrete use cases, showing how they address key obstacles in scalability and natural interaction.
- Score: 8.808170696228865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extended Reality (XR), including virtual, augmented, and mixed reality, provides immersive and interactive experiences across diverse applications, from VR-based education to AR-based assistance and MR-based training. However, widespread XR adoption remains limited due to two key challenges: 1) the high cost and complexity of authoring 3D content, especially for large-scale environments or complex interactions; and 2) the steep learning curve associated with non-intuitive interaction methods like handheld controllers or scripted gestures. Generative AI (GenAI) presents a promising solution by enabling intuitive, language-driven interaction and automating content generation. Leveraging vision-language models and diffusion-based generation, GenAI can interpret ambiguous instructions, understand physical scenes, and generate or manipulate 3D content, significantly lowering barriers to XR adoption. This paper explores the integration of XR and GenAI through three concrete use cases, showing how they address key obstacles in scalability and natural interaction, and identifying technical challenges that must be resolved to enable broader adoption.
Related papers
- Interact2Ar: Full-Body Human-Human Interaction Generation via Autoregressive Diffusion Models [80.28579390566298]
We introduce Interact2Ar, a text-conditioned autoregressive diffusion model for generating full-body, human-human interactions. Hand kinematics are incorporated through dedicated parallel branches, enabling high-fidelity full-body generation. Our model enables a series of downstream applications, including temporal motion composition, real-time adaptation to disturbances, and extension beyond dyadic to multi-person scenarios.
(2025-12-22T18:59:50Z)
- Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction [117.6380005194061]
We introduce a method designed to systematically scale the diversity and complexity of interactive environments. Our method realizes this scaling by addressing three dimensions. We train Nex-N1 upon the diverse and complex interactive environments established by our infrastructure.
(2025-12-04T16:57:02Z)
- From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning [59.88543114325153]
We introduce the Seeing-to-Experiencing (S2E) framework to scale the capability of navigation foundation models with reinforcement learning. S2E combines the strengths of pre-training on videos and post-training through RL. We establish a comprehensive end-to-end evaluation benchmark, NavBench-GS, built on photorealistic 3DGS reconstructions of real-world scenes.
(2025-07-29T17:26:10Z)
- Recent Advances and Future Directions in Extended Reality (XR): Exploring AI-Powered Spatial Intelligence [0.0]
Extended Reality (XR), encompassing Augmented Reality (AR), Virtual Reality (VR) and Mixed Reality (MR), is a transformative technology bridging the physical and virtual worlds. This review examines XR's evolution through a foundational framework: hardware ranging from monitors to sensors, and software ranging from visual tasks to user interfaces. For future directions, attention should be given to the integration of multi-modal AI and IoT-driven digital twins to enable adaptive XR systems.
(2025-04-22T15:11:55Z)
- From Voices to Worlds: Developing an AI-Powered Framework for 3D Object Generation in Augmented Reality [0.7388329684634598]
Matrix is an advanced AI-powered framework designed for real-time 3D object generation in Augmented Reality (AR) environments. By integrating a cutting-edge text-to-3D generative AI model, multilingual speech-to-text translation, and large language models, the system enables seamless user interactions through spoken commands.
(2025-03-04T06:31:51Z)
- Cognitive Assessment and Training in Extended Reality: Multimodal Systems, Clinical Utility, and Current Challenges [0.9831489366502301]
Extended reality (XR) technologies are transforming cognitive assessment and training by offering immersive, interactive environments that simulate real-world tasks. XR enhances ecological validity while enabling real-time, multimodal data collection through tools such as galvanic skin response (GSR), electroencephalography (EEG), eye tracking (ET), hand tracking, and body tracking.
(2025-01-14T16:22:36Z)
- Grounded GUI Understanding for Vision-Based Spatial Intelligent Agent: Exemplified by Extended Reality Apps [39.56688889845037]
We propose the first zero-shot cOntext-sensitive inteRactable GUI ElemeNT dEtection framework for virtual Reality apps, named Orienter. By imitating human behaviors, Orienter observes and understands the semantic contexts of XR app scenes first, before performing the detection.
(2024-09-17T00:58:00Z)
- Enabling High-Level Machine Reasoning with Cognitive Neuro-Symbolic Systems [67.01132165581667]
We propose to enable high-level reasoning in AI systems by integrating cognitive architectures with external neuro-symbolic components.
We illustrate a hybrid framework centered on ACT-R and we discuss the role of generative models in recent and future applications.
(2023-11-13T21:20:17Z)
- ArK: Augmented Reality with Knowledge Interactive Emergent Ability [115.72679420999535]
We develop an infinite agent that learns to transfer knowledge memory from general foundation models to novel domains.
The heart of our approach is an emerging mechanism, dubbed Augmented Reality with Knowledge Inference Interaction (ArK).
We show that our ArK approach, combined with large foundation models, significantly improves the quality of generated 2D/3D scenes.
(2023-05-01T17:57:01Z)
- VIRT: Improving Representation-based Models for Text Matching through Virtual Interaction [50.986371459817256]
We propose a novel Virtual InteRacTion mechanism, termed VIRT, to enable full and deep interaction modeling in representation-based models.
VIRT asks representation-based encoders to conduct virtual interactions that mimic the behavior of interaction-based models.
(2021-12-08T09:49:28Z)