XR-DT: Extended Reality-Enhanced Digital Twin for Agentic Mobile Robots
- URL: http://arxiv.org/abs/2512.05270v1
- Date: Thu, 04 Dec 2025 21:49:14 GMT
- Title: XR-DT: Extended Reality-Enhanced Digital Twin for Agentic Mobile Robots
- Authors: Tianyi Wang, Jiseop Byeon, Ahmad Yehia, Huihai Wang, Yiming Xu, Tianyi Zeng, Ziran Wang, Junfeng Jiao, Christian Claudel
- Abstract summary: This paper presents XR-DT, an eXtended Reality-enhanced Digital Twin framework for agentic mobile robots. By embedding human intention, environmental dynamics, and robot cognition into the XR-DT framework, our system enables interpretable, trustworthy, and adaptive HRI.
- Score: 10.083050242188422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As mobile robots increasingly operate alongside humans in shared workspaces, ensuring safe, efficient, and interpretable Human-Robot Interaction (HRI) has become a pressing challenge. While substantial effort has been devoted to human behavior prediction, limited attention has been paid to how humans perceive, interpret, and trust robots' inferences, impeding deployment in safety-critical and socially embedded environments. This paper presents XR-DT, an eXtended Reality-enhanced Digital Twin framework for agentic mobile robots that bridges physical and virtual spaces to enable bi-directional understanding between humans and robots. Our hierarchical XR-DT architecture integrates virtual-, augmented-, and mixed-reality layers, fusing real-time sensor data, simulated environments in the Unity game engine, and human feedback captured through wearable AR devices. Within this framework, we design an agentic mobile robot system with a unified diffusion policy for context-aware task adaptation. We further propose a chain-of-thought prompting mechanism that allows multimodal large language models to reason over human instructions and environmental context, while leveraging an AutoGen-based multi-agent coordination layer to enhance robustness and collaboration in dynamic tasks. Initial experimental results demonstrate accurate human and robot trajectory prediction, validating the XR-DT framework's effectiveness in HRI tasks. By embedding human intention, environmental dynamics, and robot cognition into the XR-DT framework, our system enables interpretable, trustworthy, and adaptive HRI.
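The chain-of-thought prompting mechanism described in the abstract can be sketched as a prompt builder that fuses a human instruction with digital-twin context before querying a multimodal LLM. This is a minimal hypothetical sketch: the `SceneContext` fields, the template wording, and the reasoning scaffold are assumptions for illustration, not the authors' actual implementation.

```python
from dataclasses import dataclass


@dataclass
class SceneContext:
    """Environmental context fused from the digital twin (hypothetical fields)."""
    robot_pose: tuple
    nearby_humans: list
    obstacles: list


def build_cot_prompt(instruction: str, ctx: SceneContext) -> str:
    """Assemble a chain-of-thought prompt for a multimodal LLM.

    The step-by-step scaffold below is a generic CoT template,
    not the paper's exact prompt.
    """
    return (
        "You are the reasoning module of a mobile robot.\n"
        f"Human instruction: {instruction}\n"
        f"Robot pose: {ctx.robot_pose}\n"
        f"Humans nearby: {ctx.nearby_humans}\n"
        f"Obstacles: {ctx.obstacles}\n"
        "Think step by step:\n"
        "1. Restate the human's intent.\n"
        "2. Identify safety constraints from the humans and obstacles.\n"
        "3. Propose a task plan as an ordered list of actions.\n"
        "Answer with the numbered reasoning followed by the plan."
    )


ctx = SceneContext(robot_pose=(1.0, 2.0, 0.0),
                   nearby_humans=["worker_a"],
                   obstacles=["pallet_3"])
prompt = build_cot_prompt("Deliver the toolbox to bench 2.", ctx)
print(prompt)
```

In a full system, the resulting string would be sent to the multimodal LLM alongside camera frames, and the plan handed to the coordination layer (AutoGen in the paper's design); the LLM call is omitted here.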
Related papers
- HHI-Assist: A Dataset and Benchmark of Human-Human Interaction in Physical Assistance Scenario [63.77482302352545]
HHI-Assist is a dataset comprising motion capture clips of human-human interactions in assistive tasks. Our work has the potential to significantly enhance robotic assistance policies.
arXiv Detail & Related papers (2025-09-12T09:38:17Z) - Towards Immersive Human-X Interaction: A Real-Time Framework for Physically Plausible Motion Synthesis [51.95817740348585]
Human-X is a novel framework designed to enable immersive and physically plausible human interactions across diverse entities. Our method jointly predicts actions and reactions in real-time using an auto-regressive reaction diffusion planner. Our framework is validated in real-world applications, including a virtual reality interface for human-robot interaction.
arXiv Detail & Related papers (2025-08-04T06:35:48Z) - Recognizing Actions from Robotic View for Natural Human-Robot Interaction [52.00935005918032]
Natural Human-Robot Interaction (N-HRI) requires robots to recognize human actions at varying distances and states, regardless of whether the robot itself is in motion or stationary. Existing benchmarks fail to address the unique complexities of N-HRI due to limited data, modalities, task categories, and diversity of subjects and environments. We introduce a large-scale dataset (Action from Robotic View) for the perception-centric robotic views prevalent in mobile service robots.
arXiv Detail & Related papers (2025-07-30T09:48:34Z) - GNN-based Decentralized Perception in Multirobot Systems for Predicting Worker Actions [12.260881600042374]
This paper introduces a perception framework that enables mobile robots to understand and share information about human actions in a decentralized way. A swarm-inspired decision-making process is used to ensure all robots agree on a unified interpretation of the human's actions.
arXiv Detail & Related papers (2025-01-08T00:06:38Z) - Experimental Evaluation of ROS-Causal in Real-World Human-Robot Spatial Interaction Scenarios [3.8625803348911774]
We present an experimental evaluation of ROS-Causal, a ROS-based framework for causal discovery in human-robot spatial interactions.
We show how causal models can be extracted directly onboard by robots during data collection.
The online causal models generated from the simulation are consistent with those from lab experiments.
arXiv Detail & Related papers (2024-06-07T14:20:30Z) - Robot Interaction Behavior Generation based on Social Motion Forecasting for Human-Robot Interaction [9.806227900768926]
We propose to model social motion forecasting in a shared human-robot representation space.
ECHO operates in the aforementioned shared space to predict the future motions of the agents encountered in social scenarios.
We evaluate our model in multi-person and human-robot motion forecasting tasks and obtain state-of-the-art performance by a large margin.
arXiv Detail & Related papers (2024-02-07T11:37:14Z) - Agent AI: Surveying the Horizons of Multimodal Interaction [83.18367129924997]
"Agent AI" is a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data.
We envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.
arXiv Detail & Related papers (2024-01-07T19:11:18Z) - Semantic-Aware Environment Perception for Mobile Human-Robot Interaction [2.309914459672557]
We present a vision-based system that enables semantic-aware environment perception for mobile robots without additional a-priori knowledge.
We deploy our system on a mobile humanoid robot that enables us to test our methods in real-world applications.
arXiv Detail & Related papers (2022-11-07T08:49:45Z) - Data-driven emotional body language generation for social robotics [58.88028813371423]
In social robotics, endowing humanoid robots with the ability to generate bodily expressions of affect can improve human-robot interaction and collaboration.
We implement a deep learning data-driven framework that learns from a few hand-designed robotic bodily expressions.
The evaluation study found that the anthropomorphism and animacy of the generated expressions are not perceived differently from the hand-designed ones.
arXiv Detail & Related papers (2022-05-02T09:21:39Z) - Regularized Deep Signed Distance Fields for Reactive Motion Generation [30.792481441975585]
Distance-based constraints are fundamental for enabling robots to plan their actions and act safely.
We propose Regularized Deep Signed Distance Fields (ReDSDF), a single neural implicit function that can compute smooth distance fields at any scale.
We demonstrate the effectiveness of our approach in representative simulated tasks for whole-body control (WBC) and safe Human-Robot Interaction (HRI) in shared workspaces.
arXiv Detail & Related papers (2022-03-09T14:21:32Z) - Spatial Computing and Intuitive Interaction: Bringing Mixed Reality and Robotics Together [68.44697646919515]
This paper presents several human-robot systems that utilize spatial computing to enable novel robot use cases.
The combination of spatial computing and egocentric sensing on mixed reality devices enables them to capture and understand human actions and translate these to actions with spatial meaning.
arXiv Detail & Related papers (2022-02-03T10:04:26Z) - HARPS: An Online POMDP Framework for Human-Assisted Robotic Planning and Sensing [1.3678064890824186]
The Human Assisted Robotic Planning and Sensing (HARPS) framework is presented for active semantic sensing and planning in human-robot teams.
This approach lets humans opportunistically impose model structure and extend the range of semantic soft data in uncertain environments.
Simulations of a UAV-enabled target search application in a large-scale partially structured environment show significant improvements in time and belief state estimates.
arXiv Detail & Related papers (2021-10-20T00:41:57Z)
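HARPS-style human-assisted sensing lets humans inject semantic "soft data" into the robot's belief over the target's location. The core of such a POMDP belief state update is a discrete Bayes filter; the sketch below uses an illustrative 4-cell search grid and made-up likelihood numbers, not the paper's model.

```python
import numpy as np


def belief_update(belief: np.ndarray, likelihood: np.ndarray) -> np.ndarray:
    """One Bayesian belief update over discrete target locations.

    belief: prior probability per grid cell.
    likelihood: P(observation | target in cell), e.g. from human soft data.
    """
    posterior = belief * likelihood
    return posterior / posterior.sum()


# Uniform prior over a 4-cell search grid.
belief = np.full(4, 0.25)

# Human soft data: "the target is probably near cell 2"
# (hypothetical likelihood values for illustration).
human_likelihood = np.array([0.1, 0.2, 0.6, 0.1])

belief = belief_update(belief, human_likelihood)
print(belief)  # posterior concentrates on cell 2
```

Repeating this update as observations arrive is what lets the human input narrow the UAV's search; the full HARPS framework additionally plans actions online against this belief, which is beyond this sketch.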
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.