IRIS: An Immersive Robot Interaction System
- URL: http://arxiv.org/abs/2502.03297v3
- Date: Thu, 23 Oct 2025 08:59:52 GMT
- Title: IRIS: An Immersive Robot Interaction System
- Authors: Xinkai Jiang, Qihao Yuan, Enes Ulas Dincer, Hongyi Zhou, Ge Li, Xueyin Li, Xiaogang Jia, Timo Schnizer, Nicolas Schreiber, Weiran Liao, Julius Haag, Kailai Li, Gerhard Neumann, Rudolf Lioutikov
- Abstract summary: IRIS supports immersive interaction and data collection across diverse simulators and real-world scenarios. It visualizes arbitrary rigid and deformable objects, robots from simulation, and integrates real-time sensor-generated point clouds for real-world applications.
- Score: 29.868721218549993
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper introduces IRIS, an Immersive Robot Interaction System leveraging Extended Reality (XR). Existing XR-based systems enable efficient data collection but are often challenging to reproduce and reuse due to their specificity to particular robots, objects, simulators, and environments. IRIS addresses these issues by supporting immersive interaction and data collection across diverse simulators and real-world scenarios. It visualizes arbitrary rigid and deformable objects, robots from simulation, and integrates real-time sensor-generated point clouds for real-world applications. Additionally, IRIS enhances collaborative capabilities by enabling multiple users to simultaneously interact within the same virtual scene. Extensive experiments demonstrate that IRIS offers efficient and intuitive data collection in both simulated and real-world settings.
Related papers
- Real-Time Human-Robot Interaction Intent Detection Using RGB-based Pose and Emotion Cues with Cross-Camera Model Generalization [0.8839687029212673]
Service robots in public spaces require real-time understanding of human behavioral intentions for natural interaction. We present a framework for frame-accurate human-robot interaction intent detection that fuses camera-invariant 2D skeletal pose and facial emotion features extracted from monocular RGB video.
arXiv Detail & Related papers (2025-12-18T08:44:22Z) - Real-to-Sim Robot Policy Evaluation with Gaussian Splatting Simulation of Soft-Body Interactions [27.247431258140463]
We present a real-to-sim policy evaluation framework that constructs soft-body digital twins from real-world videos. We validate our approach on representative deformable manipulation tasks, including plush toy packing, rope routing, and T-block pushing.
arXiv Detail & Related papers (2025-11-06T18:52:08Z) - R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation [74.41728218960465]
We propose a real-to-real 3D data generation framework (R2RGen) that directly augments point-cloud observation-action pairs to generate real-world data. R2RGen substantially improves data efficiency in extensive experiments and demonstrates strong potential for scaling and for application to mobile manipulation.
arXiv Detail & Related papers (2025-10-09T17:55:44Z) - RoboPearls: Editable Video Simulation for Robot Manipulation [81.18434338506621]
RoboPearls is an editable video simulation framework for robotic manipulation. Built on 3D Gaussian Splatting (3DGS), RoboPearls enables the construction of photo-realistic, view-consistent simulations. We conduct extensive experiments on multiple datasets and scenes, including RLBench, COLOSSEUM, Ego4D, Open X-Embodiment, and a real-world robot.
arXiv Detail & Related papers (2025-06-28T05:03:31Z) - Multi-modal Multi-platform Person Re-Identification: Benchmark and Method [58.59888754340054]
MP-ReID is a novel dataset designed specifically for multi-modality and multi-platform ReID.
This benchmark compiles data from 1,930 identities across diverse modalities, including RGB, infrared, and thermal imaging.
We introduce Uni-Prompt ReID, a framework with specifically designed prompts, tailored for cross-modality and cross-platform scenarios.
arXiv Detail & Related papers (2025-03-21T12:27:49Z) - An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation [13.15220962477623]
This paper introduces a novel Real-Sim-Real loop framework to address the gap between simulation and real-world conditions. A key contribution of our work is the design of an informative cost function that encourages the collection of diverse and representative real-world data. Our approach is implemented on the versatile Mujoco MJX platform, and our framework is compatible with a wide range of robotic systems.
arXiv Detail & Related papers (2025-03-13T07:27:05Z) - AIvaluateXR: An Evaluation Framework for on-Device AI in XR with Benchmarking Results [55.33807002543901]
We present AIvaluateXR, a comprehensive evaluation framework for benchmarking large language models (LLMs) running on XR devices. We deploy 17 selected LLMs across four XR platforms: Magic Leap 2, Meta Quest 3, Vivo X100s Pro, and Apple Vision Pro, and conduct an extensive evaluation. We propose a unified evaluation method based on the 3D Optimality theory to select the optimal device-model pairs from quality and speed objectives.
arXiv Detail & Related papers (2025-02-13T20:55:48Z) - Explainable XR: Understanding User Behaviors of XR Environments using LLM-assisted Analytics Framework [24.02808692450192]
We present Explainable XR, an end-to-end framework for analyzing user behavior in diverse XR environments. Explainable XR addresses challenges in handling cross-virtuality (AR, VR, MR) transitions and multi-user collaborative application scenarios.
arXiv Detail & Related papers (2025-01-23T15:55:07Z) - Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation [51.20656279478878]
MATRIX is a multi-agent simulator that automatically generates diverse text-based scenarios.
We introduce MATRIX-Gen for controllable and highly realistic data synthesis.
On AlpacaEval 2 and Arena-Hard benchmarks, Llama-3-8B-Base, post-trained on datasets synthesized by MATRIX-Gen with just 20K instruction-response pairs, outperforms Meta's Llama-3-8B-Instruct model.
arXiv Detail & Related papers (2024-10-18T08:01:39Z) - XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis [84.23233209017192]
This paper presents a novel driving view synthesis dataset and benchmark specifically designed for autonomous driving simulations.
The dataset is unique as it includes testing images captured by deviating from the training trajectory by 1-4 meters.
We establish the first realistic benchmark for evaluating existing NVS approaches under front-only and multi-camera settings.
arXiv Detail & Related papers (2024-06-26T14:00:21Z) - VBR: A Vision Benchmark in Rome [1.71787484850503]
This paper presents a vision and perception research dataset collected in Rome, featuring RGB data, 3D point clouds, IMU, and GPS data.
We introduce a new benchmark targeting visual odometry and SLAM, to advance the research in autonomous robotics and computer vision.
arXiv Detail & Related papers (2024-04-17T12:34:49Z) - RaSim: A Range-aware High-fidelity RGB-D Data Simulation Pipeline for Real-world Applications [55.24463002889]
We focus on depth data synthesis and develop a range-aware RGB-D data simulation pipeline (RaSim).
In particular, high-fidelity depth data is generated by imitating the imaging principle of real-world sensors.
RaSim can be directly applied to real-world scenarios without any finetuning and excels at downstream RGB-D perception tasks.
arXiv Detail & Related papers (2024-04-05T08:52:32Z) - Augmented Reality based Simulated Data (ARSim) with multi-view consistency for AV perception networks [47.07188762367792]
We present ARSim, a framework designed to enhance real multi-view image data with 3D synthetic objects of interest.
We construct a simplified virtual scene using real data and strategically place 3D synthetic assets within it.
The resulting augmented multi-view consistent dataset is used to train a multi-camera perception network for autonomous vehicles.
arXiv Detail & Related papers (2024-03-22T17:49:11Z) - Embedding Large Language Models into Extended Reality: Opportunities and Challenges for Inclusion, Engagement, and Privacy [37.061999275101904]
We argue for using large language models in XR by embedding them in avatars or as narratives to facilitate inclusion.
We speculate that combining the information provided to LLM-powered spaces by users and the biometric data obtained might lead to novel privacy invasions.
arXiv Detail & Related papers (2024-02-06T11:19:40Z) - Learning Interactive Real-World Simulators [96.5991333400566]
We explore the possibility of learning a universal simulator of real-world interaction through generative modeling.
We use the simulator to train both high-level vision-language policies and low-level reinforcement learning policies.
Video captioning models can benefit from training with simulated experience, opening up even wider applications.
arXiv Detail & Related papers (2023-10-09T19:42:22Z) - UniSim: A Neural Closed-Loop Sensor Simulator [76.79818601389992]
We present UniSim, a neural sensor simulator that takes a single recorded log captured by a sensor-equipped vehicle and converts it into a realistic closed-loop multi-sensor simulation.
UniSim builds neural feature grids to reconstruct both the static background and dynamic actors in the scene.
We incorporate learnable priors for dynamic objects, and leverage a convolutional network to complete unseen regions.
arXiv Detail & Related papers (2023-08-03T17:56:06Z) - Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and
Events [63.984927609545856]
An Event-based Inter/intra-frame Compensator (E-IC) is proposed to predict the per-pixel dynamics between arbitrary time intervals.
We show that the proposed method achieves state-of-the-art results and shows remarkable performance for event-based RS2GS inversion in real-world scenarios.
arXiv Detail & Related papers (2023-04-14T05:30:02Z) - IBISCape: A Simulated Benchmark for multi-modal SLAM Systems Evaluation
in Large-scale Dynamic Environments [0.0]
IBISCape is a simulated benchmark for high-fidelity SLAM systems.
We offer 34 multi-modal datasets suitable for autonomous vehicle navigation.
We evaluate four ORB-SLAM3 systems on various sequences collected in simulated large-scale dynamic environments.
arXiv Detail & Related papers (2022-06-27T17:04:06Z) - DriveGAN: Towards a Controllable High-Quality Neural Simulation [147.6822288981004]
We introduce a novel high-quality neural simulator referred to as DriveGAN.
DriveGAN achieves controllability by disentangling different components without supervision.
We train DriveGAN on multiple datasets, including 160 hours of real-world driving data.
arXiv Detail & Related papers (2021-04-30T15:30:05Z) - Point Cloud Based Reinforcement Learning for Sim-to-Real and Partial Observability in Visual Navigation [62.22058066456076]
Reinforcement Learning (RL) is a powerful tool for solving complex robotic tasks.
However, policies trained with RL in simulation often do not work directly in the real world, which is known as the sim-to-real transfer problem.
We propose a method that learns on an observation space constructed by point clouds and environment randomization.
arXiv Detail & Related papers (2020-07-27T17:46:59Z) - RoboTHOR: An Open Simulation-to-Real Embodied AI Platform [56.50243383294621]
We introduce RoboTHOR to democratize research in interactive and embodied visual AI.
We show there exists a significant gap between the performance of models trained in simulation when they are tested in both simulations and their carefully constructed physical analogs.
arXiv Detail & Related papers (2020-04-14T20:52:49Z)