IRIS: An Immersive Robot Interaction System
- URL: http://arxiv.org/abs/2502.03297v2
- Date: Mon, 17 Feb 2025 11:42:24 GMT
- Title: IRIS: An Immersive Robot Interaction System
- Authors: Xinkai Jiang, Qihao Yuan, Enes Ulas Dincer, Hongyi Zhou, Ge Li, Xueyin Li, Julius Haag, Nicolas Schreiber, Kailai Li, Gerhard Neumann, Rudolf Lioutikov
- Abstract summary: IRIS is a novel, easily extendable framework that already supports multiple simulators, benchmarks, and even headsets. A unified scene specification is generated directly from simulators or real-world sensors and transmitted to XR headsets, creating identical scenes in XR. IRIS can be deployed on any device that supports the Unity Framework, encompassing the vast majority of commercially available headsets.
- Score: 21.524791747174188
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper introduces IRIS, an Immersive Robot Interaction System leveraging Extended Reality (XR), designed for robot data collection and interaction across multiple simulators, benchmarks, and real-world scenarios. While existing XR-based data collection systems provide efficient and intuitive solutions for large-scale data collection, they are often difficult to reproduce and reuse because they are highly tailored to simulator-specific use cases and environments. IRIS is a novel, easily extendable framework that already supports multiple simulators, benchmarks, and even headsets. Furthermore, IRIS can incorporate additional information from real-world sensors, such as point clouds captured by depth cameras. A unified scene specification is generated directly from simulators or real-world sensors and transmitted to XR headsets, creating identical scenes in XR. This specification allows IRIS to support any of the objects, assets, and robots provided by the simulators. In addition, IRIS introduces shared spatial anchors and a robust communication protocol that links simulations across multiple XR headsets. This enables multiple headsets to share a synchronized scene, facilitating collaborative and multi-user data collection. IRIS can be deployed on any device that supports the Unity Framework, encompassing the vast majority of commercially available headsets. In this work, IRIS was deployed and tested on the Meta Quest 3 and the HoloLens 2, showcasing its versatility across a wide range of real-world and simulated scenarios using popular robot simulators such as MuJoCo, IsaacSim, CoppeliaSim, and Genesis. In addition, a user study evaluates IRIS on a data collection task for the LIBERO benchmark and shows that IRIS significantly outperforms the baseline in both objective and subjective metrics.
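The unified scene specification is the load-bearing abstraction in this design: a simulator-agnostic description of bodies, assets, and sensor data that is broadcast so every headset reconstructs an identical scene. The paper does not publish the schema, so the following Python sketch is purely illustrative; the names (SceneSpec, BodySpec, PointCloudSpec, to_message) and all fields are assumptions, not the actual IRIS API.

```python
# Illustrative sketch only: the paper does not publish the IRIS schema,
# so every name and field below is a hypothetical reconstruction.
from dataclasses import dataclass, field, asdict
import json


@dataclass
class BodySpec:
    """One rigid body: a mesh reference exported from the simulator plus its pose."""
    name: str
    mesh_uri: str
    position: tuple = (0.0, 0.0, 0.0)           # x, y, z in meters
    rotation: tuple = (1.0, 0.0, 0.0, 0.0)      # unit quaternion, w-x-y-z


@dataclass
class PointCloudSpec:
    """Optional real-world sensor data, e.g. from a depth camera."""
    frame_id: str
    points: list = field(default_factory=list)  # list of (x, y, z) tuples


@dataclass
class SceneSpec:
    """Simulator-agnostic scene description sent to every connected headset."""
    source: str                                  # e.g. "mujoco", "isaacsim", "real"
    bodies: list = field(default_factory=list)
    point_clouds: list = field(default_factory=list)

    def to_message(self) -> bytes:
        """Serialize the full scene for transmission to the XR clients."""
        return json.dumps(asdict(self)).encode("utf-8")


# Build the spec from simulator state, then broadcast the identical payload
# to all headsets so each one reconstructs the same scene.
scene = SceneSpec(
    source="mujoco",
    bodies=[BodySpec(name="panda_link0", mesh_uri="assets/panda_link0.obj")],
)
payload = scene.to_message()
```

Broadcasting the same serialized payload to every connected headset, anchored to a shared spatial anchor for a common world origin, would keep multi-user scenes synchronized; this is the role the abstract attributes to the system's communication protocol, though the actual transport and message format are not specified in the text above.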
Related papers
- Multi-modal Multi-platform Person Re-Identification: Benchmark and Method [58.59888754340054]
MP-ReID is a novel dataset designed specifically for multi-modality and multi-platform ReID.
This benchmark compiles data from 1,930 identities across diverse modalities, including RGB, infrared, and thermal imaging.
We introduce Uni-Prompt ReID, a framework with specifically designed prompts, tailored for cross-modality and cross-platform scenarios.
arXiv Detail & Related papers (2025-03-21T12:27:49Z) - Explainable XR: Understanding User Behaviors of XR Environments using LLM-assisted Analytics Framework [24.02808692450192]
We present Explainable XR, an end-to-end framework for analyzing user behavior in diverse XR environments. Explainable XR addresses challenges in handling cross-virtuality (AR, VR, MR) transitions and multi-user collaborative application scenarios.
arXiv Detail & Related papers (2025-01-23T15:55:07Z) - Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation [51.20656279478878]
MATRIX is a multi-agent simulator that automatically generates diverse text-based scenarios.
We introduce MATRIX-Gen for controllable and highly realistic data synthesis.
On AlpacaEval 2 and Arena-Hard benchmarks, Llama-3-8B-Base, post-trained on datasets synthesized by MATRIX-Gen with just 20K instruction-response pairs, outperforms Meta's Llama-3-8B-Instruct model.
arXiv Detail & Related papers (2024-10-18T08:01:39Z) - XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis [84.23233209017192]
This paper presents a novel driving view synthesis dataset and benchmark specifically designed for autonomous driving simulations.
The dataset is unique as it includes testing images captured by deviating from the training trajectory by 1-4 meters.
We establish the first realistic benchmark for evaluating existing NVS approaches under front-only and multi-camera settings.
arXiv Detail & Related papers (2024-06-26T14:00:21Z) - VBR: A Vision Benchmark in Rome [1.71787484850503]
This paper presents a vision and perception research dataset collected in Rome, featuring RGB data, 3D point clouds, IMU, and GPS data.
We introduce a new benchmark targeting visual odometry and SLAM, to advance research in autonomous robotics and computer vision.
arXiv Detail & Related papers (2024-04-17T12:34:49Z) - Embedding Large Language Models into Extended Reality: Opportunities and Challenges for Inclusion, Engagement, and Privacy [37.061999275101904]
We argue for embedding large language models in XR, as avatars or as narratives, to facilitate inclusion.
We speculate that combining the information provided to LLM-powered spaces by users and the biometric data obtained might lead to novel privacy invasions.
arXiv Detail & Related papers (2024-02-06T11:19:40Z) - UniSim: A Neural Closed-Loop Sensor Simulator [76.79818601389992]
We present UniSim, a neural sensor simulator that takes as input a single recorded log captured by a sensor-equipped vehicle.
UniSim builds neural feature grids to reconstruct both the static background and dynamic actors in the scene.
We incorporate learnable priors for dynamic objects, and leverage a convolutional network to complete unseen regions.
arXiv Detail & Related papers (2023-08-03T17:56:06Z) - Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and Events [63.984927609545856]
An Event-based Inter/intra-frame Compensator (E-IC) is proposed to predict the per-pixel dynamics between arbitrary time intervals.
We show that the proposed method achieves state-of-the-art results and shows remarkable performance for event-based RS2GS inversion in real-world scenarios.
arXiv Detail & Related papers (2023-04-14T05:30:02Z) - IBISCape: A Simulated Benchmark for multi-modal SLAM Systems Evaluation in Large-scale Dynamic Environments [0.0]
IBISCape is a simulated benchmark for high-fidelity SLAM systems.
We offer 34 multi-modal datasets suitable for autonomous vehicle navigation.
We evaluate four ORB-SLAM3 systems on various sequences collected in simulated large-scale dynamic environments.
arXiv Detail & Related papers (2022-06-27T17:04:06Z) - DriveGAN: Towards a Controllable High-Quality Neural Simulation [147.6822288981004]
We introduce a novel high-quality neural simulator referred to as DriveGAN.
DriveGAN achieves controllability by disentangling different components without supervision.
We train DriveGAN on multiple datasets, including 160 hours of real-world driving data.
arXiv Detail & Related papers (2021-04-30T15:30:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.