The Monado SLAM Dataset for Egocentric Visual-Inertial Tracking
- URL: http://arxiv.org/abs/2508.00088v1
- Date: Thu, 31 Jul 2025 18:28:07 GMT
- Title: The Monado SLAM Dataset for Egocentric Visual-Inertial Tracking
- Authors: Mateo de Mayo, Daniel Cremers, Taihú Pire
- Abstract summary: Humanoid robots and mixed reality headsets benefit from the use of head-mounted sensors for tracking. We show that state-of-the-art tracking systems are still unable to gracefully handle many of the challenging settings presented in head-mounted use cases. We present the Monado SLAM dataset, a set of real sequences taken from multiple virtual reality headsets.
- Score: 38.93284476165776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humanoid robots and mixed reality headsets benefit from the use of head-mounted sensors for tracking. While advancements in visual-inertial odometry (VIO) and simultaneous localization and mapping (SLAM) have produced new and high-quality state-of-the-art tracking systems, we show that these are still unable to gracefully handle many of the challenging settings presented in head-mounted use cases. Common scenarios like high-intensity motions, dynamic occlusions, long tracking sessions, low-textured areas, adverse lighting conditions, and sensor saturation, to name a few, remain poorly covered by existing datasets in the literature. As a result, systems may inadvertently overlook these essential real-world issues. To address this, we present the Monado SLAM dataset, a set of real sequences taken from multiple virtual reality headsets. We release the dataset under a permissive CC BY 4.0 license, to drive advancements in VIO/SLAM research and development.
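Datasets like this are typically used to score a tracker against ground-truth poses. A standard metric is the absolute trajectory error (ATE): align the estimated positions to the ground truth with a least-squares similarity transform (Umeyama alignment), then take the RMSE of the residuals. The sketch below is a generic illustration of that metric, not code from the Monado SLAM dataset or any specific toolkit; the array shapes and function names are assumptions.

```python
import numpy as np

def ate_rmse(est, gt):
    """RMSE absolute trajectory error after Umeyama similarity alignment.

    est, gt: (N, 3) arrays of estimated and ground-truth positions,
    assumed already associated by timestamp.
    """
    # Center both trajectories.
    mu_est, mu_gt = est.mean(axis=0), gt.mean(axis=0)
    x, y = est - mu_est, gt - mu_gt

    # Cross-covariance and its SVD give the optimal rotation.
    cov = y.T @ x / est.shape[0]
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # enforce a proper rotation (det = +1)
    R = U @ S @ Vt

    # Optimal scale and translation (Umeyama 1991 closed form).
    var_est = (x ** 2).sum() / est.shape[0]
    scale = np.trace(np.diag(D) @ S) / var_est
    t = mu_gt - scale * R @ mu_est

    # Apply the alignment and compute the RMSE of position residuals.
    aligned = scale * (R @ est.T).T + t
    return float(np.sqrt(((aligned - gt) ** 2).sum(axis=1).mean()))
```

A sanity check: feeding in a trajectory and a rotated, scaled, translated copy of it should yield an ATE of (numerically) zero, since the alignment absorbs the whole transform.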
Related papers
- Benchmarking Egocentric Visual-Inertial SLAM at City Scale [50.1245744173948]
This paper introduces a new dataset and benchmark for visual-inertial SLAM with egocentric, multi-modal data. We record hours and kilometers of trajectories through a city center with glasses-like devices equipped with various sensors. We show that state-of-the-art systems developed by academia are not robust to these challenges and we identify components that are responsible for this.
arXiv Detail & Related papers (2025-09-30T17:59:31Z) - How Real is CARLA's Dynamic Vision Sensor? A Study on the Sim-to-Real Gap in Traffic Object Detection [0.0]
Event cameras are well-suited for real-time object detection at traffic intersections. The development of robust event-based detection models is hindered by the limited availability of annotated real-world datasets. This study offers the first quantifiable analysis of the sim-to-real gap in event-based object detection using CARLA's DVS.
arXiv Detail & Related papers (2025-06-16T17:27:43Z) - SLAM&Render: A Benchmark for the Intersection Between Neural Rendering, Gaussian Splatting and SLAM [12.378998250852383]
SLAM&Render is a novel dataset designed to benchmark methods in the intersection between SLAM, Novel View Rendering and Gaussian Splatting. It uniquely includes 40 sequences with time-synchronized RGB-D images, IMU readings, robot kinematic data, and ground-truth pose streams. By releasing robot kinematic data, the dataset also enables the assessment of recent integrations of SLAM paradigms within robotic applications.
arXiv Detail & Related papers (2025-04-18T14:28:34Z) - Multi-modal Multi-platform Person Re-Identification: Benchmark and Method [58.59888754340054]
MP-ReID is a novel dataset designed specifically for multi-modality and multi-platform ReID. This benchmark compiles data from 1,930 identities across diverse modalities, including RGB, infrared, and thermal imaging. We introduce Uni-Prompt ReID, a framework with specific-designed prompts, tailored for cross-modality and cross-platform scenarios.
arXiv Detail & Related papers (2025-03-21T12:27:49Z) - ROVER: A Multi-Season Dataset for Visual SLAM [7.296917102476635]
ROVER is a benchmark dataset for evaluating visual SLAM algorithms in diverse environmental conditions. It covers 39 recordings across five outdoor locations, collected through all seasons and various lighting scenarios. Results show that while stereo-inertial and RGBD configurations perform better under favorable lighting, most SLAM systems perform poorly in low-light and high-vegetation scenarios.
arXiv Detail & Related papers (2024-12-03T15:34:00Z) - InCrowd-VI: A Realistic Visual-Inertial Dataset for Evaluating SLAM in Indoor Pedestrian-Rich Spaces for Human Navigation [2.184775414778289]
InCrowd-VI is a visual-inertial dataset specifically designed for human navigation in indoor pedestrian-rich environments. It features 58 sequences totaling a 5 km trajectory length and 1.5 hours of recording time, including RGB, stereo images, and IMU measurements. Ground-truth trajectories, accurate to approximately 2 cm, are provided in the dataset, originating from the Meta Aria project machine perception SLAM service.
arXiv Detail & Related papers (2024-11-21T17:58:07Z) - LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment [59.320414108383055]
We present LiveHPS, a novel single-LiDAR-based approach for scene-level human pose and shape estimation.
We propose a huge human motion dataset, named FreeMotion, which is collected in various scenarios with diverse human poses.
arXiv Detail & Related papers (2024-02-27T03:08:44Z) - Amirkabir campus dataset: Real-world challenges and scenarios of Visual Inertial Odometry (VIO) for visually impaired people [3.7998592843098336]
We introduce the Amirkabir campus dataset (AUT-VI) to address these challenges and improve navigation systems.
AUT-VI is a novel and super-challenging dataset with 126 diverse sequences in 17 different locations.
In support of ongoing development efforts, we have released the Android application for data capture to the public.
arXiv Detail & Related papers (2024-01-07T23:13:51Z) - DNS SLAM: Dense Neural Semantic-Informed SLAM [92.39687553022605]
DNS SLAM is a novel neural RGB-D semantic SLAM approach featuring a hybrid representation.
Our method integrates multi-view geometry constraints with image-based feature extraction to improve appearance details.
Our experimental results achieve state-of-the-art performance on both synthetic data and real-world data tracking.
arXiv Detail & Related papers (2023-11-30T21:34:44Z) - Aria-NeRF: Multimodal Egocentric View Synthesis [17.0554791846124]
We seek to accelerate research in developing rich, multimodal scene models trained from egocentric data, based on differentiable volumetric ray-tracing inspired by Neural Radiance Fields (NeRFs).
This dataset offers a comprehensive collection of sensory data, featuring RGB images, eye-tracking camera footage, audio recordings from a microphone, atmospheric pressure readings from a barometer, positional coordinates from GPS, and information from dual-frequency IMU datasets (1kHz and 800Hz).
The diverse data modalities and the real-world context captured within this dataset serve as a robust foundation for furthering our understanding of human behavior and enabling more immersive and intelligent experiences in
arXiv Detail & Related papers (2023-11-11T01:56:35Z) - Learning to Simulate Realistic LiDARs [66.7519667383175]
We introduce a pipeline for data-driven simulation of a realistic LiDAR sensor.
We show that our model can learn to encode realistic effects such as dropped points on transparent surfaces.
We use our technique to learn models of two distinct LiDAR sensors and use them to improve simulated LiDAR data accordingly.
arXiv Detail & Related papers (2022-09-22T13:12:54Z) - The Hilti SLAM Challenge Dataset [41.091844019181735]
Construction environments pose challenging problems for Simultaneous Localization and Mapping (SLAM) algorithms.
To support this research, we propose a new dataset, the Hilti SLAM Challenge dataset.
Each dataset includes accurate ground truth to allow direct testing of SLAM results.
arXiv Detail & Related papers (2021-09-23T12:02:40Z) - Transferable Active Grasping and Real Embodied Dataset [48.887567134129306]
We show how to search for feasible viewpoints for grasping by the use of hand-mounted RGB-D cameras.
A practical 3-stage transferable active grasping pipeline is developed, that is adaptive to unseen clutter scenes.
In our pipeline, we propose a novel mask-guided reward to overcome the sparse reward issue in grasping and ensure category-irrelevant behavior.
arXiv Detail & Related papers (2020-04-28T08:15:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.