InCrowd-VI: A Realistic Visual-Inertial Dataset for Evaluating SLAM in Indoor Pedestrian-Rich Spaces for Human Navigation
- URL: http://arxiv.org/abs/2411.14358v1
- Date: Thu, 21 Nov 2024 17:58:07 GMT
- Title: InCrowd-VI: A Realistic Visual-Inertial Dataset for Evaluating SLAM in Indoor Pedestrian-Rich Spaces for Human Navigation
- Authors: Marziyeh Bamdad, Hans-Peter Hutter, Alireza Darvishy
- Abstract summary: We introduce InCrowd-VI, a novel visual-inertial dataset specifically designed for human navigation in indoor pedestrian-rich environments.
InCrowd-VI features 58 sequences totaling 5 km of trajectories and 1.5 hours of recording time, including RGB images, stereo images, and IMU measurements.
Ground-truth trajectories, accurate to approximately 2 cm, are provided in the dataset, derived from the Meta Aria project's machine perception SLAM service.
- Abstract: Simultaneous localization and mapping (SLAM) techniques can support navigation for visually impaired people, but the development of robust SLAM solutions for crowded spaces is limited by the lack of realistic datasets. To address this, we introduce InCrowd-VI, a novel visual-inertial dataset specifically designed for human navigation in indoor pedestrian-rich environments. Recorded using Meta Aria Project glasses, it captures realistic scenarios without environmental control. InCrowd-VI features 58 sequences totaling 5 km of trajectories and 1.5 hours of recording time, including RGB images, stereo images, and IMU measurements. The dataset captures important challenges such as pedestrian occlusions, varying crowd densities, complex layouts, and lighting changes. Ground-truth trajectories, accurate to approximately 2 cm, are provided in the dataset, derived from the Meta Aria project's machine perception SLAM service. In addition, a semi-dense 3D point cloud of the scene is provided for each sequence. Evaluating state-of-the-art visual odometry (VO) and SLAM algorithms on InCrowd-VI revealed severe performance limitations in these realistic scenarios, demonstrating the need for and value of the new dataset to advance SLAM research for visually impaired navigation in complex indoor environments.
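The evaluation described above (scoring VO/SLAM estimates against ground-truth trajectories accurate to roughly 2 cm) is conventionally reported as absolute trajectory error (ATE) after a least-squares alignment. Below is a minimal sketch of that metric, assuming estimated and ground-truth positions have already been associated by timestamp; the function names and the synthetic smoke test are illustrative assumptions, not the paper's code or the dataset's tooling.

```python
# Minimal sketch: ATE RMSE after Umeyama Sim(3) alignment.
# Assumes est and gt are (N, 3) position arrays already matched by timestamp.
import numpy as np

def umeyama_align(est: np.ndarray, gt: np.ndarray):
    """Least-squares similarity transform (s, R, t) mapping est onto gt."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g
    # Cross-covariance between ground truth and estimate.
    cov = G.T @ E / len(est)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # guard against reflections
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / (E ** 2).sum(axis=1).mean()
    t = mu_g - s * R @ mu_e
    return s, R, t

def ate_rmse(est: np.ndarray, gt: np.ndarray) -> float:
    """Root-mean-square ATE in the units of gt (meters), after alignment."""
    s, R, t = umeyama_align(est, gt)
    aligned = (s * (R @ est.T)).T + t
    return float(np.sqrt(((aligned - gt) ** 2).sum(axis=1).mean()))

if __name__ == "__main__":
    # Synthetic smoke test: a rotated, scaled, shifted copy of a path
    # should align back to ~0 m error.
    rng = np.random.default_rng(0)
    gt = np.cumsum(rng.normal(size=(500, 3)), axis=0)
    theta = 0.3
    Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
    est = (0.9 * (Rz @ gt.T)).T + np.array([1.0, -2.0, 0.5])
    print(f"ATE RMSE: {ate_rmse(est, gt):.6f} m")  # ~0 up to float precision
```

In practice, estimated poses would first be interpolated or matched by timestamp against the dataset's ground-truth trajectory before applying such an alignment.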
Related papers
- LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment (arXiv, 2024-02-27)
  We present LiveHPS, a novel single-LiDAR-based approach for scene-level human pose and shape estimation. We also propose a large-scale human motion dataset, named FreeMotion, collected in various scenarios with diverse human poses.
- UAVD4L: A Large-Scale Dataset for UAV 6-DoF Localization (arXiv, 2024-01-11)
  We introduce a large-scale 6-DoF UAV dataset for localization (UAVD4L) and develop a two-stage 6-DoF localization pipeline (UAVLoc), which consists of offline synthetic data generation and online visual localization. Results on the new dataset demonstrate the effectiveness of the proposed approach.
- Amirkabir campus dataset: Real-world challenges and scenarios of Visual Inertial Odometry (VIO) for visually impaired people (arXiv, 2024-01-07)
  We introduce the Amirkabir campus dataset (AUT-VI) to address this problem and improve navigation systems. AUT-VI is a novel and highly challenging dataset with 126 diverse sequences in 17 different locations. In support of ongoing development efforts, we have released the Android application for data capture to the public.
- Improving Underwater Visual Tracking With a Large Scale Dataset and Image Enhancement (arXiv, 2023-08-30)
  This paper presents a new dataset and a general tracker-enhancement method for Underwater Visual Object Tracking (UVOT). The underwater environment poses distinct challenges: non-uniform lighting conditions, low visibility, lack of sharpness, low contrast, camouflage, and reflections from suspended particles. We propose a novel underwater image enhancement algorithm designed specifically to boost tracking quality, yielding a significant performance improvement of up to 5.0% AUC for state-of-the-art (SOTA) visual trackers.
- FLSea: Underwater Visual-Inertial and Stereo-Vision Forward-Looking Datasets (arXiv, 2023-02-24)
  We have collected underwater forward-looking stereo-vision and visual-inertial image sets in the Mediterranean and Red Sea. These datasets are critical for the development of several underwater applications, including obstacle avoidance, visual odometry, 3D tracking, simultaneous localization and mapping (SLAM), and depth estimation.
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World (arXiv, 2022-03-11)
  We propose VRVO, a novel framework for retrieving the absolute scale from virtual data. We first train a scale-aware disparity network using both monocular real images and stereo virtual data; the resulting scale-consistent disparities are then integrated with a direct VO system.
- Fog Simulation on Real LiDAR Point Clouds for 3D Object Detection in Adverse Weather (arXiv, 2021-08-11)
  This work addresses the challenging task of LiDAR-based 3D object detection in foggy weather. We tackle this problem by simulating physically accurate fog into clear-weather scenes, and we are the first to provide strong 3D object detection baselines on the Seeing Through Fog dataset.
- A Flow Base Bi-path Network for Cross-scene Video Crowd Understanding in Aerial View (arXiv, 2020-09-29)
  In this paper, we strive to tackle the challenges of automatically understanding crowds from visual data collected by drones. To alleviate the background noise generated in cross-scene testing, a double-stream crowd counting model is proposed. To tackle crowd density estimation in extremely dark environments, we introduce synthetic data generated with the game Grand Theft Auto V (GTA V).
- Exploring the Impacts from Datasets to Monocular Depth Estimation (MDE) Models with MineNavi (arXiv, 2020-08-19)
  Current computer vision tasks based on deep learning require huge amounts of annotated data for model training or testing. In practice, manual labeling for dense estimation tasks is very difficult or even impossible, and dataset scenes are often restricted to a small range. We propose a synthetic dataset generation method to obtain expandable datasets without a burdensome manual labeling effort.
- Transferable Active Grasping and Real Embodied Dataset (arXiv, 2020-04-28)
  We show how to search for feasible grasping viewpoints using hand-mounted RGB-D cameras. A practical three-stage transferable active grasping pipeline is developed that adapts to unseen cluttered scenes. In our pipeline, we propose a novel mask-guided reward to overcome the sparse-reward issue in grasping and to ensure category-irrelevant behavior.