HabiCrowd: A High Performance Simulator for Crowd-Aware Visual Navigation
- URL: http://arxiv.org/abs/2306.11377v2
- Date: Mon, 29 Jul 2024 13:46:35 GMT
- Title: HabiCrowd: A High Performance Simulator for Crowd-Aware Visual Navigation
- Authors: An Dinh Vuong, Toan Tien Nguyen, Minh Nhat VU, Baoru Huang, Dzung Nguyen, Huynh Thi Thanh Binh, Thieu Vo, Anh Nguyen
- Abstract summary: We introduce HabiCrowd, the first standard benchmark for crowd-aware visual navigation.
Our proposed human dynamics model achieves state-of-the-art performance in collision avoidance.
We leverage HabiCrowd to conduct several comprehensive studies on crowd-aware visual navigation tasks and human-robot interactions.
- Score: 8.484737966013059
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Visual navigation, a foundational aspect of Embodied AI (E-AI), has been studied extensively in the past few years. While many 3D simulators have been introduced to support visual navigation tasks, few works have incorporated human dynamics, leaving a gap between simulation and real-world applications. Furthermore, the current 3D simulators that do incorporate human dynamics have several limitations, particularly in computational efficiency, which is a core promise of E-AI simulators. To overcome these shortcomings, we introduce HabiCrowd, the first standard benchmark for crowd-aware visual navigation that integrates a crowd dynamics model with diverse human settings into photorealistic environments. Empirical evaluations demonstrate that our proposed human dynamics model achieves state-of-the-art performance in collision avoidance while exhibiting superior computational efficiency compared to its counterparts. We leverage HabiCrowd to conduct several comprehensive studies on crowd-aware visual navigation tasks and human-robot interactions. The source code and data can be found at https://habicrowd.github.io/.
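For context on what a "crowd dynamics model" involves, the sketch below implements a classic social-force update in the style of Helbing and Molnár. It is only an illustrative baseline, not HabiCrowd's actual human dynamics model; the function name and all parameter values are assumptions chosen for readability.

```python
"""Illustrative social-force crowd step.

NOT HabiCrowd's human dynamics model; a minimal, self-contained sketch of the
kind of collision-avoiding crowd update such simulators build on.
"""
import numpy as np

def social_force_step(pos, vel, goals, dt=0.1, desired_speed=1.3,
                      relax_time=0.5, rep_strength=2.0, rep_range=0.3):
    """Advance N pedestrians by one time step; pos, vel, goals are (N, 2) arrays."""
    # Driving force: relax the current velocity toward the goal-directed velocity.
    to_goal = goals - pos
    dist = np.linalg.norm(to_goal, axis=1, keepdims=True) + 1e-8
    desired_vel = desired_speed * to_goal / dist
    force = (desired_vel - vel) / relax_time

    # Pairwise repulsion: exponentially decaying push between nearby pedestrians.
    diff = pos[:, None, :] - pos[None, :, :]          # (N, N, 2) offsets i - j
    d = np.linalg.norm(diff, axis=-1) + 1e-8          # (N, N) distances
    np.fill_diagonal(d, np.inf)                       # ignore self-interaction
    push = rep_strength * np.exp(-d / rep_range)      # repulsion magnitude
    force += np.sum(push[..., None] * diff / d[..., None], axis=1)

    # Semi-implicit Euler integration of the resulting accelerations.
    vel = vel + dt * force
    pos = pos + dt * vel
    return pos, vel
```

Iterating `pos, vel = social_force_step(pos, vel, goals)` yields collision-avoiding pedestrian trajectories; crowd-aware simulators differ mainly in how such human motion is made reactive to the robot and scaled efficiently to many agents.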
Related papers
- Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation [62.5805866419814]
Vid2Sim is a novel framework that bridges the sim2real gap through a scalable and cost-efficient real2sim pipeline for neural 3D scene reconstruction and simulation.
Experiments demonstrate that Vid2Sim significantly improves urban navigation performance in digital twins and the real world, by 31.2% and 68.3% in success rate, respectively.
arXiv Detail & Related papers (2025-01-12T03:01:15Z)
- Extrapolated Urban View Synthesis Benchmark [53.657271730352214]
Photorealistic simulators are essential for the training and evaluation of vision-centric autonomous vehicles (AVs).
At their core is Novel View Synthesis (NVS), a crucial capability that generates diverse unseen viewpoints to accommodate the broad and continuous pose distribution of AVs.
Recent advances in radiance fields, such as 3D Gaussian Splatting, achieve photorealistic rendering at real-time speeds and have been widely used in modeling large-scale driving scenes.
We have released our data to help advance self-driving and urban robotics simulation technology.
arXiv Detail & Related papers (2024-12-06T18:41:39Z)
- Robot Learning with Super-Linear Scaling [20.730206708381704]
CASHER is a pipeline for scaling up data collection and learning in simulation where the performance scales superlinearly with human effort.
We show that CASHER enables fine-tuning of pre-trained policies to a target scenario using a video scan without any additional human effort.
arXiv Detail & Related papers (2024-12-02T18:12:02Z)
- AdaVLN: Towards Visual Language Navigation in Continuous Indoor Environments with Moving Humans [2.940962519388297]
We propose an extension to the Vision-and-Language Navigation task, termed Adaptive Visual Language Navigation (AdaVLN).
AdaVLN requires robots to navigate complex 3D indoor environments populated with dynamically moving human obstacles.
We evaluate several baseline models on this task, analyze the unique challenges introduced by AdaVLN, and demonstrate its potential to bridge the sim-to-real gap in VLN research.
arXiv Detail & Related papers (2024-11-27T17:36:08Z)
- Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions [69.9980759344628]
Vision-and-Language Navigation (VLN) aims to develop embodied agents that navigate based on human instructions.
We introduce Human-Aware Vision-and-Language Navigation (HA-VLN), extending traditional VLN by incorporating dynamic human activities.
We present the Expert-Supervised Cross-Modal (VLN-CM) and Non-Expert-Supervised Decision Transformer (VLN-DT) agents, utilizing cross-modal fusion and diverse training strategies.
arXiv Detail & Related papers (2024-06-27T15:01:42Z)
- Structured Graph Network for Constrained Robot Crowd Navigation with Low Fidelity Simulation [10.201765067255147]
We investigate the feasibility of deploying reinforcement learning (RL) policies for constrained crowd navigation using a low-fidelity simulator.
We introduce a representation of the dynamic environment, separating human and obstacle representations.
This representation enables RL policies trained in a low-fidelity simulator to be deployed in the real world with a reduced sim2real gap.
arXiv Detail & Related papers (2024-05-27T04:53:09Z)
- Open-VICO: An Open-Source Gazebo Toolkit for Multi-Camera-based Skeleton Tracking in Human-Robot Collaboration [0.0]
This work presents Open-VICO, an open-source toolkit to integrate virtual human models in Gazebo.
In particular, Open-VICO makes it possible to combine realistic human kinematic models, multi-camera vision setups, and human-tracking techniques in the same simulation environment.
arXiv Detail & Related papers (2022-03-28T13:21:32Z)
- iGibson, a Simulation Environment for Interactive Tasks in Large Realistic Scenes [54.04456391489063]
iGibson is a novel simulation environment to develop robotic solutions for interactive tasks in large-scale realistic scenes.
Our environment contains fifteen fully interactive home-sized scenes populated with rigid and articulated objects.
iGibson's features enable the generalization of navigation agents, and the human-iGibson interface together with integrated motion planners facilitates efficient imitation learning of simple human-demonstrated behaviors.
arXiv Detail & Related papers (2020-12-05T02:14:17Z)
- ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation [75.0278287071591]
ThreeDWorld (TDW) is a platform for interactive multi-modal physical simulation.
TDW enables simulation of high-fidelity sensory data and physical interactions between mobile agents and objects in rich 3D environments.
We present initial experiments enabled by TDW in emerging research directions in computer vision, machine learning, and cognitive science.
arXiv Detail & Related papers (2020-07-09T17:33:27Z)
- Visual Navigation Among Humans with Optimal Control as a Supervisor [72.5188978268463]
We propose an approach that combines learning-based perception with model-based optimal control to navigate among humans.
Our approach is enabled by our novel data-generation tool, HumANav.
We demonstrate that the learned navigation policies can anticipate and react to humans without explicitly predicting future human motion.
arXiv Detail & Related papers (2020-03-20T16:13:47Z)
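The navigation benchmarks listed above typically report success rate alongside Success weighted by Path Length (SPL) as defined by Anderson et al. (2018). The sketch below implements that standard SPL definition for reference; whether HabiCrowd reports exactly this metric is not stated in the summaries here.

```python
def spl(successes, shortest_paths, taken_paths):
    """Success weighted by Path Length, averaged over a set of episodes.

    successes:       1 if the episode succeeded, else 0
    shortest_paths:  geodesic distance from start to goal
    taken_paths:     length of the path the agent actually travelled
    """
    total = 0.0
    for s, l, p in zip(successes, shortest_paths, taken_paths):
        total += s * l / max(p, l, 1e-8)  # penalize paths longer than the geodesic
    return total / len(successes)
```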