HabiCrowd: A High Performance Simulator for Crowd-Aware Visual Navigation
- URL: http://arxiv.org/abs/2306.11377v2
- Date: Mon, 29 Jul 2024 13:46:35 GMT
- Title: HabiCrowd: A High Performance Simulator for Crowd-Aware Visual Navigation
- Authors: An Dinh Vuong, Toan Tien Nguyen, Minh Nhat VU, Baoru Huang, Dzung Nguyen, Huynh Thi Thanh Binh, Thieu Vo, Anh Nguyen
- Abstract summary: We introduce HabiCrowd, the first standard benchmark for crowd-aware visual navigation.
Our proposed human dynamics model achieves state-of-the-art performance in collision avoidance.
We leverage HabiCrowd to conduct several comprehensive studies on crowd-aware visual navigation tasks and human-robot interactions.
- Score: 8.484737966013059
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Visual navigation, a foundational aspect of Embodied AI (E-AI), has been studied extensively in the past few years. While many 3D simulators have been introduced to support visual navigation tasks, few works have incorporated human dynamics, leaving a gap between simulation and real-world applications. Furthermore, the current 3D simulators that do incorporate human dynamics have several limitations, particularly in computational efficiency, which is a core promise of E-AI simulators. To overcome these shortcomings, we introduce HabiCrowd, the first standard benchmark for crowd-aware visual navigation that integrates a crowd dynamics model with diverse human settings into photorealistic environments. Empirical evaluations demonstrate that our proposed human dynamics model achieves state-of-the-art performance in collision avoidance while exhibiting superior computational efficiency compared to its counterparts. We leverage HabiCrowd to conduct several comprehensive studies on crowd-aware visual navigation tasks and human-robot interactions. The source code and data can be found at https://habicrowd.github.io/.
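For context on what a "crowd dynamics model" involves, the sketch below implements a classic social-force update in the style of Helbing and Molnár. It is only an illustrative baseline, not HabiCrowd's actual human dynamics model; the function name and all parameter values are assumptions chosen for readability.

```python
"""Illustrative social-force crowd step.

NOT HabiCrowd's human dynamics model; a minimal, self-contained sketch of the
kind of collision-avoiding crowd update such simulators build on.
"""
import numpy as np

def social_force_step(pos, vel, goals, dt=0.1, desired_speed=1.3,
                      relax_time=0.5, rep_strength=2.0, rep_range=0.3):
    """Advance N pedestrians by one time step; pos, vel, goals are (N, 2) arrays."""
    # Driving force: relax the current velocity toward the goal-directed velocity.
    to_goal = goals - pos
    dist = np.linalg.norm(to_goal, axis=1, keepdims=True) + 1e-8
    desired_vel = desired_speed * to_goal / dist
    force = (desired_vel - vel) / relax_time

    # Pairwise repulsion: exponentially decaying push between nearby pedestrians.
    diff = pos[:, None, :] - pos[None, :, :]          # (N, N, 2) offsets i - j
    d = np.linalg.norm(diff, axis=-1) + 1e-8          # (N, N) distances
    np.fill_diagonal(d, np.inf)                       # ignore self-interaction
    push = rep_strength * np.exp(-d / rep_range)      # repulsion magnitude
    force += np.sum(push[..., None] * diff / d[..., None], axis=1)

    # Semi-implicit Euler integration of the resulting accelerations.
    vel = vel + dt * force
    pos = pos + dt * vel
    return pos, vel
```

Iterating `pos, vel = social_force_step(pos, vel, goals)` yields collision-avoiding pedestrian trajectories; crowd-aware simulators differ mainly in how such human motion is made reactive to the robot and scaled efficiently to many agents.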
Related papers
- Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation [62.5805866419814]
Vid2Sim is a novel framework that bridges the sim2real gap through a scalable and cost-efficient real2sim pipeline for neural 3D scene reconstruction and simulation.
Experiments demonstrate that Vid2Sim significantly improves urban navigation performance in digital twins and the real world, by 31.2% and 68.3% in success rate, respectively.
arXiv Detail & Related papers (2025-01-12T03:01:15Z)
- Extrapolated Urban View Synthesis Benchmark [53.657271730352214]
Photorealistic simulators are essential for the training and evaluation of vision-centric autonomous vehicles (AVs).
At their core is Novel View Synthesis (NVS), a crucial capability that generates diverse unseen viewpoints to accommodate the broad and continuous pose distribution of AVs.
Recent advances in radiance fields, such as 3D Gaussian Splatting, achieve photorealistic rendering at real-time speeds and have been widely used in modeling large-scale driving scenes.
We have released our data to help advance self-driving and urban robotics simulation technology.
arXiv Detail & Related papers (2024-12-06T18:41:39Z)
- Robot Learning with Super-Linear Scaling [20.730206708381704]
CASHER is a pipeline for scaling up data collection and learning in simulation where the performance scales superlinearly with human effort.
We show that CASHER enables fine-tuning of pre-trained policies to a target scenario using a video scan without any additional human effort.
arXiv Detail & Related papers (2024-12-02T18:12:02Z)
- AdaVLN: Towards Visual Language Navigation in Continuous Indoor Environments with Moving Humans [2.940962519388297]
We propose an extension to the Vision-and-Language Navigation task, termed Adaptive Visual Language Navigation (AdaVLN).
AdaVLN requires robots to navigate complex 3D indoor environments populated with dynamically moving human obstacles.
We evaluate several baseline models on this task, analyze the unique challenges introduced by AdaVLN, and demonstrate its potential to bridge the sim-to-real gap in VLN research.
arXiv Detail & Related papers (2024-11-27T17:36:08Z)
- Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions [69.9980759344628]
Vision-and-Language Navigation (VLN) aims to develop embodied agents that navigate based on human instructions.
We introduce Human-Aware Vision-and-Language Navigation (HA-VLN), extending traditional VLN by incorporating dynamic human activities.
We present the Expert-Supervised Cross-Modal (VLN-CM) and Non-Expert-Supervised Decision Transformer (VLN-DT) agents, utilizing cross-modal fusion and diverse training strategies.
arXiv Detail & Related papers (2024-06-27T15:01:42Z)
- Structured Graph Network for Constrained Robot Crowd Navigation with Low Fidelity Simulation [10.201765067255147]
We investigate the feasibility of deploying reinforcement learning (RL) policies for constrained crowd navigation using a low-fidelity simulator.
We introduce a representation of the dynamic environment, separating human and obstacle representations.
This representation enables RL policies trained in a low-fidelity simulator to be deployed in the real world with a reduced sim2real gap.
arXiv Detail & Related papers (2024-05-27T04:53:09Z)
- Open-VICO: An Open-Source Gazebo Toolkit for Multi-Camera-based Skeleton Tracking in Human-Robot Collaboration [0.0]
This work presents Open-VICO, an open-source toolkit to integrate virtual human models in Gazebo.
In particular, Open-VICO makes it possible to combine realistic human kinematic models, multi-camera vision setups, and human-tracking techniques in the same simulation environment.
arXiv Detail & Related papers (2022-03-28T13:21:32Z)
- iGibson, a Simulation Environment for Interactive Tasks in Large Realistic Scenes [54.04456391489063]
iGibson is a novel simulation environment to develop robotic solutions for interactive tasks in large-scale realistic scenes.
Our environment contains fifteen fully interactive home-sized scenes populated with rigid and articulated objects.
iGibson's features enable the generalization of navigation agents, and the human-iGibson interface together with integrated motion planners facilitates efficient imitation learning of simple human-demonstrated behaviors.
arXiv Detail & Related papers (2020-12-05T02:14:17Z)
- ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation [75.0278287071591]
ThreeDWorld (TDW) is a platform for interactive multi-modal physical simulation.
TDW enables simulation of high-fidelity sensory data and physical interactions between mobile agents and objects in rich 3D environments.
We present initial experiments enabled by TDW in emerging research directions in computer vision, machine learning, and cognitive science.
arXiv Detail & Related papers (2020-07-09T17:33:27Z)
- Visual Navigation Among Humans with Optimal Control as a Supervisor [72.5188978268463]
We propose an approach that combines learning-based perception with model-based optimal control to navigate among humans.
Our approach is enabled by our novel data-generation tool, HumANav.
We demonstrate that the learned navigation policies can anticipate and react to humans without explicitly predicting future human motion.
arXiv Detail & Related papers (2020-03-20T16:13:47Z)
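The navigation benchmarks listed above typically report success rate alongside Success weighted by Path Length (SPL) as defined by Anderson et al. (2018). The sketch below implements that standard SPL definition for reference; whether HabiCrowd reports exactly this metric is not stated in the summaries here.

```python
def spl(successes, shortest_paths, taken_paths):
    """Success weighted by Path Length, averaged over a set of episodes.

    successes:       1 if the episode succeeded, else 0
    shortest_paths:  geodesic distance from start to goal
    taken_paths:     length of the path the agent actually travelled
    """
    total = 0.0
    for s, l, p in zip(successes, shortest_paths, taken_paths):
        total += s * l / max(p, l, 1e-8)  # penalize paths longer than the geodesic
    return total / len(successes)
```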