IndustryNav: Exploring Spatial Reasoning of Embodied Agents in Dynamic Industrial Navigation
- URL: http://arxiv.org/abs/2511.17384v1
- Date: Fri, 21 Nov 2025 16:48:49 GMT
- Title: IndustryNav: Exploring Spatial Reasoning of Embodied Agents in Dynamic Industrial Navigation
- Authors: Yifan Li, Lichi Li, Anh Dao, Xinyu Zhou, Yicheng Qiao, Zheda Mai, Daeun Lee, Zichen Chen, Zhen Tan, Mohit Bansal, Yu Kong
- Abstract summary: IndustryNav is the first dynamic industrial navigation benchmark for active spatial reasoning. A study of nine state-of-the-art Visual Large Language Models reveals that closed-source models maintain a consistent advantage.
- Score: 56.43007596544299
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: While Visual Large Language Models (VLLMs) show great promise as embodied agents, they continue to face substantial challenges in spatial reasoning. Existing embodied benchmarks largely focus on passive, static household environments and evaluate only isolated capabilities, failing to capture holistic performance in dynamic, real-world complexity. To fill this gap, we present IndustryNav, the first dynamic industrial navigation benchmark for active spatial reasoning. IndustryNav leverages 12 manually created, high-fidelity Unity warehouse scenarios featuring dynamic objects and human movement. Our evaluation employs a PointGoal navigation pipeline that effectively combines egocentric vision with global odometry to assess holistic local-global planning. Crucially, we introduce the "collision rate" and "warning rate" metrics to measure safety-oriented behavior and distance estimation. A comprehensive study of nine state-of-the-art VLLMs (including models such as GPT-5-mini, Claude-4.5, and Gemini-2.5) reveals that closed-source models maintain a consistent advantage; however, all agents exhibit notable deficiencies in robust path planning, collision avoidance, and active exploration. This highlights a critical need for embodied research to move beyond passive perception and toward tasks that demand stable planning, active exploration, and safe behavior in dynamic, real-world environments.
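The abstract names "collision rate" and "warning rate" as its safety metrics but does not define them precisely. A minimal sketch of how such per-episode rates might be computed from the simulator's nearest-obstacle distances is shown below; the thresholds and the `safety_rates` helper are illustrative assumptions, not taken from the paper:

```python
from typing import Sequence

def safety_rates(min_obstacle_dists: Sequence[float],
                 collision_thresh: float = 0.1,
                 warning_thresh: float = 0.5) -> tuple[float, float]:
    """Return (collision_rate, warning_rate) over one episode.

    min_obstacle_dists: per-step distance (metres) from the agent to
    the nearest obstacle, as a simulator might report it.
    A step counts as a collision below collision_thresh, and as a
    warning when inside the warning band but not colliding.
    """
    n = len(min_obstacle_dists)
    if n == 0:
        return 0.0, 0.0
    collisions = sum(d <= collision_thresh for d in min_obstacle_dists)
    warnings = sum(collision_thresh < d <= warning_thresh
                   for d in min_obstacle_dists)
    return collisions / n, warnings / n
```

For example, a four-step episode with distances `[0.05, 0.3, 1.0, 0.4]` yields one collision step and two warning steps under the default thresholds.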
Related papers
- WaterVideoQA: ASV-Centric Perception and Rule-Compliant Reasoning via Multi-Modal Agents [23.828845891763617]
We present WaterVideoQA, the first large-scale, comprehensive Video Question Answering benchmark specifically engineered for all-waterway environments. We also introduce NaviMind, a pioneering multi-agent neuro-symbolic system designed for open-ended maritime reasoning.
arXiv Detail & Related papers (2026-02-26T12:12:40Z)
- EscherVerse: An Open World Benchmark and Dataset for Teleo-Spatial Intelligence with Physical-Dynamic and Intent-Driven Understanding [56.89359230139883]
We introduce Teleo-Spatial Intelligence (TSI), a new paradigm that unifies two critical pillars: Physical-Dynamic Reasoning and Intent-Driven Reasoning. We present EscherVerse, consisting of a large-scale, open-world benchmark (Escher-Bench), a dataset (Escher-35k), and models (the Escher series). It is the first benchmark to systematically assess Intent-Driven Reasoning, challenging models to connect physical events to their underlying human purposes.
arXiv Detail & Related papers (2026-01-04T14:42:39Z)
- Hybrid Motion Planning with Deep Reinforcement Learning for Mobile Robot Navigation [0.0]
We propose Hybrid Motion Planning with Deep Reinforcement Learning (HMP-DRL), in which a graph-based global planner generates a path that is integrated into a local DRL policy via a sequence of checkpoints encoded in both the state space and reward function. To ensure social compliance, the local planner employs an entity-aware reward structure that dynamically adjusts safety margins and penalties based on the semantic type of surrounding agents.
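As a rough illustration of the kind of entity-aware reward term HMP-DRL describes, the sketch below scales a proximity penalty by a per-entity-type safety margin. The margin values and the `proximity_penalty` function are hypothetical assumptions for illustration, not the paper's actual reward:

```python
# Hypothetical safety margins (metres) keyed by the semantic type of
# the nearby entity; humans get the widest berth.
SAFETY_MARGIN = {"human": 1.0, "robot": 0.5, "static": 0.2}

def proximity_penalty(entity_type: str, distance: float,
                      weight: float = 1.0) -> float:
    """Negative reward that grows linearly as the agent intrudes
    into the entity-specific safety margin; zero outside it."""
    margin = SAFETY_MARGIN.get(entity_type, 0.2)
    if distance >= margin:
        return 0.0
    return -weight * (margin - distance) / margin
```

The per-type margins are what make the reward "entity-aware": the same distance that is safe next to a shelf can be penalized next to a person.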
arXiv Detail & Related papers (2025-12-31T05:58:57Z)
- Digital Twin Supervised Reinforcement Learning Framework for Autonomous Underwater Navigation [0.0]
This article investigates these issues through the case of the BlueROV2, an open platform widely used for scientific experimentation. We propose a deep reinforcement learning approach based on the Proximal Policy Optimization (PPO) algorithm. Results show that the PPO policy consistently outperforms DWA in highly cluttered environments.
arXiv Detail & Related papers (2025-12-11T18:52:42Z)
- Goal Discovery with Causal Capacity for Efficient Reinforcement Learning [85.28685202281918]
Causal inference is crucial for humans to explore the world. We propose a novel Goal Discovery with Causal Capacity framework for efficient environment exploration.
arXiv Detail & Related papers (2025-08-13T08:54:56Z)
- UAV-ON: A Benchmark for Open-World Object Goal Navigation with Aerial Agents [17.86691411018085]
UAV-ON is a benchmark for large-scale Object Goal Navigation (ObjectNav) by aerial agents in open-world environments. It comprises 14 high-fidelity Unreal Engine environments with diverse semantic regions and complex spatial layouts. It defines 1270 annotated target objects, each characterized by an instance-level instruction that encodes category, physical footprint, and visual descriptors.
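A purely illustrative schema for the instance-level instructions UAV-ON describes (category, physical footprint, visual descriptors); the class and field names here are hypothetical, not the benchmark's actual format:

```python
from dataclasses import dataclass

@dataclass
class TargetInstruction:
    """Sketch of a UAV-ON-style instance-level target instruction."""
    category: str                            # e.g. "forklift"
    footprint_m: tuple[float, float, float]  # approximate W x L x H, metres
    visual_descriptors: list[str]            # e.g. ["red cab", "raised forks"]

    def prompt(self) -> str:
        """Render the instruction as a natural-language goal string."""
        w, l, h = self.footprint_m
        return (f"Find the {self.category} "
                f"(~{w}x{l}x{h} m, {', '.join(self.visual_descriptors)}).")
```

Keeping the footprint explicit lets an evaluator check distance and scale estimates against the annotation, not just category matches.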
arXiv Detail & Related papers (2025-08-01T03:23:06Z)
- NOVA: Navigation via Object-Centric Visual Autonomy for High-Speed Target Tracking in Unstructured GPS-Denied Environments [56.35569661650558]
We introduce NOVA, a fully onboard, object-centric framework that enables robust target tracking and collision-aware navigation. Rather than constructing a global map, NOVA formulates perception, estimation, and control entirely in the target's reference frame. We validate NOVA across challenging real-world scenarios, including urban mazes, forest trails, and repeated transitions through buildings with intermittent GPS loss.
arXiv Detail & Related papers (2025-06-23T14:28:30Z)
- SemNav: A Model-Based Planner for Zero-Shot Object Goal Navigation Using Vision-Foundation Models [10.671262416557704]
Vision Foundation Models (VFMs) offer powerful capabilities for visual understanding and reasoning. We present a zero-shot object goal navigation framework that integrates the perceptual strength of VFMs with a model-based planner. We evaluate our approach on the HM3D dataset using the Habitat simulator and demonstrate that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-06-04T03:04:54Z)
- HA-VLN 2.0: An Open Benchmark and Leaderboard for Human-Aware Navigation in Discrete and Continuous Environments with Dynamic Multi-Human Interactions [64.69468932145234]
We present HA-VLN 2.0, a unified benchmark introducing explicit social-awareness constraints. Results show that explicit social modeling improves navigation robustness and reduces collisions.
arXiv Detail & Related papers (2025-03-18T13:05:55Z)
- Sonar-based Deep Learning in Underwater Robotics: Overview, Robustness and Challenges [0.46873264197900916]
The predominant use of sonar in underwater environments, characterized by limited training data and inherent noise, poses challenges to model robustness. This paper studies sonar-based perception task models, such as classification, object detection, segmentation, and SLAM. It systematizes sonar-based state-of-the-art datasets, simulators, and robustness methods such as neural network verification, out-of-distribution detection, and adversarial attacks.
arXiv Detail & Related papers (2024-12-16T15:03:08Z)
- Occupancy Anticipation for Efficient Exploration and Navigation [97.17517060585875]
We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions.
By exploiting context in both the egocentric views and top-down maps our model successfully anticipates a broader map of the environment.
Our approach is the winning entry in the 2020 Habitat PointNav Challenge.
arXiv Detail & Related papers (2020-08-21T03:16:51Z)
- Learning to Move with Affordance Maps [57.198806691838364]
The ability to autonomously explore and navigate a physical space is a fundamental requirement for virtually any mobile autonomous agent.
Traditional SLAM-based approaches for exploration and navigation largely focus on leveraging scene geometry.
We show that learned affordance maps can be used to augment traditional approaches for both exploration and navigation, providing significant improvements in performance.
arXiv Detail & Related papers (2020-01-08T04:05:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.