FuguReport

Walk With Me: Long-Horizon Social Navigation for Human-Centric Outdoor Assistance

Authors Lingfeng Zhang, Xiaoshuai Hao, Xizhou Bu, Yingbo Tang, Hongsheng Li, Jinghui Lu, Xiu-shen Wei, Jiayi Ma, Yu Liu, Jing Zhang, Hangjun Ye, Xiaojun Liang, Long Chen, Wenbo Ding
Affiliations Xiaomi EV / Tsinghua University / Wuhan University / PengCheng Laboratory / Southeast University / Fudan University / Chinese Academy of Sciences / Hefei University of Technology
Categories Method / Social Navigation / Long-horizon mapless navigation framework, Task / Human-Robot Interaction / Human-centric outdoor navigation assistance, Evaluation / Safety-Aware Planning / Safety-conscious reasoning in navigation
License CC BY 4.0

Abstract Overview

This paper presents Walk with Me, a hierarchical framework for long-horizon outdoor social navigation from high-level human instructions that does not rely on a pre-built HD map. The system uses priors from public map services (GPS context, candidate points of interest, and walking-route APIs) to ground abstract user intent into a concrete destination and a coarse waypoint sequence. During execution, a High-Level Vision-Language Model (VLM) jointly assesses whether the current situation is routine or safety-critical and decides whether the robot should proceed or stop and wait, while a Low-Level Vision-Language-Action (VLA) policy generates local, socially compliant trajectories for proceed steps. The method is instantiated on an Athena 2.0 Pro AGV wheeled robot and evaluated over 20 real-world trials in two outdoor assistance settings: last-mile delivery and blind guidance.
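The proceed / stop-and-wait execution loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `observe`, `high_level_vlm`, `low_level_vla`, the `Decision` enum and the rule that a proceed step advances to the next waypoint once the VLA reports arrival are all hypothetical stand-ins for the real models and sensing stack.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Tuple

Waypoint = Tuple[float, float]   # hypothetical: (x, y) metres in a local frame
Observation = Dict[str, object]  # hypothetical: whatever the sensors provide


class Decision(Enum):
    PROCEED = "proceed"
    STOP_AND_WAIT = "stop_and_wait"


@dataclass
class StepRecord:
    waypoint_index: int
    decision: Decision


def run_episode(
    waypoints: List[Waypoint],
    observe: Callable[[int], Observation],
    high_level_vlm: Callable[[Observation, Waypoint], Decision],
    low_level_vla: Callable[[Observation, Waypoint], bool],
    max_steps: int = 1000,
) -> Tuple[bool, List[StepRecord]]:
    """Closed-loop sketch: the VLM gates every step, the VLA drives proceed steps.

    `low_level_vla` is assumed to execute one local trajectory segment and
    return True once the current waypoint has been reached.
    """
    history: List[StepRecord] = []
    idx = 0
    for step in range(max_steps):
        if idx == len(waypoints):
            return True, history  # all waypoints reached
        obs = observe(step)
        decision = high_level_vlm(obs, waypoints[idx])
        history.append(StepRecord(idx, decision))
        if decision is Decision.STOP_AND_WAIT:
            continue  # hold position and re-assess on the next observation
        if low_level_vla(obs, waypoints[idx]):
            idx += 1  # waypoint reached, move on to the next one
    return idx == len(waypoints), history  # step budget exhausted


if __name__ == "__main__":
    # Toy stubs: a hazard is "seen" only at step 1, and every proceed step arrives.
    ok, hist = run_episode(
        [(0.0, 0.0), (10.0, 0.0)],
        observe=lambda step: {"hazard": step == 1},
        high_level_vlm=lambda obs, wp: (
            Decision.STOP_AND_WAIT if obs["hazard"] else Decision.PROCEED
        ),
        low_level_vla=lambda obs, wp: True,
    )
    print(ok, [r.decision.value for r in hist])
    # True ['proceed', 'stop_and_wait', 'proceed']
```

Keeping the safety gate outside the low-level policy means the VLA never has to learn when to yield; it only ever runs on steps the VLM has already cleared.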

Novelty

The main contribution is a map-free outdoor social navigation framework that integrates natural-language intent grounding via public map-service POIs, long-horizon waypoint construction, and an observation-aware routing mechanism that adaptively switches between low-level VLA control and explicit high-level VLM safety reasoning with stop-and-wait behavior. The paper also unifies destination grounding, coarse route planning, and socially aware execution under a single closed-loop hierarchy for human-centric outdoor assistance.
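One ingredient of the long-horizon waypoint construction is turning a walking-route polyline into a coarse waypoint sequence. The sketch below assumes the route API returns (latitude, longitude) vertices, projects them into a local metric frame with an equirectangular approximation (reasonable over short, kilometre-scale routes), and resamples at a fixed spacing. The function names, the 20 m spacing and the sample coordinates are hypothetical; the summary does not say how the paper spaces its waypoints.

```python
import math
from typing import List, Optional, Tuple

EARTH_RADIUS_M = 6_371_000.0
LatLon = Tuple[float, float]
XY = Tuple[float, float]


def to_local_xy(route: List[LatLon], origin: Optional[LatLon] = None) -> List[XY]:
    """Project (lat, lon) vertices to metres east/north of `origin` (equirectangular)."""
    lat0, lon0 = origin if origin is not None else route[0]
    scale_x = EARTH_RADIUS_M * math.cos(math.radians(lat0))
    return [
        (math.radians(lon - lon0) * scale_x, math.radians(lat - lat0) * EARTH_RADIUS_M)
        for lat, lon in route
    ]


def resample_waypoints(polyline: List[XY], spacing_m: float) -> List[XY]:
    """Place a waypoint every `spacing_m` metres along the polyline, keeping both ends."""
    if spacing_m <= 0:
        raise ValueError("spacing_m must be positive")
    if not polyline:
        return []
    out = [polyline[0]]
    carry = 0.0  # arc length travelled since the last emitted waypoint
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        d = spacing_m - carry  # offset of the next waypoint within this segment (> 0)
        while d <= seg:
            t = d / seg
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += spacing_m
        carry = seg - (d - spacing_m)  # distance from last waypoint to segment end
    if math.dist(out[-1], polyline[-1]) > 1e-6:
        out.append(polyline[-1])  # always finish exactly at the destination
    return out


if __name__ == "__main__":
    # Illustrative coordinates only, not taken from the paper.
    route = [(39.9042, 116.4074), (39.9060, 116.4074), (39.9060, 116.4100)]
    waypoints = resample_waypoints(to_local_xy(route), spacing_m=20.0)
    print(len(waypoints), waypoints[-1])
```

A fixed arc-length spacing keeps each low-level segment short enough for a local policy, while the route API remains responsible for the global topology.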

Results

In 20 real-world trials, the full system completed 12, an overall success rate of 60%. Last-mile delivery achieved 70% success over 10 trials, while blind guidance achieved 50% over 10 trials; the latter is harder because intent grounding is more open-ended and the system behaves more conservatively in socially sensitive scenes. Ablation studies on the two delivery scenarios show that the choice of both the Low-Level VLA (e.g., GNM at 20% vs. SocialNav at 60%) and the High-Level VLM (e.g., Qwen3-VL-8B at 30% vs. MiMo-Embodied at 60%) materially affects end-to-end success.
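The headline numbers are consistent with each other: 70% of 10 delivery trials is 7 successes, 50% of 10 blind-guidance trials is 5, and 12 of 20 is 60%. The snippet below reproduces that pooling. Purely as an illustration (the paper is not reported to give intervals), it also attaches a 95% Wilson score interval to each rate, which shows how wide the uncertainty is at 10 to 20 trials.

```python
import math
from typing import Dict, Tuple

# (successes, trials) implied by the reported rates: 70% of 10 and 50% of 10.
RESULTS: Dict[str, Tuple[int, int]] = {
    "last_mile_delivery": (7, 10),
    "blind_guidance": (5, 10),
}


def pooled_rate(results: Dict[str, Tuple[int, int]]) -> Tuple[int, int, float]:
    """Trial-weighted overall success: total successes over total trials."""
    s = sum(k for k, _ in results.values())
    n = sum(m for _, m in results.values())
    return s, n, s / n


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


if __name__ == "__main__":
    s, n, rate = pooled_rate(RESULTS)
    print(f"overall: {s}/{n} = {rate:.0%}")  # overall: 12/20 = 60%
    for name, (k, m) in RESULTS.items():
        lo, hi = wilson_interval(k, m)
        print(f"{name}: {k}/{m} = {k / m:.0%}, 95% Wilson CI [{lo:.2f}, {hi:.2f}]")
```

Because both scenarios have 10 trials, the pooled rate happens to equal the simple mean of the two per-scenario rates; with unequal trial counts only the trial-weighted form above would be correct.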

Key Points

  1. Walk with Me grounds abstract human instructions into concrete outdoor destinations using GPS context, POI candidates, and walking-route APIs from public map services, eliminating the need for a pre-built HD map.
  2. The framework employs an observation-aware routing mechanism where a High-Level VLM jointly assesses scene complexity and safety at each control step, dispatching routine segments to a Low-Level VLA for socially compliant trajectory generation and triggering stop-and-wait behavior when conditions are unsafe.
  3. Real-world experiments across 20 trials in delivery and blind-guidance scenarios demonstrate kilometer-scale outdoor execution with 60% overall success. Ablations on the delivery tasks show clear performance differences across VLM and VLA backbone choices, with the socially aware policy (SocialNav) and the navigation-oriented VLM (MiMo-Embodied) achieving the highest success rates among the backbones tested.
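The destination-grounding step in key point 1 can be illustrated with a deliberately simple rule. The summary does not specify how a POI candidate is selected, so as a hypothetical stand-in this sketch assumes the intended category has already been extracted from the instruction and picks the nearest matching POI within a search radius, using the haversine great-circle distance from the current GPS fix. The `POI` record layout, the 2 km radius and the sample coordinates are assumptions, not a real map-service schema.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class POI:
    """Hypothetical POI record; real map-service responses use their own schema."""
    name: str
    category: str
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float,
                radius_m: float = 6_371_000.0) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(min(1.0, math.sqrt(a)))  # clamp guards rounding


def ground_destination(gps: Tuple[float, float], target_category: str,
                       candidates: Iterable[POI],
                       max_radius_m: float = 2000.0) -> Optional[POI]:
    """Return the nearest candidate of the requested category within max_radius_m."""
    lat, lon = gps
    best, best_d = None, math.inf
    for poi in candidates:
        if poi.category != target_category:
            continue
        d = haversine_m(lat, lon, poi.lat, poi.lon)
        if d <= max_radius_m and d < best_d:
            best, best_d = poi, d
    return best  # None if nothing suitable is nearby


if __name__ == "__main__":
    here = (0.0, 0.0)  # illustrative coordinates, not from the paper
    pois = [
        POI("Pharmacy A", "pharmacy", 0.0, 0.010),  # about 1.1 km east
        POI("Pharmacy B", "pharmacy", 0.0, 0.005),  # about 0.56 km east
        POI("Cafe C", "cafe", 0.0, 0.001),
    ]
    print(ground_destination(here, "pharmacy", pois).name)  # Pharmacy B
```

The selected POI's coordinates would then be the destination passed to a walking-route API; returning `None` gives the caller a clear signal to ask the user for clarification rather than guess.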


This page was created using generative AI such as GPT-5, Claude Opus 4, Gemini 3, Gemini 3.1 Flash Image, and their higher-end successor versions. No guarantee can be made regarding its contents.