Related papers: LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry

URL: http://arxiv.org/abs/2512.19629v2
Date: Tue, 23 Dec 2025 05:37:16 GMT
Title: LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry
Authors: Jiaqi Peng, Wenzhe Cai, Yuqiang Yang, Tai Wang, Yuan Shen, Jiangmiao Pang,
Abstract summary: Trajectory planning in unstructured environments is a fundamental and challenging capability for mobile robots. We introduce LoGoPlanner, a localization-grounded, end-to-end navigation framework. We evaluate LoGoPlanner in both simulation and real-world settings, where its fully end-to-end design reduces cumulative error.
Score: 41.054069737969876
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Trajectory planning in unstructured environments is a fundamental and challenging capability for mobile robots. Traditional modular pipelines suffer from latency and cascading errors across perception, localization, mapping, and planning modules. Recent end-to-end learning methods map raw visual observations directly to control signals or trajectories, promising greater performance and efficiency in open-world settings. However, most prior end-to-end approaches still rely on separate localization modules that depend on accurate sensor extrinsic calibration for self-state estimation, thereby limiting generalization across embodiments and environments. We introduce LoGoPlanner, a localization-grounded, end-to-end navigation framework that addresses these limitations by: (1) finetuning a long-horizon visual-geometry backbone to ground predictions with absolute metric scale, thereby providing implicit state estimation for accurate localization; (2) reconstructing surrounding scene geometry from historical observations to supply dense, fine-grained environmental awareness for reliable obstacle avoidance; and (3) conditioning the policy on implicit geometry bootstrapped by the aforementioned auxiliary tasks, thereby reducing error propagation. We evaluate LoGoPlanner in both simulation and real-world settings, where its fully end-to-end design reduces cumulative error while metric-aware geometry memory enhances planning consistency and obstacle avoidance, leading to more than a 27.3% improvement over oracle-localization baselines and strong generalization across embodiments and environments. The code and models are publicly available at https://steinate.github.io/logoplanner.github.io.

Related papers

Fly0: Decoupling Semantic Grounding from Geometric Planning for Zero-Shot Aerial Navigation [14.466092698477858]
Current Visual-Language Navigation (VLN) methodologies face a trade-off between semantic understanding and control precision. We propose Fly0, a framework that decouples semantic reasoning from geometric planning. Fly0 reduces computational overhead and improves system stability.
arXiv Detail & Related papers (2026-02-02T09:06:50Z)
OpenNavMap: Structure-Free Topometric Mapping via Large-Scale Collaborative Localization [12.686154192361913]
OpenNavMap is a lightweight, structure-free topometric system leveraging 3D geometric foundation models for on-demand reconstruction. Our method unifies dynamic programming-based sequence matching, geometric verification, and confidence-calibrated optimization for robust, coarse-to-fine submap alignment. Evaluations on the Map-Free benchmark demonstrate superior accuracy over structure-from-motion and regression baselines, achieving an average translation error of 0.62 m.
arXiv Detail & Related papers (2026-01-18T07:24:46Z)
TALO: Pushing 3D Vision Foundation Models Towards Globally Consistent Online Reconstruction [57.46712611558817]
3D vision foundation models have shown strong generalization in reconstructing key 3D attributes from uncalibrated images through a single feed-forward pass. Recent strategies align consecutive predictions by solving global transformation, yet our analysis reveals their fundamental limitations in assumption validity, local alignment scope, and robustness under noisy geometry. We propose a higher-DOF and long-term alignment framework based on Thin Plate Spline, leveraging globally propagated control points to correct spatially varying inconsistencies.
arXiv Detail & Related papers (2025-12-02T02:22:20Z)
MetricNet: Recovering Metric Scale in Generative Navigation Policies [51.90872764552077]
MetricNet is an effective add-on for generative navigation that predicts the metric distance between waypoints. We show that executing MetricNet-scaled waypoints significantly improves both navigation and exploration performance. We also propose MetricNav, which integrates MetricNet into a navigation policy to guide the robot away from obstacles while still moving towards the goal.
arXiv Detail & Related papers (2025-09-17T13:37:13Z)
TANGO: Traversability-Aware Navigation with Local Metric Control for Topological Goals [10.69725316052444]
We present a novel RGB-only, object-level topometric navigation pipeline that enables zero-shot, long-horizon robot navigation. Our approach integrates global topological path planning with local metric trajectory control, allowing the robot to navigate towards object-level sub-goals while avoiding obstacles. We demonstrate the effectiveness of our method in both simulated environments and real-world tests, highlighting its robustness and deployability.
arXiv Detail & Related papers (2025-09-10T15:43:32Z)
Unified Linear Parametric Map Modeling and Perception-aware Trajectory Planning for Mobile Robotics [1.7495208770207367]
We introduce a lightweight linear parametric map by first mapping data to a high-dimensional space, followed by a sparse random projection for dimensionality reduction. For UAVs, our method grid and Euclidean Signed Distance Field (ESDF) maps. For UGVs, the model characterizes terrain and provides closed-form gradients, enabling online planning to circumvent large holes.
arXiv Detail & Related papers (2025-07-12T16:39:19Z)
NavTopo: Leveraging Topological Maps For Autonomous Navigation Of a Mobile Robot [1.0550841723235613]
We propose a full navigation pipeline based on a topological map and two-level path planning. The pipeline localizes in the graph by matching neural network descriptors and 2D projections of the input point clouds. We test our approach in a large indoor photo-realistic simulated environment and compare it to a metric map-based approach built on the popular metric mapping method RTAB-MAP.
arXiv Detail & Related papers (2024-10-15T10:54:49Z)
OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries. OPUS incorporates a suite of non-trivial strategies to enhance model performance. Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at near 2x FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
arXiv Detail & Related papers (2024-09-14T07:44:22Z)
Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive? [93.10694819127608]
We propose a unified evaluation pipeline for forecasting methods with real-world perception inputs. Our in-depth study uncovers a substantial performance gap when transitioning from curated to perception-based data.
arXiv Detail & Related papers (2023-06-15T17:03:14Z)
Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation [87.03299519917019]
We propose a dual-scale graph transformer (DUET) for joint long-term action planning and fine-grained cross-modal understanding. We build a topological map on-the-fly to enable efficient exploration in global action space. The proposed approach, DUET, significantly outperforms state-of-the-art methods on goal-oriented vision-and-language navigation benchmarks.
arXiv Detail & Related papers (2022-02-23T19:06:53Z)
Lightweight Object-level Topological Semantic Mapping and Long-term Global Localization based on Graph Matching [19.706907816202946]
We present a novel lightweight object-level mapping and localization method with high accuracy and robustness. We use object-level features with both semantic and geometric information to model landmarks in the environment. Based on the proposed map, robust localization is achieved by constructing a novel local semantic scene graph descriptor.
arXiv Detail & Related papers (2022-01-16T05:47:07Z)
Differentiable Spatial Planning using Transformers [87.90709874369192]
We propose Spatial Planning Transformers (SPT), which, given an obstacle map, learn to generate actions by planning over long-range spatial dependencies. In the setting where the ground truth map is not known to the agent, we leverage pre-trained SPTs in an end-to-end framework. SPTs outperform prior state-of-the-art differentiable planners across all the setups for both manipulation and navigation tasks.
arXiv Detail & Related papers (2021-12-02T06:48:16Z)
Sign-Agnostic CONet: Learning Implicit Surface Reconstructions by Sign-Agnostic Optimization of Convolutional Occupancy Networks [39.65056638604885]
We learn implicit surface reconstruction by sign-agnostic optimization of convolutional occupancy networks. We show this goal can be effectively achieved by a simple yet effective design.
arXiv Detail & Related papers (2021-05-08T03:35:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.