MMP-A*: Multimodal Perception Enhanced Incremental Heuristic Search on Path Planning
- URL: http://arxiv.org/abs/2601.01910v1
- Date: Mon, 05 Jan 2026 08:55:27 GMT
- Title: MMP-A*: Multimodal Perception Enhanced Incremental Heuristic Search on Path Planning
- Authors: Minh Hieu Ha, Khanh Ly Ta, Hung Phan, Tung Doan, Tung Dao, Dao Tran, Huynh Thi Thanh Binh,
- Abstract summary: MMP-A* is a multimodal framework that integrates the spatial grounding capabilities of vision-language models with a novel adaptive decay mechanism.<n>We show that MMP-A* achieves near-optimal trajectories with significantly reduced operational costs.
- Score: 8.522882937983972
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous path planning requires a synergy between global reasoning and geometric precision, especially in complex or cluttered environments. While classical A* is valued for its optimality, it incurs prohibitive computational and memory costs in large-scale scenarios. Recent attempts to mitigate these limitations by using Large Language Models for waypoint guidance remain insufficient, as they rely only on text-based reasoning without spatial grounding. As a result, such models often produce incorrect waypoints in topologically complex environments with dead ends, and lack the perceptual capacity to interpret ambiguous physical boundaries. These inconsistencies lead to costly corrective expansions and undermine the intended computational efficiency. We introduce MMP-A*, a multimodal framework that integrates the spatial grounding capabilities of vision-language models with a novel adaptive decay mechanism. By anchoring high-level reasoning in physical geometry, the framework produces coherent waypoint guidance that addresses the limitations of text-only planners. The adaptive decay mechanism dynamically regulates the influence of uncertain waypoints within the heuristic, ensuring geometric validity while substantially reducing memory overhead. To evaluate robustness, we test the framework in challenging environments characterized by severe clutter and topological complexity. Experimental results show that MMP-A* achieves near-optimal trajectories with significantly reduced operational costs, demonstrating its potential as a perception-grounded and computationally efficient paradigm for autonomous navigation.
Related papers
- RAGNav: A Retrieval-Augmented Topological Reasoning Framework for Multi-Goal Visual-Language Navigation [1.7508558850131373]
Vision-Language Navigation (VLN) is evolving from single-point pathfinding toward the more challenging Multi-Goal VLN.<n>RAGNav is a framework that bridges the gap between semantic reasoning and physical structure.
arXiv Detail & Related papers (2026-03-04T05:31:33Z) - Learning Object-Centric Spatial Reasoning for Sequential Manipulation in Cluttered Environments [0.0]
We present Unveiler, a framework that separates high-level spatial reasoning from low-level action execution.<n>We demonstrate this decoupled architecture is more computationally efficient in terms of parameter count and inference time.<n>Our results, achieving up to 97.6% success in partially occluded and 90.0% in fully occluded scenarios in simulation, make a case for the power of specialized, object-centric reasoning in complex manipulation tasks.
arXiv Detail & Related papers (2026-03-03T01:45:53Z) - PhyG-MoE: A Physics-Guided Mixture-of-Experts Framework for Energy-Efficient GNSS Interference Recognition [49.955269674859004]
This paper introduces PhyG-MoE (Physics-Guided Mixture-of-Experts), a framework designed to align model capacity with signal complexity.<n>Unlike static architectures, the proposed system employs a spectrum-based gating mechanism that routes signals based on their spectral feature entanglement.<n>A high-capacity TransNeXt expert is activated on-demand to disentangle complex features in saturated scenarios, while lightweight experts handle fundamental signals to minimize latency.
arXiv Detail & Related papers (2026-01-19T07:57:52Z) - PILOT: Planning via Internalized Latent Optimization Trajectories for Large Language Models [51.43746425777865]
Large Language Models (LLMs) often lack the capacity to formulate global strategies, leading to error propagation in long-horizon tasks.<n>We propose PILOT, a framework designed to internalize the strategic oversight of large models into intrinsic Latent Guidance.
arXiv Detail & Related papers (2026-01-07T12:38:56Z) - Discrete-Guided Diffusion for Scalable and Safe Multi-Robot Motion Planning [56.240199425429445]
Multi-Robot Motion Planning (MPMP) involves generating trajectories for multiple robots operating in a shared continuous workspace.<n>While discrete multi-agent finding (MAPF) methods are broadly adopted due to their scalability, their coarse discretization trajectory quality.<n>This paper tackles limitations of two approaches by introducing discrete MAPF solvers with constrained generative diffusion models.
arXiv Detail & Related papers (2025-08-27T17:59:36Z) - Feature-Space Planes Searcher: A Universal Domain Adaptation Framework for Interpretability and Computational Efficiency [7.889121135601528]
Current unsupervised domain adaptation methods rely on fine-tuning feature extractors.<n>We propose Feature-space Planes Searcher (FPS) as a novel domain adaptation framework.<n>We show that FPS achieves competitive or superior performance to state-of-the-art methods.
arXiv Detail & Related papers (2025-08-26T05:39:21Z) - A Pseudo Global Fusion Paradigm-Based Cross-View Network for LiDAR-Based Place Recognition [12.93382945887946]
LiDAR-based Place Recognition (LPR) remains a critical task in Embodied Artificial Intelligence (AI) and Autonomous Driving.<n>Existing approaches reduce place recognition to a Euclidean distance-based metric learning task, neglecting the feature space's intrinsic structures and intra-class variances.<n>We propose a novel cross-view network based on an innovative fusion paradigm to address these challenges.
arXiv Detail & Related papers (2025-08-12T13:12:48Z) - Topology-Assisted Spatio-Temporal Pattern Disentangling for Scalable MARL in Large-scale Autonomous Traffic Control [14.929720580977152]
This paper introduces a novel MARL framework that integrates Dynamic Graph Neural Networks (DGNNs) and Topological Data Analysis (TDA)<n>Inspired by the Mixture of Experts (MoE) architecture in Large Language Models (LLMs), a topology-assisted spatial pattern disentangling (TSD)-enhanced MoE is proposed.<n> Extensive experiments conducted on real-world traffic scenarios, together with comprehensive theoretical analysis, validate the superior performance of the proposed framework.
arXiv Detail & Related papers (2025-06-14T11:18:12Z) - Efficient Transformed Gaussian Process State-Space Models for Non-Stationary High-Dimensional Dynamical Systems [49.819436680336786]
We propose an efficient transformed Gaussian process state-space model (ETGPSSM) for scalable and flexible modeling of high-dimensional, non-stationary dynamical systems.<n>Specifically, our ETGPSSM integrates a single shared GP with input-dependent normalizing flows, yielding an expressive implicit process prior that captures complex, non-stationary transition dynamics.<n>Our ETGPSSM outperforms existing GPSSMs and neural network-based SSMs in terms of computational efficiency and accuracy.
arXiv Detail & Related papers (2025-03-24T03:19:45Z) - Localized Physics-informed Gaussian Processes with Curriculum Training for Topology Optimization [3.6352820455705372]
We introduce a simultaneous and meshfree topology optimization (TO) framework based on physics-informed Gaussian processes (GPs)<n>Our framework endows all design and state variables via GP priors which have a shared, multi-output mean function that is parametrized via a customized deep neural network (DNN)<n>Our TO approach yields well-defined material interfaces and has a built-in continuation nature that promotes global optimality.
arXiv Detail & Related papers (2025-03-18T22:59:16Z) - Efficient High-Resolution Visual Representation Learning with State Space Model for Human Pose Estimation [60.80423207808076]
Capturing long-range dependencies while preserving high-resolution visual representations is crucial for dense prediction tasks such as human pose estimation.<n>We propose the Dynamic Visual State Space (DVSS) block, which augments visual state space models with multi-scale convolutional operations.<n>We build HRVMamba, a novel model for efficient high-resolution representation learning.
arXiv Detail & Related papers (2024-10-04T06:19:29Z) - LLM-A*: Large Language Model Enhanced Incremental Heuristic Search on Path Planning [91.95362946266577]
Path planning is a fundamental scientific problem in robotics and autonomous navigation.<n>Traditional algorithms like A* and its variants are capable of ensuring path validity but suffer from significant computational and memory inefficiencies as the state space grows.<n>We propose a new LLM based route planning method that synergistically combines the precise pathfinding capabilities of A* with the global reasoning capability of LLMs.<n>This hybrid approach aims to enhance pathfinding efficiency in terms of time and space complexity while maintaining the integrity of path validity, especially in large-scale scenarios.
arXiv Detail & Related papers (2024-06-20T01:24:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.