Momentum-constrained Hybrid Heuristic Trajectory Optimization Framework with Residual-enhanced DRL for Visually Impaired Scenarios
- URL: http://arxiv.org/abs/2509.15582v1
- Date: Fri, 19 Sep 2025 04:33:39 GMT
- Title: Momentum-constrained Hybrid Heuristic Trajectory Optimization Framework with Residual-enhanced DRL for Visually Impaired Scenarios
- Authors: Yuting Zeng, Zhiwen Zheng, You Zhou, JiaLing Xiao, Yongbin Yu, Manping Fan, Bo Gong, Liyong Ren
- Abstract summary: This paper proposes a momentum-constrained hybrid heuristic trajectory optimization framework (MHHTOF) tailored for assistive navigation in visually impaired scenarios. It integrates trajectory sampling generation, optimization, and evaluation with residual-enhanced deep reinforcement learning (DRL). Experimental results demonstrate that the proposed LSTM-ResB-PPO achieves significantly faster convergence, attaining stable policy performance in approximately half the training iterations required by the PPO baseline.
- Score: 4.735413508037063
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a momentum-constrained hybrid heuristic trajectory optimization framework (MHHTOF) tailored for assistive navigation in visually impaired scenarios, integrating trajectory sampling generation, optimization, and evaluation with residual-enhanced deep reinforcement learning (DRL). In the first stage, a heuristic trajectory sampling cluster (HTSC) is generated in the Frenet coordinate system using third-order interpolation with fifth-order polynomials and momentum-constrained trajectory optimization (MTO) constraints to ensure smoothness and feasibility. After first-stage cost evaluation, the second stage leverages a residual-enhanced actor-critic network with LSTM-based temporal feature modeling to adaptively refine trajectory selection in the Cartesian coordinate system. A dual-stage cost modeling mechanism (DCMM) with weight transfer aligns semantic priorities across stages, supporting human-centered optimization. Experimental results demonstrate that the proposed LSTM-ResB-PPO achieves significantly faster convergence, attaining stable policy performance in approximately half the training iterations required by the PPO baseline, while simultaneously enhancing both reward outcomes and training stability. Compared to the baseline method, the selected model reduces average cost and cost variance by 30.3% and 53.3%, respectively, and lowers ego and obstacle risks by over 77%. These findings validate the framework's effectiveness in enhancing robustness, safety, and real-time feasibility in complex assistive planning tasks.
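The stage-one pipeline is concrete enough to sketch. Below is a minimal, illustrative Python sketch assuming nothing beyond the abstract: lateral candidates are generated in the Frenet frame as fifth-order (quintic) polynomials over a grid of target offsets and planning horizons, simple acceleration and jerk bounds stand in for the momentum-constrained (MTO) feasibility check, and a weighted smoothness-plus-deviation cost stands in for the stage-one DCMM evaluation. All function names, limits, and weights are assumptions, not the authors' implementation.

```python
import numpy as np

def quintic(s0, v0, a0, s1, v1, a1, T):
    """Coefficients c[0..5] of s(t) = sum_k c_k t^k matching boundary
    position/velocity/acceleration at t = 0 and t = T."""
    A = np.array([[T**3,   T**4,    T**5],
                  [3*T**2, 4*T**3,  5*T**4],
                  [6*T,    12*T**2, 20*T**3]])
    b = np.array([s1 - (s0 + v0*T + 0.5*a0*T**2),
                  v1 - (v0 + a0*T),
                  a1 - a0])
    return np.concatenate(([s0, v0, 0.5*a0], np.linalg.solve(A, b)))

def derivatives(c, t):
    """Position, velocity, acceleration, and jerk of the quintic at times t."""
    p = sum(c[k] * t**k for k in range(6))
    v = sum(k * c[k] * t**(k-1) for k in range(1, 6))
    a = sum(k * (k-1) * c[k] * t**(k-2) for k in range(2, 6))
    j = sum(k * (k-1) * (k-2) * c[k] * t**(k-3) for k in range(3, 6))
    return p, v, a, j

def sample_cluster(d0, v_lat0, offsets, horizons, dt=0.1,
                   a_max=1.5, j_max=2.0, w_smooth=1.0, w_dev=0.5):
    """Generate a heuristic trajectory sampling cluster of lateral quintics,
    drop momentum-infeasible candidates, and rank survivors by cost."""
    cluster = []
    for T in horizons:
        t = np.arange(0.0, T + dt, dt)
        for d_end in offsets:
            c = quintic(d0, v_lat0, 0.0, d_end, 0.0, 0.0, T)
            d, _, a, j = derivatives(c, t)
            # Momentum-style feasibility: bound lateral acceleration and jerk.
            if np.abs(a).max() > a_max or np.abs(j).max() > j_max:
                continue
            # Stage-one cost: integrated squared jerk (smoothness) plus
            # squared deviation from the reference line d = 0.
            cost = w_smooth * np.sum(j**2) * dt + w_dev * np.sum(d**2) * dt
            cluster.append((cost, d_end, T, c))
    return sorted(cluster, key=lambda cand: cand[0])

# Example: start 1 m left of the reference line, drifting back toward it.
cluster = sample_cluster(d0=1.0, v_lat0=-0.2,
                         offsets=np.linspace(-1.0, 1.0, 9),
                         horizons=[3.0, 4.0, 5.0])
print(f"{len(cluster)} feasible candidates; best cost = {cluster[0][0]:.3f}")
```

In the full framework, the sorted cluster would then be handed to the stage-two residual-enhanced LSTM actor-critic for adaptive refinement in Cartesian coordinates; the sort here merely emulates the stage-one ranking.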
Related papers
- Unbiased Dynamic Pruning for Efficient Group-Based Policy Optimization [60.87651283510059]
Group Relative Policy Optimization (GRPO) effectively scales LLM reasoning but incurs prohibitive computational costs. We propose Dynamic Pruning Policy Optimization (DPPO), a framework that enables dynamic pruning while preserving unbiased gradient estimation. To mitigate the data sparsity induced by pruning, we introduce Dense Prompt Packing, a window-based greedy strategy.
arXiv Detail & Related papers (2026-03-04T14:48:53Z)
- Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration [56.074760766965085]
PRISM provides a dynamics-aware framework that arbitrates data based on its degree of cognitive conflict with the model's existing knowledge. Our findings suggest that disentangling data based on internal optimization regimes is crucial for scalable and robust agent alignment.
arXiv Detail & Related papers (2026-01-12T05:43:20Z)
- TCPO: Thought-Centric Preference Optimization for Effective Embodied Decision-making [75.29820290660065]
This paper proposes Thought-Centric Preference Optimization (TCPO) for effective embodied decision-making. It emphasizes the alignment of the model's intermediate reasoning process, mitigating the problem of model degradation. Experiments in the ALFWorld environment demonstrate an average success rate of 26.67%, achieving a 6% improvement over RL4VLM.
arXiv Detail & Related papers (2025-09-10T11:16:21Z)
- Learning to Align, Aligning to Learn: A Unified Approach for Self-Optimized Alignment [24.296667264939515]
We propose GRAO (Group Relative Alignment Optimization), a unified framework that synergizes the strengths of SFT (supervised fine-tuning) and RL (reinforcement learning). Our theoretical analysis establishes GRAO's convergence guarantees and sample-efficiency advantages over conventional approaches. This work provides both a theoretically grounded alignment framework and empirical evidence for efficient capability evolution in language models.
arXiv Detail & Related papers (2025-08-11T08:28:47Z)
- WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training [64.0932926819307]
We present Warmup-Stable and Merge (WSM), a framework that establishes a formal connection between learning rate decay and model merging. WSM provides a unified theoretical foundation for emulating various decay strategies; a minimal parameter-averaging sketch appears after this list. Our framework consistently outperforms the widely adopted Warmup-Stable-Decay (WSD) approach across multiple benchmarks.
arXiv Detail & Related papers (2025-07-23T16:02:06Z)
- Enhancing Spatiotemporal Networks with xLSTM: A Scalar LSTM Approach for Cellular Traffic Forecasting [0.7111641404908191]
We introduce a lightweight, dual-path Spatiotemporal Network that leverages a scalar LSTM (sLSTM) for efficient modeling and a three-layer Conv3D module for spatial feature extraction. We show superior forecast performance over ConvLSTM baselines and strong generalization to unseen regions, making it well-suited for large-scale next-generation network deployments.
arXiv Detail & Related papers (2025-07-17T22:48:46Z)
- Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs [51.21041884010009]
Ring-lite is a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL). Our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks.
arXiv Detail & Related papers (2025-06-17T17:12:34Z)
- Reward-Guided Speculative Decoding for Efficient LLM Reasoning [80.55186052123196]
We introduce Reward-Guided Speculative Decoding (RSD), a novel framework aimed at improving the efficiency of inference in large language models (LLMs). RSD incorporates a controlled bias to prioritize high-reward outputs, in contrast to existing speculative decoding methods that enforce strict unbiasedness. RSD delivers significant efficiency gains over decoding with the target model only, while achieving significantly better accuracy than parallel decoding methods on average.
arXiv Detail & Related papers (2025-01-31T17:19:57Z)
- A RankNet-Inspired Surrogate-Assisted Hybrid Metaheuristic for Expensive Coverage Optimization [5.757318591302855]
We propose a RankNet-inspired surrogate-assisted hybrid metaheuristic to handle large-scale coverage optimization tasks. Our algorithm consistently outperforms state-of-the-art algorithms for EMVOPs.
arXiv Detail & Related papers (2025-01-13T14:49:05Z)
- Towards Robust Spacecraft Trajectory Optimization via Transformers [17.073280827888226]
We develop an autonomous generative model to solve non-convex optimal control problems in real time. We extend the capabilities of ART to address robust chance-constrained optimal control problems. This work marks an initial step toward the reliable deployment of AI-driven solutions in safety-critical autonomous systems such as spacecraft.
arXiv Detail & Related papers (2024-10-08T00:58:42Z)
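The WSM entry above describes emulating learning-rate decay by merging checkpoints saved during the constant-learning-rate phase. A minimal PyTorch sketch of that parameter-averaging idea follows; the uniform window weights and checkpoint naming are assumptions, not the paper's exact merging rule.

```python
import torch

def merge_checkpoints(state_dicts, weights=None):
    """Parameter-wise weighted average of a window of checkpoints.
    Uniform weights give a plain moving average; other weightings
    can emulate different decay schedules."""
    n = len(state_dicts)
    if weights is None:
        weights = [1.0 / n] * n  # assumed uniform window
    merged = {}
    for key in state_dicts[0]:
        # Integer buffers (e.g. step counters) are averaged in float here
        # for brevity; real code would copy them through unchanged.
        merged[key] = sum(w * sd[key].float()
                          for w, sd in zip(weights, state_dicts))
    return merged

# Usage sketch: after the warmup-stable phase, merge the last k
# checkpoints instead of running an explicit decay schedule.
# ckpts = [torch.load(f"ckpt_{s}.pt") for s in saved_steps[-k:]]
# model.load_state_dict(merge_checkpoints(ckpts))
```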
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.