Integrating Offline Pre-Training with Online Fine-Tuning: A Reinforcement Learning Approach for Robot Social Navigation
- URL: http://arxiv.org/abs/2510.00466v1
- Date: Wed, 01 Oct 2025 03:37:02 GMT
- Title: Integrating Offline Pre-Training with Online Fine-Tuning: A Reinforcement Learning Approach for Robot Social Navigation
- Authors: Run Su, Hao Fu, Shuai Zhou, Yingao Fu
- Abstract summary: This paper proposes a novel offline-to-online fine-tuning algorithm for robot social navigation by integrating Return-to-Go (RTG) prediction into a causal Transformer architecture. Our algorithm features a spatiotemporal fusion model designed to precisely estimate RTG values in real-time by jointly encoding temporal pedestrian motion patterns and spatial crowd dynamics. Experiments in simulated social navigation environments demonstrate that our method achieves a higher success rate and lower collision rate compared to state-of-the-art baselines.
- Score: 3.5801655940143413
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning (RL) has emerged as a promising framework for addressing robot social navigation challenges. However, inherent uncertainties in pedestrian behavior and limited environmental interaction during training often lead to suboptimal exploration and distributional shifts between offline training and online deployment. To overcome these limitations, this paper proposes a novel offline-to-online fine-tuning RL algorithm for robot social navigation by integrating Return-to-Go (RTG) prediction into a causal Transformer architecture. Our algorithm features a spatiotemporal fusion model designed to precisely estimate RTG values in real-time by jointly encoding temporal pedestrian motion patterns and spatial crowd dynamics. This RTG prediction framework mitigates distribution shift by aligning offline policy training with online environmental interactions. Furthermore, a hybrid offline-online experience sampling mechanism is built to stabilize policy updates during fine-tuning, ensuring balanced integration of pre-trained knowledge and real-time adaptation. Extensive experiments in simulated social navigation environments demonstrate that our method achieves a higher success rate and lower collision rate compared to state-of-the-art baselines. These results underscore the efficacy of our algorithm in enhancing navigation policy robustness and adaptability. This work paves the way for more reliable and adaptive robotic navigation systems in real-world applications.
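For concreteness, here is a minimal sketch of the two mechanisms the abstract describes: a Decision-Transformer-style causal policy that consumes interleaved (RTG, state, action) tokens, and a hybrid replay scheme that mixes offline and online transitions in each fine-tuning batch. This is an illustrative reconstruction from the abstract, not the authors' released code; all module names, tensor shapes, and the 50/50 sampling ratio are assumptions.

```python
import random
import torch
import torch.nn as nn

class RTGConditionedPolicy(nn.Module):
    """Decision-Transformer-style policy over (RTG, state, action) token triples."""
    def __init__(self, state_dim, act_dim, d_model=128, n_layers=3, n_heads=4, horizon=20):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)          # scalar return-to-go -> token
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        self.pos = nn.Embedding(3 * horizon, d_model)   # one slot per token position
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, actions):
        # rtg: (B,T,1), states: (B,T,state_dim), actions: (B,T,act_dim), T <= horizon
        B, T, _ = states.shape
        tokens = torch.stack(
            [self.embed_rtg(rtg), self.embed_state(states), self.embed_action(actions)],
            dim=2).reshape(B, 3 * T, -1)                # interleave as r_t, s_t, a_t
        tokens = tokens + self.pos(torch.arange(3 * T, device=tokens.device))
        causal = nn.Transformer.generate_square_subsequent_mask(3 * T).to(tokens.device)
        h = self.encoder(tokens, mask=causal)           # no token attends to the future
        return self.head(h[:, 1::3])                    # predict a_t from the s_t token

def sample_hybrid(offline_buf, online_buf, batch_size, online_ratio=0.5):
    """Hybrid experience sampling: mix offline and online transitions per batch."""
    n_online = min(int(batch_size * online_ratio), len(online_buf))
    batch = random.sample(online_buf, n_online)                 # fresh interactions
    batch += random.sample(offline_buf, batch_size - n_online)  # pre-training data
    return batch
```

In the paper's setup the RTG tokens would come from the spatiotemporal fusion model's real-time estimates rather than from dataset returns; in this sketch they are simply inputs.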
Related papers
- GIANT - Global Path Integration and Attentive Graph Networks for Multi-Agent Trajectory Planning [4.019914376054815]
This paper presents a novel approach to multi-robot collision avoidance that integrates global path planning with local navigation strategies. We introduce a local navigation model that leverages pre-planned global paths, allowing robots to adhere to optimal routes while dynamically adjusting to environmental changes. Our approach is evaluated against established baselines, including NH-ORCA, DRL-NAV, and GA3C-CADRL, across various structurally diverse simulated scenarios.
arXiv Detail & Related papers (2026-03-04T22:45:53Z)
- Hybrid Motion Planning with Deep Reinforcement Learning for Mobile Robot Navigation [0.0]
We propose Hybrid Motion Planning with Deep Reinforcement Learning (HMP-DRL): a graph-based global planner generates a path, which is integrated into a local DRL policy via a sequence of checkpoints encoded in both the state space and reward function. To ensure social compliance, the local planner employs an entity-aware reward structure that dynamically adjusts safety margins and penalties based on the semantic type of surrounding agents.
arXiv Detail & Related papers (2025-12-31T05:58:57Z)
- Intrinsic-Motivation Multi-Robot Social Formation Navigation with Coordinated Exploration [7.50564221243905]
We propose a novel coordinated-exploration multi-robot RL algorithm. Its core component is a self-learning intrinsic reward mechanism designed to collectively alleviate policy conservatism. Empirical results on social formation navigation benchmarks demonstrate the proposed algorithm's superior performance.
arXiv Detail & Related papers (2025-12-15T13:03:08Z)
- Socially aware navigation for mobile robots: a survey on deep reinforcement learning approaches [1.2891210250935148]
Socially aware navigation is a fast-evolving research area in robotics that enables robots to move within human environments while adhering to implicit human social norms. Deep Reinforcement Learning (DRL) has accelerated the development of navigation policies that enable robots to incorporate these social conventions while effectively reaching their objectives. This survey offers a comprehensive overview of DRL-based approaches to socially aware navigation, highlighting key aspects such as proxemics, human comfort, naturalness, and trajectory and intention prediction.
arXiv Detail & Related papers (2025-11-18T05:33:28Z)
- Human-in-the-loop Online Rejection Sampling for Robotic Manipulation [55.99788088622936]
Hi-ORS stabilizes value estimation by filtering out negatively rewarded samples during online fine-tuning. Hi-ORS fine-tunes a pi-base policy to master contact-rich manipulation in just 1.5 hours of real-world training.
arXiv Detail & Related papers (2025-10-30T11:53:08Z)
- UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning [78.86567400365392]
We present Semi-online Reinforcement Learning, a novel paradigm that simulates online RL on offline trajectories. To capture long-term training signals, Semi-online RL introduces discounted future returns into the reward computation. Experiments show that our Semi-online RL achieves SOTA performance among 7B models across four dynamic benchmarks.
arXiv Detail & Related papers (2025-09-15T03:24:08Z)
- Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator [50.191655141020505]
Reinforcement Learning (RL) has demonstrated impressive capabilities in robotic control but remains challenging due to high sample complexity, safety concerns, and the sim-to-real gap. We introduce Offline Robotic World Model (RWM-O), a model-based approach that explicitly estimates uncertainty to improve policy learning without reliance on a physics simulator.
arXiv Detail & Related papers (2025-04-23T12:58:15Z)
- From Imitation to Exploration: End-to-end Autonomous Driving based on World Model [24.578178308010912]
RAMBLE is an end-to-end world-model-based RL method for driving decision-making. It can handle complex and dynamic traffic scenarios. It achieves state-of-the-art performance in route completion rate on the CARLA Leaderboard 1.0 and completes all 38 scenarios on the CARLA Leaderboard 2.0.
arXiv Detail & Related papers (2024-10-03T06:45:59Z)
- SoNIC: Safe Social Navigation with Adaptive Conformal Inference and Constrained Reinforcement Learning [26.554847852013737]
SoNIC is the first algorithm that integrates adaptive conformal inference and constrained reinforcement learning. Our method achieves a success rate of 96.93%, which is 11.67% higher than the previous state-of-the-art RL method. Our experiments demonstrate that the system can generate robust and socially polite decision-making when interacting with both sparse and dense crowds.
arXiv Detail & Related papers (2024-07-24T17:57:21Z)
- Research on Autonomous Robots Navigation based on Reinforcement Learning [13.559881645869632]
We use the Deep Q Network (DQN) and Proximal Policy Optimization (PPO) models to optimize the path planning and decision-making process.
We have verified the effectiveness and robustness of these models in various complex scenarios.
arXiv Detail & Related papers (2024-07-02T00:44:06Z)
- Multi-Agent Dynamic Relational Reasoning for Social Robot Navigation [50.01551945190676]
Social robot navigation can be helpful in various contexts of daily life but requires safe human-robot interactions and efficient trajectory planning.
We propose a systematic relational reasoning approach with explicit inference of the underlying dynamically evolving relational structures.
We demonstrate its effectiveness for multi-agent trajectory prediction and social robot navigation.
arXiv Detail & Related papers (2024-01-22T18:58:22Z)
- MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
- ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles [52.34951901588738]
We propose a novel framework called ENsemble-based Offline-To-Online (ENOTO) RL.
By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance.
Experimental results demonstrate that ENOTO can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods (a minimal sketch of the Q-ensemble idea follows this list).
arXiv Detail & Related papers (2023-06-12T05:10:10Z)
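The Q-ensemble mechanism described in the ENOTO entry above is compact enough to sketch. The following illustrates the generic technique it builds on (N independent critics with a pessimistic min-aggregated bootstrap target, usable unchanged across offline pre-training and online fine-tuning); it is not ENOTO's exact objective, and all names and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

def make_q(state_dim, act_dim, hidden=256):
    # A standard MLP critic: Q(s, a) -> scalar value estimate.
    return nn.Sequential(
        nn.Linear(state_dim + act_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, 1))

class QEnsemble(nn.Module):
    """N independently initialized critics evaluated on the same (s, a) pair."""
    def __init__(self, state_dim, act_dim, n_critics=5):
        super().__init__()
        self.critics = nn.ModuleList(make_q(state_dim, act_dim) for _ in range(n_critics))

    def forward(self, s, a):
        x = torch.cat([s, a], dim=-1)
        return torch.stack([q(x) for q in self.critics])  # (N, B, 1)

def td_targets(ensemble, s_next, a_next, reward, done, gamma=0.99):
    """Conservative bootstrap: take the minimum over the ensemble's next-state values."""
    with torch.no_grad():
        q_next = ensemble(s_next, a_next).min(dim=0).values  # pessimism via min
    return reward + gamma * (1.0 - done) * q_next
```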