Guided Online Distillation: Promoting Safe Reinforcement Learning by
Offline Demonstration
- URL: http://arxiv.org/abs/2309.09408v2
- Date: Thu, 12 Oct 2023 23:55:38 GMT
- Title: Guided Online Distillation: Promoting Safe Reinforcement Learning by
Offline Demonstration
- Authors: Jinning Li, Xinyi Liu, Banghua Zhu, Jiantao Jiao, Masayoshi Tomizuka,
Chen Tang, Wei Zhan
- Abstract summary: We argue that extracting an expert policy from offline data to guide online exploration is a promising solution to mitigating the conservativeness issue.
We propose Guided Online Distillation (GOLD), an offline-to-online safe RL framework.
GOLD distills an offline DT policy into a lightweight policy network through guided online safe RL training, which outperforms both the offline DT policy and online safe RL algorithms.
- Score: 75.51109230296568
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Safe Reinforcement Learning (RL) aims to find a policy that achieves high
rewards while satisfying cost constraints. When learning from scratch, safe RL
agents tend to be overly conservative, which impedes exploration and limits
overall performance. In many realistic tasks, e.g., autonomous driving,
large-scale expert demonstration data are available. We argue that extracting an
expert policy from offline data to guide online exploration is a promising
solution to mitigating the conservativeness issue. Large-capacity models, e.g.,
decision transformers (DT), have been proven to be competent in offline policy
learning. However, data collected in real-world scenarios rarely contain
dangerous cases (e.g., collisions), which makes it difficult for such policies
to learn safety concepts. Moreover, these bulky policy networks cannot meet the
inference-speed requirements of real-world tasks such as
autonomous driving. To this end, we propose Guided Online Distillation (GOLD),
an offline-to-online safe RL framework. GOLD distills an offline DT policy into
a lightweight policy network through guided online safe RL training, which
outperforms both the offline DT policy and online safe RL algorithms.
Experiments in both benchmark safe RL tasks and real-world driving tasks based
on the Waymo Open Motion Dataset (WOMD) demonstrate that GOLD can successfully
distill lightweight policies and solve decision-making problems in challenging
safety-critical scenarios.
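For reference, the safe RL setting described in the abstract is commonly formalized as a constrained Markov decision process; a standard, generic statement of the objective (notation here is not taken from the paper), with reward r, cost c, discount factor gamma, and cost budget d, is:
\max_{\pi} \; \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r(s_t, a_t)\right]
\quad \text{subject to} \quad
\mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{\infty} \gamma^{t} c(s_t, a_t)\right] \le d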
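The distillation idea in the abstract, a large offline-trained teacher guiding a lightweight online student, can be sketched as follows. This is a minimal illustration only, assuming a PyTorch setup; the class and variable names (StudentPolicy, teacher_dist, guide_coef, cost_coef) are hypothetical and do not reflect the paper's actual implementation.

import torch
import torch.nn as nn


class StudentPolicy(nn.Module):
    """Lightweight Gaussian policy network, small enough for fast online inference."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs):
        return torch.distributions.Normal(self.net(obs), self.log_std.exp())


def guided_update(student, teacher_dist, obs, act, reward_adv, cost_adv,
                  optimizer, cost_coef=1.0, guide_coef=0.1):
    """One guided safe-RL policy-gradient step (sketch): maximize reward advantage,
    penalize cost advantage (Lagrangian-style), and keep the student close to the
    offline-trained teacher on visited states."""
    pi = student.dist(obs)
    logp = pi.log_prob(act).sum(-1)

    # Surrogate safe-RL objective: reward term minus a weighted cost penalty.
    pg_loss = -(logp * (reward_adv - cost_coef * cost_adv)).mean()

    # Guidance term: KL(student || teacher) anchors online exploration to the
    # expert extracted from offline demonstrations, mitigating over-conservativeness.
    kl = torch.distributions.kl_divergence(pi, teacher_dist).sum(-1).mean()

    loss = pg_loss + guide_coef * kl
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage sketch: teacher_dist would be a Normal built from the frozen offline
# teacher's predicted actions on the same batch of observations, e.g.
# torch.distributions.Normal(teacher_actions, teacher_std).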
Related papers
- Reward-Safety Balance in Offline Safe RL via Diffusion Regularization [16.5825143820431]
Constrained reinforcement learning (RL) seeks high-performance policies under safety constraints.
We propose Diffusion-Regularized Constrained Offline Reinforcement Learning (DRCORL).
DRCORL first uses a diffusion model to capture the behavioral policy from offline data and then extracts a simplified policy to enable efficient inference.
arXiv Detail & Related papers (2025-02-18T00:00:03Z)
- Safe Reinforcement Learning with Minimal Supervision [45.44831696628473]
Reinforcement learning (RL) in the real world requires procedures that enable agents to explore without causing harm to themselves or others.
The most successful solutions to the problem of safe RL leverage offline data to learn a safe-set, enabling safe online exploration.
This paper investigates how the quantity and quality of the data used to learn the initial safe set offline affect the ability to learn safe RL policies online.
arXiv Detail & Related papers (2025-01-08T13:04:08Z)
- Marvel: Accelerating Safe Online Reinforcement Learning with Finetuned Offline Policy [12.589890916332196]
Offline-to-online (O2O) RL can be leveraged to facilitate faster and safer online policy learning.
We introduce Marvel, a novel framework for O2O safe RL, comprising two key components that work in concert.
Our work has great potential to advance the field towards more efficient and practical safe RL solutions.
arXiv Detail & Related papers (2024-12-05T18:51:18Z)
- Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias [96.14064037614942]
Offline retraining, a policy extraction step at the end of online fine-tuning, is proposed.
An optimistic (exploration) policy is used to interact with the environment, and a separate pessimistic (exploitation) policy is trained on all the observed data for evaluation.
arXiv Detail & Related papers (2023-10-12T17:50:09Z)
- Constrained Decision Transformer for Offline Safe Reinforcement Learning [16.485325576173427]
We study the offline safe RL problem from a novel multi-objective optimization perspective.
We propose the constrained decision transformer (CDT) approach, which can dynamically adjust the trade-offs during deployment.
arXiv Detail & Related papers (2023-02-14T21:27:10Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition [59.94644674087599]
We propose SAFEty skill pRiors (SAFER), an algorithm that accelerates policy learning on complex control tasks under safety constraints.
Through principled training on an offline dataset, SAFER learns to extract safe primitive skills.
In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies.
arXiv Detail & Related papers (2022-02-10T05:43:41Z)
- Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones [81.49106778460238]
Recovery RL uses offline data to learn about constraint-violating zones before policy learning.
We evaluate Recovery RL on 6 simulation domains, including two contact-rich manipulation tasks and an image-based navigation task.
Results suggest that Recovery RL trades off constraint violations and task successes 2-20 times more efficiently in simulation domains and 3 times more efficiently in physical experiments.
arXiv Detail & Related papers (2020-10-29T20:10:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.