Related papers: Reinforcement Learning from Demonstrations by Novel Interactive Expert and Application to Automatic Berthing Control Systems for Unmanned Surface Vessel

Reinforcement Learning from Demonstrations by Novel Interactive Expert and Application to Automatic Berthing Control Systems for Unmanned Surface Vessel

URL: http://arxiv.org/abs/2202.11325v1
Date: Wed, 23 Feb 2022 06:45:59 GMT
Title: Reinforcement Learning from Demonstrations by Novel Interactive Expert and Application to Automatic Berthing Control Systems for Unmanned Surface Vessel
Authors: Haoran Zhang, Chenkun Yin, Yanxin Zhang, Shangtai Jin, Zhenxuan Li
Abstract summary: Two novel practical methods of Reinforcement Learning from Demonstration (RLfD) are developed and applied to automatic berthing control systems for Unmanned Surface Vessel. A new expert data generation method, called Model Predictive Based Expert (MPBE), is developed to provide high quality supervision data for RLfD algorithms. Another novel RLfD algorithm based on the MP-DDPG, called Self-Guided Actor-Critic (SGAC) is present, which can effectively leverage MPBE by continuously querying it to generate high quality expert data online.
Score: 12.453219390225428
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, two novel practical methods of Reinforcement Learning from Demonstration (RLfD) are developed and applied to automatic berthing control systems for Unmanned Surface Vessel. A new expert data generation method, called Model Predictive Based Expert (MPBE) which combines Model Predictive Control and Deep Deterministic Policy Gradient, is developed to provide high quality supervision data for RLfD algorithms. A straightforward RLfD method, model predictive Deep Deterministic Policy Gradient (MP-DDPG), is firstly introduced by replacing the RL agent with MPBE to directly interact with the environment. Then distribution mismatch problem is analyzed for MP-DDPG, and two techniques that alleviate distribution mismatch are proposed. Furthermore, another novel RLfD algorithm based on the MP-DDPG, called Self-Guided Actor-Critic (SGAC) is present, which can effectively leverage MPBE by continuously querying it to generate high quality expert data online. The distribution mismatch problem leading to unstable learning process is addressed by SGAC in a DAgger manner. In addition, theoretical analysis is given to prove that SGAC algorithm can converge with guaranteed monotonic improvement. Simulation results verify the effectiveness of MP-DDPG and SGAC to accomplish the ship berthing control task, and show advantages of SGAC comparing with other typical reinforcement learning algorithms and MP-DDPG.

Related papers

A Two-Timescale Primal-Dual Framework for Reinforcement Learning via Online Dual Variable Guidance [3.4354636842203026]
We propose PGDA-RL, a primal-dual Projected Gradient Descent-Ascent algorithm for solving regularized Markov Decision Processes (MDPs)<n>PGDA-RL integrates experience replay-based gradient estimation with a two-timescale decomposition of the underlying nested optimization problem.<n>We prove that PGDA-RL converges almost surely to the optimal value function and policy of the regularized MDP.
arXiv Detail & Related papers (2025-05-07T15:18:43Z)
Augmented Lagrangian-Based Safe Reinforcement Learning Approach for Distribution System Volt/VAR Control [1.1059341532498634]
This paper formulates the Volt- VAR control problem as a constrained Markov decision process (CMDP) A novel safe off-policy reinforcement learning (RL) approach is proposed in this paper to solve the CMDP. A two-stage strategy is adopted for offline training and online execution, so the accurate distribution system model is no longer needed.
arXiv Detail & Related papers (2024-10-19T19:45:09Z)
Real-Time Adaptive Safety-Critical Control with Gaussian Processes in High-Order Uncertain Models [14.790031018404942]
This paper presents an adaptive online learning framework for systems with uncertain parameters. We first integrate a forgetting factor to refine a variational sparse GP algorithm. In the second phase, we propose a safety filter based on high-order control barrier functions.
arXiv Detail & Related papers (2024-02-29T08:25:32Z)
GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making. We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation. We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z)
Dealing with Sparse Rewards in Continuous Control Robotics via Heavy-Tailed Policies [64.2210390071609]
We present a novel Heavy-Tailed Policy Gradient (HT-PSG) algorithm to deal with the challenges of sparse rewards in continuous control problems. We show consistent performance improvement across all tasks in terms of high average cumulative reward.
arXiv Detail & Related papers (2022-06-12T04:09:39Z)
Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models [67.78935378952146]
GenRL is a framework for solving sequential decision-making problems. It exploits the combination of reinforcement learning and latent variable generative models. We experimentally determine the characteristics of generative models that have most influence on the performance of the final policy training.
arXiv Detail & Related papers (2022-04-18T22:02:32Z)
PGTRNet: Two-phase Weakly Supervised Object Detection with Pseudo Ground Truth Refining [10.262660606897974]
Weakly Supervised Object Detection (WSOD) aiming to train detectors with only image-level annotations has arisen increasing attention. Current state-of-the-art approaches mainly follow a two-stage training strategy whichintegrates a fully supervised detector (FSD) with a pure WSOD model. There are two main problems hindering the performance of the two-phase WSOD approaches, i.e., insufficient learning problem and strict reliance between the FSD and the pseudo ground truth generated by theWSOD model. This paper proposes pseudo ground truth refinement network (PGTRNet), a simple yet effective method
arXiv Detail & Related papers (2021-08-25T19:20:49Z)
Reinforcement Learning for Robot Navigation with Adaptive Forward Simulation Time (AFST) in a Semi-Markov Model [20.91419349793292]
We propose the first DRL-based navigation method modeled by a semi-Markov decision process (SMDP) with continuous action space, named Adaptive Forward Time Simulation (AFST) to overcome this problem.
arXiv Detail & Related papers (2021-08-13T10:30:25Z)
Mixed Policy Gradient: off-policy reinforcement learning driven jointly by data and model [32.61834127169759]
Reinforcement learning (RL) shows great potential in sequential decision-making. Mainstream RL algorithms are data-driven, which usually yield better performance but much slower convergence compared with model-driven methods. This paper proposes mixed policy gradient (MPG) algorithm, which fuses the empirical data and the transition model in policy gradient (PG) to accelerate convergence without performance.
arXiv Detail & Related papers (2021-02-23T06:05:17Z)
Covert Model Poisoning Against Federated Learning: Algorithm Design and Optimization [76.51980153902774]
Federated learning (FL) is vulnerable to external attacks on FL models during parameters transmissions. In this paper, we propose effective MP algorithms to combat state-of-the-art defensive aggregation mechanisms. Our experimental results demonstrate that the proposed CMP algorithms are effective and substantially outperform existing attack mechanisms.
arXiv Detail & Related papers (2021-01-28T03:28:18Z)
Optimization-driven Deep Reinforcement Learning for Robust Beamforming in IRS-assisted Wireless Communications [54.610318402371185]
Intelligent reflecting surface (IRS) is a promising technology to assist downlink information transmissions from a multi-antenna access point (AP) to a receiver. We minimize the AP's transmit power by a joint optimization of the AP's active beamforming and the IRS's passive beamforming. We propose a deep reinforcement learning (DRL) approach that can adapt the beamforming strategies from past experiences.
arXiv Detail & Related papers (2020-05-25T01:42:55Z)
Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty. LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.