Deep Reinforcement Learning for Continuous Docking Control of Autonomous
Underwater Vehicles: A Benchmarking Study
- URL: http://arxiv.org/abs/2108.02665v1
- Date: Thu, 5 Aug 2021 14:58:05 GMT
- Authors: Mihir Patil and Bilal Wehbe and Matias Valdenegro-Toro
- Abstract summary: This work explores the application of state-of-the-art model-free deep reinforcement learning approaches to the task of AUV docking in the continuous domain.
We provide a detailed formulation of the reward function used to successfully dock the AUV onto a fixed docking platform.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Docking control of an autonomous underwater vehicle (AUV) is a task
that is integral to achieving persistent long-term autonomy. This work explores
the application of state-of-the-art model-free deep reinforcement learning
(DRL) approaches to the task of AUV docking in the continuous domain. We
provide a detailed formulation of the reward function used to successfully dock
the AUV onto a fixed docking platform. A major contribution that distinguishes
our work from previous approaches is the use of a physics simulator to define
and simulate both the underwater environment and the DeepLeng AUV. We propose a
new reward function formulation for the docking task, incorporating several
components, that outperforms previous reward formulations. We evaluate proximal
policy optimization (PPO), twin delayed deep deterministic policy gradient
(TD3), and soft actor-critic (SAC) in combination with our reward function. Our
evaluation conclusively shows the TD3 agent to be the most efficient and
consistent at docking the AUV: over multiple evaluation runs it achieved a 100%
success rate and an episode return of 10667.1 ± 688.8. We also show how our
reward function formulation improves over the state of the art.
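The reward formulation itself is given in the paper, not in this summary. As a
rough illustration of what a multi-component docking reward can look like, here
is a minimal sketch in Python; every term, weight, and threshold below is an
assumption for illustration, not the authors' actual formulation.

```python
import numpy as np

# Hypothetical multi-component docking reward. All weights, terms, and
# thresholds here are illustrative assumptions, NOT the paper's formulation.
def docking_reward(auv_pos, auv_yaw, dock_pos, dock_yaw, action,
                   dock_radius=0.5, yaw_tol=np.deg2rad(10)):
    dist = np.linalg.norm(dock_pos - auv_pos)        # distance to dock (m)
    # Wrap the heading error to [-pi, pi] before penalizing it.
    yaw_err = abs(np.arctan2(np.sin(dock_yaw - auv_yaw),
                             np.cos(dock_yaw - auv_yaw)))

    r = -1.0 * dist                                  # progress: closer is better
    r += -0.5 * yaw_err                              # alignment: face the dock
    r += -0.01 * float(np.sum(np.square(action)))    # control-effort penalty

    docked = dist < dock_radius and yaw_err < yaw_tol
    if docked:
        r += 100.0                                   # one-off success bonus
    return r, docked
```

Per step, such shaped terms accumulate into the episode return that PPO, TD3,
or SAC is trained to maximize; the 10667.1 ± 688.8 reported above is on the
paper's own reward scale, not this sketch's.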
Related papers
- FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation [65.01601309903971]
We introduce FAFA, a Frequency-Aware Flow-Aided self-supervised framework for 6D pose estimation of unmanned underwater vehicles (UUVs).
Our framework relies solely on the 3D model and RGB images, alleviating the need for any real pose annotations or other-modality data such as depth.
We evaluate the effectiveness of FAFA on common underwater object pose benchmarks and showcase significant performance improvements compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-09-25T03:54:01Z)
- Optimizing TD3 for 7-DOF Robotic Arm Grasping: Overcoming Suboptimality with Exploration-Enhanced Contrastive Learning [0.0]
Insufficient exploration of the spatial space can result in suboptimal policies when controlling 7-DOF robotic arms.
We propose a novel Exploration-Enhanced Contrastive Learning (EECL) module that improves exploration by providing additional rewards for encountering novel states (an illustrative novelty-bonus sketch follows this entry).
We evaluate our method on the robosuite Panda Lift task, demonstrating that it significantly outperforms the baseline TD3 in terms of both efficiency and convergence speed in the tested environment.
arXiv Detail & Related papers (2024-08-26T04:30:59Z)
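The EECL module's actual mechanism is described in the cited paper; the sketch
below only illustrates the generic idea of paying an extra reward for novel
states, using an assumed embedding memory and a nearest-neighbor distance
bonus.

```python
import numpy as np

# Generic novelty-bonus sketch: reward states that lie far from anything seen
# so far. This is NOT the EECL module itself, only the underlying idea of
# "additional rewards for encountering novel states".
class NoveltyBonus:
    def __init__(self, scale=0.1):
        self.scale = scale
        self.memory = []                      # embeddings of visited states

    def bonus(self, state_embedding):
        if not self.memory:
            self.memory.append(state_embedding)
            return self.scale
        nearest = min(np.linalg.norm(state_embedding - m) for m in self.memory)
        self.memory.append(state_embedding)
        return self.scale * nearest           # farther from memory -> bigger bonus

# Hypothetical usage: total_reward = env_reward + novelty.bonus(embed(obs))
```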
- Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning [53.3760591018817]
We propose a new benchmarking environment for aquatic navigation using recent advances in the integration between game engines and Deep Reinforcement Learning.
Specifically, we focus on PPO, one of the most widely accepted algorithms, and we propose advanced training techniques.
Our empirical evaluation shows that a well-designed combination of these ingredients can achieve promising results.
arXiv Detail & Related papers (2024-05-30T23:20:23Z)
- Enhancing AUV Autonomy With Model Predictive Path Integral Control [9.800697959791544]
We investigate the feasibility of Model Predictive Path Integral Control (MPPI) for the control of an AUV.
We utilise a non-linear model of the AUV to propagate the samples of the MPPI, which allows us to compute the control action in real time (a minimal MPPI sketch follows this entry).
arXiv Detail & Related papers (2023-08-10T12:55:57Z)
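As noted in the entry above, here is a minimal MPPI sketch: sample perturbed
control sequences, roll them out through a dynamics model, and average the
perturbations with exponentiated-cost weights. The `dynamics` and `cost`
callables are placeholders; the cited paper propagates samples through a
non-linear AUV model, which this sketch does not attempt to reproduce.

```python
import numpy as np

# Minimal MPPI step. `dynamics(x, u) -> x_next` and `cost(x, u) -> float` are
# placeholder callables supplied by the caller.
def mppi_step(x0, u_nominal, dynamics, cost, n_samples=256, sigma=0.5, lam=1.0):
    horizon, u_dim = u_nominal.shape
    noise = np.random.randn(n_samples, horizon, u_dim) * sigma
    costs = np.zeros(n_samples)

    for k in range(n_samples):                # roll out each sampled sequence
        x = np.array(x0, dtype=float)
        for t in range(horizon):
            u = u_nominal[t] + noise[k, t]
            x = dynamics(x, u)
            costs[k] += cost(x, u)

    # Softmax weights over trajectories: lower cost -> higher weight.
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()

    # Weighted average of the perturbations updates the nominal plan; execute
    # the first control, then shift the plan and repeat at the next step.
    return u_nominal + np.einsum('k,ktu->tu', w, noise)
```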
- CCE: Sample Efficient Sparse Reward Policy Learning for Robotic Navigation via Confidence-Controlled Exploration [72.24964965882783]
Confidence-Controlled Exploration (CCE) is designed to enhance the training sample efficiency of reinforcement learning algorithms for sparse reward settings such as robot navigation.
CCE is based on a novel relationship we provide between gradient estimation and policy entropy.
We demonstrate through simulated and real-world experiments that CCE outperforms conventional methods that employ constant trajectory lengths and entropy regularization.
arXiv Detail & Related papers (2023-06-09T18:45:15Z)
- Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning [55.2080971216584]
We present AIRS: Automatic Intrinsic Reward Shaping, which intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL).
We develop an intrinsic reward toolkit to provide efficient and reliable implementations of diverse intrinsic reward approaches.
arXiv Detail & Related papers (2023-01-26T01:06:46Z)
- Dealing with Sparse Rewards in Continuous Control Robotics via Heavy-Tailed Policies [64.2210390071609]
We present a novel Heavy-Tailed Policy Gradient (HT-PSG) algorithm to deal with the challenges of sparse rewards in continuous control problems.
We show consistent performance improvement across all tasks in terms of high average cumulative reward.
arXiv Detail & Related papers (2022-06-12T04:09:39Z)
- Reinforcement learning reward function in unmanned aerial vehicle control tasks [0.0]
The reward function is based on constructing simplified trajectories to the target and estimating their traversal times.
The effectiveness of the reward function was tested in a newly developed virtual environment.
arXiv Detail & Related papers (2022-03-20T10:32:44Z)
- Benchmarking Safe Deep Reinforcement Learning in Aquatic Navigation [78.17108227614928]
We propose a benchmark environment for Safe Reinforcement Learning focusing on aquatic navigation.
We consider value-based and policy-gradient Deep Reinforcement Learning (DRL) approaches.
We also propose a verification strategy that checks the behavior of the trained models over a set of desired properties.
arXiv Detail & Related papers (2021-12-16T16:53:56Z)
- f-IRL: Inverse Reinforcement Learning via State Marginal Matching [13.100127636586317]
We propose a method for learning the reward function (and the corresponding policy) to match the expert state density.
We present an algorithm, f-IRL, that recovers a stationary reward function from the expert density by gradient descent.
Our method outperforms adversarial imitation learning methods in terms of sample efficiency and the required number of expert trajectories.
arXiv Detail & Related papers (2020-11-09T19:37:48Z)
- Deep Inverse Q-learning with Constraints [15.582910645906145]
We introduce a novel class of algorithms that only needs to solve the MDP underlying the demonstrated behavior once to recover the expert policy.
We show how to extend this class of algorithms to continuous state-spaces via function approximation and how to estimate a corresponding action-value function.
We evaluate the resulting algorithms, Inverse Action-value Iteration, Inverse Q-learning, and Deep Inverse Q-learning, on the Objectworld benchmark.
arXiv Detail & Related papers (2020-08-04T17:21:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.