A framework for reinforcement learning with autocorrelated actions
- URL: http://arxiv.org/abs/2009.04777v1
- Date: Thu, 10 Sep 2020 11:23:09 GMT
- Title: A framework for reinforcement learning with autocorrelated actions
- Authors: Marcin Szulc, Jakub Łyskawa, Paweł Wawrzyński
- Abstract summary: Policies are considered here that produce actions based on states and random elements autocorrelated in subsequent time instants.
An algorithm is introduced here that approximately optimizes the aforementioned policy.
Its efficiency is verified for four simulated learning control problems.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The subject of this paper is reinforcement learning. Policies are considered
here that produce actions based on states and random elements autocorrelated in
subsequent time instants. Consequently, an agent learns from experiments that
are distributed over time and potentially give better clues to policy
improvement. Also, the physical implementation of such policies, e.g. in robotics, is less problematic, as it avoids making robots shake. This is in contrast to most RL algorithms, which add white noise to the control signal and thereby cause unwanted shaking of the robots. An algorithm is introduced here that approximately optimizes the
aforementioned policy. Its efficiency is verified for four simulated learning
control problems (Ant, HalfCheetah, Hopper, and Walker2D) against three other
methods (PPO, SAC, ACER). The algorithm outperforms others in three of these
problems.
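To make the contrast concrete, below is a minimal sketch (not the algorithm from the paper) of white-noise exploration versus exploration with noise autocorrelated across subsequent time instants, modeled here as a stationary AR(1) process added to a deterministic policy output; the function and class names and the coefficient alpha are illustrative assumptions.

import numpy as np

def white_noise_action(policy_mean, sigma, rng):
    # Conventional exploration: independent Gaussian noise at every time
    # instant, which makes consecutive actions jitter and real robots shake.
    return policy_mean + sigma * rng.standard_normal(policy_mean.shape)

class AutocorrelatedNoise:
    """Stationary AR(1) noise: xi_t = alpha * xi_{t-1} + sqrt(1 - alpha^2) * eps_t.

    The sqrt(1 - alpha^2) factor keeps the marginal variance constant, so only
    the correlation between subsequent time instants changes, not the scale.
    """

    def __init__(self, dim, sigma=0.2, alpha=0.9, seed=0):
        self.sigma = sigma
        self.alpha = alpha
        self.rng = np.random.default_rng(seed)
        self.xi = np.zeros(dim)

    def sample(self):
        eps = self.rng.standard_normal(self.xi.shape)
        self.xi = self.alpha * self.xi + np.sqrt(1.0 - self.alpha ** 2) * eps
        return self.sigma * self.xi

def autocorrelated_action(policy_mean, noise):
    # Consecutive actions share most of their random component, so the control
    # signal stays smooth while the agent still explores.
    return policy_mean + noise.sample()

With alpha = 0 the autocorrelated variant reduces to white-noise exploration; values of alpha close to 1 give slowly varying perturbations, the regime the abstract argues is easier to realize on physical robots.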
Related papers
- Provably Efficient Action-Manipulation Attack Against Continuous Reinforcement Learning [49.48615590763914]
We propose a black-box attack algorithm named LCBT, which uses the Monte Carlo tree search method for efficient action searching and manipulation.
We conduct our proposed attack methods on three aggressive algorithms: DDPG, PPO, and TD3 in continuous settings, which show a promising attack performance.
arXiv Detail & Related papers (2024-11-20T08:20:29Z) - SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning [85.21378553454672]
We develop a library containing a sample efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment.
We find that our implementation can achieve very efficient learning, acquiring policies for PCB board assembly, cable routing, and object relocation.
These policies achieve perfect or near-perfect success rates, extreme robustness even under perturbations, and exhibit emergent robustness recovery and correction behaviors.
arXiv Detail & Related papers (2024-01-29T10:01:10Z) - Off-Policy Deep Reinforcement Learning Algorithms for Handling Various Robotic Manipulator Tasks [0.0]
In this study, three reinforcement learning algorithms; DDPG, TD3 and SAC have been used to train Fetch robotic manipulator for four different tasks.
All of these algorithms are off-policy and able to achieve their desired target by optimizing both policy and value functions.
arXiv Detail & Related papers (2022-12-11T18:25:24Z) - Verifying Learning-Based Robotic Navigation Systems [61.01217374879221]
We show how modern verification engines can be used for effective model selection.
Specifically, we use verification to detect and rule out policies that may demonstrate suboptimal behavior.
Our work is the first to demonstrate the use of verification backends for recognizing suboptimal DRL policies in real-world robots.
arXiv Detail & Related papers (2022-05-26T17:56:43Z) - Teaching a Robot to Walk Using Reinforcement Learning [0.0]
Reinforcement learning can train optimal walking policies with ease.
We teach a simulated two-dimensional bipedal robot how to walk using the OpenAI Gym BipedalWalker-v3 environment.
ARS resulted in a better-trained robot and produced an optimal policy which officially "solves" the BipedalWalker-v3 problem.
arXiv Detail & Related papers (2021-12-13T21:35:45Z) - AWD3: Dynamic Reduction of the Estimation Bias [0.0]
We introduce a technique that eliminates the estimation bias in off-policy continuous control algorithms using the experience replay mechanism.
We show through continuous control environments of OpenAI gym that our algorithm matches or outperforms the state-of-the-art off-policy policy gradient learning algorithms.
arXiv Detail & Related papers (2021-11-12T15:46:19Z) - Continuous-Time Fitted Value Iteration for Robust Policies [93.25997466553929]
Solving the Hamilton-Jacobi-Bellman equation is important in many domains including control, robotics and economics.
We propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI).
These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems.
arXiv Detail & Related papers (2021-10-05T11:33:37Z) - Robust Predictable Control [149.71263296079388]
We show that our method achieves much tighter compression than prior methods, achieving up to 5x higher reward than a standard information bottleneck.
We also demonstrate that our method learns policies that are more robust and generalize better to new tasks.
arXiv Detail & Related papers (2021-09-07T17:29:34Z) - Prioritized SIPP for Multi-Agent Path Finding With Kinematic Constraints [0.0]
Multi-Agent Path Finding (MAPF) is a long-standing problem in Robotics and Artificial Intelligence.
We present a method that mitigates this issue to a certain extent.
arXiv Detail & Related papers (2021-08-11T10:42:11Z) - ACERAC: Efficient reinforcement learning in fine time discretization [0.0]
We propose a framework for reinforcement learning (RL) in fine time discretization and a learning algorithm in this framework.
The efficiency of this algorithm is verified against three other RL methods in diverse time discretization.
arXiv Detail & Related papers (2021-04-08T18:40:20Z) - DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning [95.60782037764928]
First, we show that simple Deterministic Policy Gradient works remarkably well as long as the overestimation bias is controlled.
Second, we pinpoint training instabilities, typical of off-policy algorithms, to the greedy policy update step.
Third, we show that ideas in the propensity estimation literature can be used to importance-sample transitions from replay buffer and update policy to prevent deterioration of performance.
arXiv Detail & Related papers (2020-06-26T20:21:12Z)
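As a rough illustration of the replay reweighting mentioned in the DDPG++ summary above, the sketch below (a simplification under assumed names, not that paper's actual procedure) draws transitions from a replay buffer in proportion to a propensity score and corrects the induced bias with importance weights.

import numpy as np

class ReweightedReplayBuffer:
    """Toy replay buffer that importance-samples stored transitions.

    Each transition carries a positive propensity p_i; transitions are drawn
    with probability proportional to p_i, and the resulting bias is corrected
    with importance weights proportional to 1 / p_i.
    """

    def __init__(self, seed=0):
        self.transitions = []   # list of (state, action, reward, next_state, done)
        self.propensities = []  # one positive score per stored transition
        self.rng = np.random.default_rng(seed)

    def add(self, transition, propensity=1.0):
        self.transitions.append(transition)
        self.propensities.append(float(propensity))

    def sample(self, batch_size):
        p = np.asarray(self.propensities)
        probs = p / p.sum()
        idx = self.rng.choice(len(self.transitions), size=batch_size, p=probs)
        # Importance weights undo the non-uniform sampling in expectation.
        weights = 1.0 / (len(self.transitions) * probs[idx])
        weights = weights / weights.max()  # normalize for a stable update scale
        batch = [self.transitions[i] for i in idx]
        return batch, weights

The returned weights would multiply the per-transition loss in the critic or policy update; how the propensities are defined, e.g. from the behavior policy's action probabilities, is where the propensity-estimation ideas referenced in the summary would enter.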
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.