ACERAC: Efficient reinforcement learning in fine time discretization
- URL: http://arxiv.org/abs/2104.04004v1
- Date: Thu, 8 Apr 2021 18:40:20 GMT
- Title: ACERAC: Efficient reinforcement learning in fine time discretization
- Authors: Paweł Wawrzyński, Jakub Łyskawa
- Abstract summary: We propose a framework for reinforcement learning (RL) in fine time discretization and a learning algorithm in this framework.
The efficiency of this algorithm is verified against three other RL methods in diverse time discretization.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a framework for reinforcement learning (RL) in fine time
discretization and a learning algorithm in this framework. One of the main
goals of RL is to provide a way for physical machines to learn optimal behavior
instead of being programmed. However, the machines are usually controlled in
fine time discretization. The most common RL methods apply independent random
elements to each action, which is not suitable in that setting: it causes the
controlled system to jerk and does not ensure sufficient exploration, since a
single action is too short to produce experience significant enough to be
translated into policy improvement. In the
RL framework introduced in this paper, policies are considered that produce
actions based on states and random elements autocorrelated in subsequent time
instants. The RL algorithm introduced here approximately optimizes such a
policy. The efficiency of this algorithm is verified against three other RL
methods (PPO, SAC, ACER) in four simulated learning control problems (Ant,
HalfCheetah, Hopper, and Walker2D) in diverse time discretization. The
algorithm introduced here outperforms the competitors in most cases considered.
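As an illustration of the idea of actions driven by random elements autocorrelated in subsequent time instants, the following is a minimal, hypothetical Python sketch: a deterministic policy perturbed by AR(1) Gaussian noise whose value at one control step is correlated with its value at the previous step. The class name, parameters, and the AR(1) noise model are assumptions made for illustration; this is not the exact noise process or algorithm defined in the ACERAC paper.

```python
import numpy as np

class AutocorrelatedNoisePolicy:
    """Wraps a deterministic policy with AR(1)-autocorrelated Gaussian
    exploration noise, so perturbations in consecutive time instants are
    correlated rather than independent (hypothetical illustration)."""

    def __init__(self, policy_fn, action_dim, sigma=0.3, alpha=0.9, seed=0):
        self.policy_fn = policy_fn   # maps state -> mean action (array of shape (action_dim,))
        self.sigma = sigma           # stationary noise scale
        self.alpha = alpha           # autocorrelation coefficient in [0, 1)
        self.rng = np.random.default_rng(seed)
        self.noise = np.zeros(action_dim)

    def reset(self):
        # Start each episode from the stationary noise distribution.
        self.noise = self.sigma * self.rng.standard_normal(self.noise.shape)

    def act(self, state):
        # AR(1) update; the scaling keeps the stationary standard deviation
        # equal to sigma, so finer time discretization does not change the
        # overall noise magnitude.
        eps = self.rng.standard_normal(self.noise.shape)
        self.noise = self.alpha * self.noise + np.sqrt(1.0 - self.alpha ** 2) * self.sigma * eps
        return self.policy_fn(state) + self.noise
```

With alpha close to 1 the perturbation drifts slowly across consecutive fine time steps, so the control signal does not jerk and exploration extends over many steps; with alpha = 0 the scheme degenerates to the usual independent per-step noise.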
Related papers
- When to Sense and Control? A Time-adaptive Approach for Continuous-Time RL [37.58940726230092]
Reinforcement learning (RL) excels in optimizing policies for discrete-time Markov decision processes (MDP).
We formalize an RL framework, Time-adaptive Control & Sensing (TaCoS), that tackles this challenge.
We demonstrate that state-of-the-art RL algorithms trained on TaCoS drastically reduce the interaction amount over their discrete-time counterpart.
arXiv Detail & Related papers (2024-06-03T09:57:18Z) - Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - Actor-Critic with variable time discretization via sustained actions [0.0]
SusACER is an off-policy reinforcement learning algorithm that combines the advantages of different time discretization settings (a minimal sketch of the sustained-action idea appears after this list).
We analyze the effects of the changing time discretization in robotic control environments: Ant, HalfCheetah, Hopper, and Walker2D.
arXiv Detail & Related papers (2023-08-08T14:45:00Z) - Decoupled Prioritized Resampling for Offline RL [120.49021589395005]
We propose Offline Prioritized Experience Replay (OPER) for offline reinforcement learning.
OPER features a class of priority functions designed to prioritize highly-rewarding transitions, making them more frequently visited during training.
We show that this class of priority functions induces an improved behavior policy, and when constrained to this improved policy, a policy-constrained offline RL algorithm is likely to yield a better solution.
arXiv Detail & Related papers (2023-06-08T17:56:46Z) - CACTO: Continuous Actor-Critic with Trajectory Optimization -- Towards
global optimality [5.0915256711576475]
This paper presents a novel algorithm for the continuous control of dynamical systems that combines Trajectory Optimization (TO) and Reinforcement Learning (RL) in a single framework.
arXiv Detail & Related papers (2022-11-12T10:16:35Z) - LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement
Learning [78.2286146954051]
LCRL implements model-free Reinforcement Learning (RL) algorithms over unknown Markov Decision Processes (MDPs).
We present case studies to demonstrate the applicability, ease of use, scalability, and performance of LCRL.
arXiv Detail & Related papers (2022-09-21T13:21:00Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - Direct Random Search for Fine Tuning of Deep Reinforcement Learning
Policies [5.543220407902113]
We show that a direct random search is very effective at fine-tuning DRL policies by directly optimizing them using deterministic rollouts.
Our results show that this method yields more consistent and higher performing agents on the environments we tested.
arXiv Detail & Related papers (2021-09-12T20:12:46Z) - Robust Predictable Control [149.71263296079388]
We show that our method achieves much tighter compression than prior methods, achieving up to 5x higher reward than a standard information bottleneck.
We also demonstrate that our method learns policies that are more robust and generalize better to new tasks.
arXiv Detail & Related papers (2021-09-07T17:29:34Z) - A framework for reinforcement learning with autocorrelated actions [0.0]
Policies are considered here that produce actions based on states and random elements autocorrelated in subsequent time instants.
An algorithm is introduced here that approximately optimizes the aforementioned policy.
Its efficiency is verified for four simulated learning control problems.
arXiv Detail & Related papers (2020-09-10T11:23:09Z) - Discovering Reinforcement Learning Algorithms [53.72358280495428]
Reinforcement learning algorithms update an agent's parameters according to one of several possible rules.
This paper introduces a new meta-learning approach that discovers an entire update rule.
It includes both 'what to predict' (e.g. value functions) and 'how to learn from it' by interacting with a set of environments.
arXiv Detail & Related papers (2020-07-17T07:38:39Z)
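The SusACER entry above refers to sustained actions as a way of combining different time discretization settings. Below is a minimal, hypothetical environment-wrapper sketch of that general idea, assuming a classic Gym-style step() that returns (obs, reward, done, info); it does not reproduce SusACER's actual mechanism or its schedule for changing the sustain length.

```python
class SustainedActionWrapper:
    """Holds (sustains) each agent action for `sustain` consecutive fine
    time steps of the underlying environment, accumulating the reward.
    Hypothetical illustration; assumes a classic Gym-style env with
    reset() and step() returning (obs, reward, done, info)."""

    def __init__(self, env, sustain=4):
        self.env = env
        self.sustain = sustain  # number of fine steps one action is held

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward, done, info, obs = 0.0, False, {}, None
        for _ in range(self.sustain):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info
```

Holding each action for several fine time steps makes the effective discretization coarser; varying the sustain length over the course of training is one way to move between coarse and fine control settings.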