Actor-Critic with variable time discretization via sustained actions
- URL: http://arxiv.org/abs/2308.04299v1
- Date: Tue, 8 Aug 2023 14:45:00 GMT
- Title: Actor-Critic with variable time discretization via sustained actions
- Authors: Jakub Łyskawa, Paweł Wawrzyński
- Abstract summary: SusACER is an off-policy reinforcement learning algorithm that combines the advantages of different time discretization settings.
We analyze the effects of the changing time discretization in robotic control environments: Ant, HalfCheetah, Hopper, and Walker2D.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) methods work in discrete time. In order to apply
RL to inherently continuous problems like robotic control, a specific time
discretization needs to be defined. This is a choice between sparse time
control, which may be easier to train, and finer time control, which may allow
for better ultimate performance. In this work, we propose SusACER, an
off-policy RL algorithm that combines the advantages of different time
discretization settings. Initially, it operates with sparse time discretization
and gradually switches to a fine one. We analyze the effects of the changing
time discretization in robotic control environments: Ant, HalfCheetah, Hopper,
and Walker2D. In all cases, our proposed algorithm outperforms the state of the art.
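The idea of starting with sparse time discretization and gradually moving to a fine one can be illustrated with a minimal sketch. This is not the authors' SusACER implementation: the linear annealing schedule, the geometric sustain rule, and the names `expected_sustain`, `rollout_with_sustain`, `policy`, and `env_step` are all assumptions introduced here for illustration only.

```python
import random


def expected_sustain(step, total_steps, start=8.0, end=1.0):
    """Hypothetical schedule: linearly anneal the expected number of
    environment steps an action is sustained, from `start` down to `end`."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


def rollout_with_sustain(policy, env_step, state, n_steps,
                         total_steps, start_step=0, rng=None):
    """Run `n_steps` environment steps, sustaining the previous action at random.

    Assumption: at each step the previous action is kept with probability
    1 - 1/E, where E is the current expected sustain length, so sustain
    durations are geometrically distributed with mean E. When E reaches 1
    a fresh action is drawn at every step (fine time discretization).
    """
    rng = rng or random.Random(0)
    actions = []
    action = None
    for t in range(n_steps):
        mean_len = expected_sustain(start_step + t, total_steps)
        p_continue = 1.0 - 1.0 / mean_len
        # Draw a new action on the first step or when the sustain ends.
        if action is None or rng.random() >= p_continue:
            action = policy(state)
        state = env_step(state, action)
        actions.append(action)
    return actions, state
```

Early in training (`start_step` near 0) the same action tends to be repeated for several steps, which mimics sparse time control. Once `start_step` reaches `total_steps`, the sketch queries `policy` at every step.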
Related papers
- Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning [62.3886343725955]
We introduce a novel RL algorithm that learns a critic network that outputs Q-values over a sequence of actions.
By explicitly training the value functions to learn the consequence of executing a series of current and future actions, our algorithm allows for learning useful value functions from noisy trajectories.
arXiv Detail & Related papers (2024-11-19T01:23:52Z)
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- When to Sense and Control? A Time-adaptive Approach for Continuous-Time RL [37.58940726230092]
Reinforcement learning (RL) excels in optimizing policies for discrete-time Markov decision processes (MDPs).
We formalize an RL framework, Time-adaptive Control & Sensing (TaCoS), that tackles this challenge.
We demonstrate that state-of-the-art RL algorithms trained on TaCoS drastically reduce the number of interactions compared with their discrete-time counterparts.
arXiv Detail & Related papers (2024-06-03T09:57:18Z)
- Reinforcement Learning with Elastic Time Steps [14.838483990647697]
Multi-Objective Soft Elastic Actor-Critic (MOSEAC) is an off-policy actor-critic algorithm that uses elastic time steps to dynamically adjust the control frequency.
We show that MOSEAC converges and produces stable policies at the theoretical level, and validate our findings in a real-time 3D racing game.
arXiv Detail & Related papers (2024-02-22T20:49:04Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Reaching the Limit in Autonomous Racing: Optimal Control versus Reinforcement Learning [66.10854214036605]
A central question in robotics is how to design a control system for an agile mobile robot.
We show that a neural network controller trained with reinforcement learning (RL) outperformed optimal control (OC) methods in this setting.
Our findings allowed us to push an agile drone to its maximum performance, achieving a peak acceleration greater than 12 times the gravitational acceleration and a peak velocity of 108 kilometers per hour.
arXiv Detail & Related papers (2023-10-17T02:40:27Z)
- Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations [98.5802673062712]
We introduce temporally-coupled perturbations, presenting a novel challenge for existing robust reinforcement learning methods.
We propose GRAD, a novel game-theoretic approach that treats the temporally-coupled robust RL problem as a partially observable two-player zero-sum game.
arXiv Detail & Related papers (2023-07-22T12:10:04Z)
- Dynamic Decision Frequency with Continuous Options [11.83290684845269]
In classic reinforcement learning algorithms, agents make decisions at discrete and fixed time intervals.
We propose a framework called Continuous-Time Continuous-Options (CTCO) where the agent chooses options as sub-policies of variable durations.
We show that our algorithm's performance is not affected by the choice of environment interaction frequency.
arXiv Detail & Related papers (2022-12-06T19:51:12Z)
- ACERAC: Efficient reinforcement learning in fine time discretization [0.0]
We propose a framework for reinforcement learning (RL) in fine time discretization and a learning algorithm in this framework.
The efficiency of this algorithm is verified against three other RL methods in diverse time discretization.
arXiv Detail & Related papers (2021-04-08T18:40:20Z)
- Online Reinforcement Learning Control by Direct Heuristic Dynamic Programming: from Time-Driven to Event-Driven [80.94390916562179]
Time-driven learning refers to the machine learning method that updates parameters in a prediction model continuously as new data arrives.
It is desirable to prevent the time-driven dHDP from updating due to insignificant system events such as noise.
We show how the event-driven dHDP algorithm works in comparison to the original time-driven dHDP.
arXiv Detail & Related papers (2020-06-16T05:51:25Z)
- Time Adaptive Reinforcement Learning [2.0305676256390934]
Reinforcement learning (RL) makes it possible to solve complex tasks such as Go, often with stronger performance than humans.
Here we consider the case of adapting RL agents to different time restrictions, such as finishing a task with a given time limit that might change from one task execution to the next.
We introduce two model-free, value-based algorithms: the Independent Gamma-Ensemble and the n-Step Ensemble.
arXiv Detail & Related papers (2020-04-18T11:52:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.