Dynamic Decision Frequency with Continuous Options
- URL: http://arxiv.org/abs/2212.04407v4
- Date: Wed, 25 Oct 2023 06:57:08 GMT
- Title: Dynamic Decision Frequency with Continuous Options
- Authors: Amirmohammad Karimi, Jun Jin, Jun Luo, A. Rupam Mahmood, Martin
Jagersand and Samuele Tosatto
- Abstract summary: In classic reinforcement learning algorithms, agents make decisions at discrete and fixed time intervals.
We propose a framework called Continuous-Time Continuous-Options (CTCO) where the agent chooses options as sub-policies of variable durations.
We show that our algorithm's performance is not affected by the choice of environment interaction frequency.
- Score: 11.83290684845269
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In classic reinforcement learning algorithms, agents make decisions at
discrete and fixed time intervals. The duration between decisions becomes a
crucial hyperparameter, as setting it too short may increase the problem's
difficulty by requiring the agent to make numerous decisions to achieve its
goal, while setting it too long can result in the agent losing control over the
system. However, physical systems do not necessarily require a constant control
frequency, and for learning agents, it is often preferable to operate with a
low frequency when possible and a high frequency when necessary. We propose a
framework called Continuous-Time Continuous-Options (CTCO), where the agent
chooses options as sub-policies of variable durations. These options are
time-continuous and can interact with the system at any desired frequency,
providing a smooth change of actions. We demonstrate the effectiveness of CTCO
by comparing its performance to classical RL and temporal-abstraction RL
methods on simulated continuous control tasks with various action-cycle times.
We show that our algorithm's performance is not affected by the choice of
environment interaction frequency. Furthermore, we demonstrate the efficacy of
CTCO in facilitating exploration in a real-world visual reaching task for a 7
DOF robotic arm with sparse rewards.
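The interaction pattern described in the abstract — a high-level policy choosing an option and its duration, with a time-continuous sub-policy queried at whatever frequency the system requires — can be sketched as below. Everything here is an illustrative assumption: the option representation (linearly interpolated action targets), the random high-level policy, and all function names are stand-ins, not the paper's learned components.

```python
import numpy as np

rng = np.random.default_rng(0)

def choose_option(obs, dim_action=1):
    """High-level policy: returns an option (here, start/end action
    targets) and a duration in seconds. A hypothetical stand-in for a
    learned policy; CTCO itself trains these with an actor-critic."""
    start, end = rng.uniform(-1, 1, (2, dim_action))
    duration = rng.uniform(0.1, 1.0)  # variable option duration
    return (start, end), duration

def option_action(option, t, duration):
    """Time-continuous sub-policy: smoothly interpolates the option's
    action targets, so it can be queried at any control frequency."""
    start, end = option
    alpha = np.clip(t / duration, 0.0, 1.0)
    return (1 - alpha) * start + alpha * end

def run(env_step, obs, sim_time=2.0, dt=0.02):
    """Interact with the system at period dt, while new decisions are
    made only when the current option expires."""
    t_global, decisions = 0.0, 0
    while t_global < sim_time:
        option, duration = choose_option(obs)
        decisions += 1
        t = 0.0
        while t < duration and t_global < sim_time:
            a = option_action(option, t, duration)
            obs = env_step(a)
            t += dt
            t_global += dt
    return decisions
```

Note how the number of decisions is decoupled from the number of environment interactions: shrinking `dt` adds control steps without adding decisions, which is the sense in which performance can be insensitive to the interaction frequency.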
Related papers
- MOSEAC: Streamlined Variable Time Step Reinforcement Learning [14.838483990647697]
We introduce the Multi-Objective Soft Elastic Actor-Critic (MOSEAC) method.
MOSEAC features an adaptive reward scheme based on observed trends in task rewards during training.
We validate the MOSEAC method through simulations in a Newtonian kinematics environment.
arXiv Detail & Related papers (2024-06-03T16:51:57Z)
- When to Sense and Control? A Time-adaptive Approach for Continuous-Time RL [37.58940726230092]
Reinforcement learning (RL) excels at optimizing policies for discrete-time Markov decision processes (MDPs).
We formalize an RL framework, Time-adaptive Control & Sensing (TaCoS), that tackles this challenge.
We demonstrate that state-of-the-art RL algorithms trained on TaCoS drastically reduce the interaction amount over their discrete-time counterpart.
arXiv Detail & Related papers (2024-06-03T09:57:18Z)
- Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution [51.83951489847344]
In robotics applications, smooth control signals are commonly preferred to reduce system wear and improve energy efficiency.
In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution.
Our work indicates that adaptive control resolution combined with value decomposition yields simple critic-only algorithms with surprisingly strong performance on continuous control tasks.
arXiv Detail & Related papers (2024-04-05T17:58:37Z)
- Reinforcement Learning with Elastic Time Steps [14.838483990647697]
Multi-Objective Soft Elastic Actor-Critic (MOSEAC) is an off-policy actor-critic algorithm that uses elastic time steps to dynamically adjust the control frequency.
We show that MOSEAC converges and produces stable policies at the theoretical level, and validate our findings in a real-time 3D racing game.
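A minimal way to picture the elastic time-step trade-off is a reward that charges a fixed cost per decision, so the agent prefers longer time steps when the task allows. The functional form, weights, and names below are illustrative assumptions, not MOSEAC's actual reward.

```python
def elastic_reward(task_reward, action_energy,
                   alpha_task=1.0, alpha_step=0.2, alpha_energy=0.05):
    """Illustrative multi-objective reward for variable time steps:
    every decision pays a fixed cost alpha_step, nudging the agent
    toward longer time steps, while alpha_energy discourages
    aggressive actions."""
    return alpha_task * task_reward - alpha_step - alpha_energy * action_energy
```

Under this sketch, deciding more often than necessary strictly lowers return, which is the pressure that drives the control frequency down.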
arXiv Detail & Related papers (2024-02-22T20:49:04Z)
- Deployable Reinforcement Learning with Variable Control Rate [14.838483990647697]
We propose a variant of Reinforcement Learning (RL) with variable control rate.
In this approach, the policy decides the action the agent should take as well as the duration of the time step associated with that action.
We show the efficacy of SEAC through a proof-of-concept simulation driving an agent with Newtonian kinematics.
arXiv Detail & Related papers (2024-01-17T15:40:11Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
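As a rough illustration of data-adaptive action quantization, the sketch below derives per-dimension bin edges from quantiles of the offline dataset and maps actions to and from discrete indices. Quantile binning is an assumed stand-in for the paper's learned quantization scheme, chosen only because it adapts bins to where the dataset's actions actually lie.

```python
import numpy as np

def fit_quantizer(dataset_actions, n_bins=8):
    """Per-dimension quantile bin edges from the offline dataset,
    with bin centers taken as the median action inside each bin
    (an illustrative data-adaptive scheme, not the paper's method)."""
    qs = np.linspace(0.0, 1.0, n_bins + 1)
    edges = np.quantile(dataset_actions, qs[1:-1], axis=0)  # (n_bins-1, dim)
    dim = dataset_actions.shape[1]
    centers = np.empty((n_bins, dim))
    for d in range(dim):
        idx = np.searchsorted(edges[:, d], dataset_actions[:, d])
        for b in range(n_bins):
            sel = dataset_actions[idx == b, d]
            centers[b, d] = np.median(sel) if sel.size else 0.0
    return edges, centers

def quantize(a, edges):
    """Continuous action -> per-dimension bin indices."""
    return np.array([np.searchsorted(edges[:, d], a[d])
                     for d in range(len(a))])

def dequantize(idx, centers):
    """Bin indices -> representative continuous action."""
    return centers[idx, np.arange(len(idx))]
```

A discrete-action offline RL method (e.g. a CQL-style critic over the `n_bins` indices per dimension) would then operate on the quantized actions, with `dequantize` recovering executable controls.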
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Actor-Critic with variable time discretization via sustained actions [0.0]
SusACER is an off-policy reinforcement learning algorithm that combines the advantages of different time discretization settings.
We analyze the effects of the changing time discretization in robotic control environments: Ant, HalfCheetah, Hopper, and Walker2D.
arXiv Detail & Related papers (2023-08-08T14:45:00Z)
- Latent Exploration for Reinforcement Learning [87.42776741119653]
In Reinforcement Learning, agents learn policies by exploring and interacting with the environment.
We propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally-correlated noise into the latent state of the policy network.
arXiv Detail & Related papers (2023-05-31T17:40:43Z)
- Neural optimal feedback control with local learning rules [67.5926699124528]
A major problem in motor control is understanding how the brain plans and executes proper movements in the face of delayed and noisy stimuli.
We introduce a novel online algorithm which combines adaptive Kalman filtering with a model free control approach.
arXiv Detail & Related papers (2021-11-12T20:02:00Z)
- Deep Explicit Duration Switching Models for Time Series [84.33678003781908]
We propose a flexible model that is capable of identifying both state- and time-dependent switching dynamics.
State-dependent switching is enabled by a recurrent state-to-switch connection.
An explicit duration count variable is used to improve the time-dependent switching behavior.
arXiv Detail & Related papers (2021-10-26T17:35:21Z)
- Learn to cycle: Time-consistent feature discovery for action recognition [83.43682368129072]
Generalizing over temporal variations is a prerequisite for effective action recognition in videos.
We introduce Squeeze Re Temporal Gates (SRTG), an approach that favors temporal activations with potential variations.
We show consistent improvements when using SRTG blocks, with only a minimal increase in the number of GFLOPs.
arXiv Detail & Related papers (2020-06-15T09:36:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.