Computationally Efficient Reinforcement Learning: Targeted Exploration leveraging Simple Rules
- URL: http://arxiv.org/abs/2211.16691v3
- Date: Tue, 12 Sep 2023 09:39:42 GMT
- Title: Computationally Efficient Reinforcement Learning: Targeted Exploration leveraging Simple Rules
- Authors: Loris Di Natale, Bratislav Svetozarevic, Philipp Heer, and Colin N. Jones
- Abstract summary: We propose a simple yet effective modification of continuous actor-critic frameworks to incorporate such rules.
On a room temperature control case study, it allows agents to converge to well-performing policies up to 6-7x faster than classical agents.
- Score: 1.124958340749622
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model-free Reinforcement Learning (RL) generally suffers from poor sample
complexity, mostly due to the need to exhaustively explore the state-action
space to find well-performing policies. On the other hand, we postulate that
expert knowledge of the system often allows us to design simple rules we expect
good policies to follow at all times. In this work, we hence propose a simple
yet effective modification of continuous actor-critic frameworks to incorporate
such rules and avoid regions of the state-action space that are known to be
suboptimal, thereby significantly accelerating the convergence of RL agents.
Concretely, we saturate the actions chosen by the agent if they do not comply
with our intuition and, critically, modify the gradient update step of the
policy to ensure the learning process is not affected by the saturation step.
On a room temperature control case study, it allows agents to converge to
well-performing policies up to 6-7x faster than classical agents without
computational overhead and while retaining good final performance.
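For intuition, here is a minimal sketch of how the mechanism described above might look in a DDPG-style continuous actor-critic: the raw policy output is clipped against bounds derived from a simple expert rule, and the actor loss routes the gradient through the unclipped action so that the saturation does not distort the learning signal. The rule, the network sizes, and the straight-through-style gradient pass are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Sigmoid(),  # actions normalized to [0, 1]
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def rule_bounds(state):
    # Hypothetical expert rule for room temperature control: when the room is
    # already warm, heating power should stay low; when it is cold, it should
    # stay high. Returns per-sample (lower, upper) bounds on the action.
    temp = state[:, 0:1]  # assumption: the first state feature is the room temperature
    upper = torch.where(temp > 23.0, torch.full_like(temp, 0.2), torch.ones_like(temp))
    lower = torch.where(temp < 19.0, torch.full_like(temp, 0.8), torch.zeros_like(temp))
    return lower, upper

def saturate(raw_action, state):
    # Clip the raw policy output so it never enters the rule-violating region.
    lower, upper = rule_bounds(state)
    return torch.max(torch.min(raw_action, upper), lower)

def actor_loss(actor, critic, state):
    # The critic is evaluated at the saturated action (what the environment
    # would actually receive), while the gradient is routed through the raw
    # action so the clipping neither zeroes out nor biases the policy update
    # (one plausible reading of "modifying the gradient update step").
    raw = actor(state)
    sat = saturate(raw, state)
    action = raw + (sat - raw).detach()  # straight-through-style pass
    return -critic(state, action).mean()

if __name__ == "__main__":
    actor, critic = Actor(4, 1), Critic(4, 1)
    states = torch.rand(8, 4) * 10.0 + 16.0  # toy batch; feature 0 plays the role of temperature (16-26 C)
    loss = actor_loss(actor, critic, states)
    loss.backward()  # actor parameters still receive gradients despite the saturation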
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Overestimation, Overfitting, and Plasticity in Actor-Critic: the Bitter Lesson of Reinforcement Learning [1.0762853848552156]
We implement over 60 different off-policy agents, each integrating established regularization techniques from recent state-of-the-art algorithms.
We tested these agents across 14 diverse tasks from 2 simulation benchmarks, measuring training metrics related to overestimation, overfitting, and plasticity loss.
A simple Soft Actor-Critic agent, appropriately regularized, reliably finds a better-performing policy within the training regime.
arXiv Detail & Related papers (2024-03-01T13:25:10Z)
- Deployable Reinforcement Learning with Variable Control Rate [14.838483990647697]
We propose a variant of Reinforcement Learning (RL) with variable control rate.
In this approach, the policy decides the action the agent should take as well as the duration of the time step associated with that action.
We show the efficacy of the resulting algorithm, SEAC, through a proof-of-concept simulation driving an agent with Newtonian kinematics.
arXiv Detail & Related papers (2024-01-17T15:40:11Z)
- Time-Efficient Reinforcement Learning with Stochastic Stateful Policies [20.545058017790428]
We present a novel approach for training stateful policies by decomposing the latter into a stochastic internal state kernel and a stateless policy.
We introduce different versions of the stateful policy gradient theorem, enabling us to easily instantiate stateful variants of popular reinforcement learning algorithms.
arXiv Detail & Related papers (2023-11-07T15:48:07Z)
- Iteratively Refined Behavior Regularization for Offline Reinforcement Learning [57.10922880400715]
In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration.
By iteratively refining the reference policy used for behavior regularization, the conservative policy update guarantees gradual improvement.
Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks.
arXiv Detail & Related papers (2023-06-09T07:46:24Z)
- Inapplicable Actions Learning for Knowledge Transfer in Reinforcement Learning [3.194414753332705]
We show that learning inapplicable actions greatly improves the sample efficiency of RL algorithms.
Thanks to the transferability of the knowledge acquired, it can be reused in other tasks and domains to make the learning process more efficient; a generic action-masking sketch of this idea appears after this list.
arXiv Detail & Related papers (2022-11-28T17:45:39Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- Addressing Action Oscillations through Learning Policy Inertia [26.171039226334504]
Policy Inertia Controller (PIC) serves as a generic plug-in framework to off-the-shelf DRL algorithms.
We propose Nested Policy Iteration as a general training algorithm for PIC-augmented policy.
We derive a practical DRL algorithm, namely Nested Soft Actor-Critic.
arXiv Detail & Related papers (2021-03-03T09:59:43Z)
- DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning [95.60782037764928]
First, we show that the simple Deterministic Policy Gradient algorithm works remarkably well as long as the overestimation bias is controlled.
Second, we pinpoint training instabilities, typical of off-policy algorithms, to the greedy policy update step.
Third, we show that ideas in the propensity estimation literature can be used to importance-sample transitions from replay buffer and update policy to prevent deterioration of performance.
arXiv Detail & Related papers (2020-06-26T20:21:12Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
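As referenced in the entry on inapplicable actions above, the following is a rough, generic sketch of the underlying idea of excluding inapplicable actions from a discrete policy (not that paper's specific method): logits of actions known or predicted to be inapplicable in the current state are masked out, so exploration never wastes samples on them.

import torch

def masked_policy(logits: torch.Tensor, applicable: torch.Tensor) -> torch.Tensor:
    # logits: (batch, n_actions); applicable: boolean mask of the same shape,
    # True where an action is applicable in the current state. Returns a
    # probability distribution restricted to the applicable actions.
    masked = logits.masked_fill(~applicable, float("-inf"))
    return torch.softmax(masked, dim=-1)

# Toy usage: in a 4-action task, suppose actions 2 and 3 are inapplicable.
logits = torch.randn(1, 4)
applicable = torch.tensor([[True, True, False, False]])
probs = masked_policy(logits, applicable)  # probs[:, 2:] are exactly zero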