Open the Black Box: Step-based Policy Updates for Temporally-Correlated
Episodic Reinforcement Learning
- URL: http://arxiv.org/abs/2401.11437v1
- Date: Sun, 21 Jan 2024 09:24:24 GMT
- Title: Open the Black Box: Step-based Policy Updates for Temporally-Correlated
Episodic Reinforcement Learning
- Authors: Ge Li, Hongyi Zhou, Dominik Roth, Serge Thilges, Fabian Otto, Rudolf
Lioutikov, Gerhard Neumann
- Abstract summary: We introduce a novel ERL algorithm, Temporally-Correlated Episodic RL (TCE), which effectively utilizes step information in episodic policy updates.
TCE achieves comparable performance to recent ERL methods while maintaining data efficiency akin to state-of-the-art (SoTA) step-based RL.
- Score: 26.344135827307113
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Current advancements in reinforcement learning (RL) have predominantly
focused on learning step-based policies that generate actions for each
perceived state. While these methods efficiently leverage step information from
environmental interaction, they often ignore the temporal correlation between
actions, resulting in inefficient exploration and non-smooth trajectories that
are difficult to execute on real hardware. Episodic RL (ERL) seeks to
overcome these challenges by exploring in parameter space, which captures the
correlation between actions. However, these approaches typically compromise data
efficiency, as they treat trajectories as opaque \emph{black boxes}. In this
work, we introduce a novel ERL algorithm, Temporally-Correlated Episodic RL
(TCE), which effectively utilizes step information in episodic policy updates,
opening the 'black box' in existing ERL methods while retaining the smooth and
consistent exploration in parameter space. TCE synergistically combines the
advantages of step-based and episodic RL, achieving comparable performance to
recent ERL methods while maintaining data efficiency akin to state-of-the-art
(SoTA) step-based RL.
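The contrast the abstract draws, and the 'black box' issue TCE targets, can be made concrete with a small sketch. The following is a minimal illustration only, assuming a normalized radial-basis trajectory parameterization, an arbitrary segment count, and placeholder advantages; it is not the authors' implementation, and all names are illustrative.

```python
# Minimal sketch (not the paper's implementation): step-based exploration
# perturbs every action independently, while episodic exploration samples
# trajectory parameters once per episode, giving temporally correlated,
# smooth perturbations. The radial-basis parameterization, the number of
# segments, and the placeholder advantages are all illustrative assumptions.
import numpy as np

T, D, K = 100, 1, 10                 # time steps, action dims, basis functions
rng = np.random.default_rng(0)

# Step-based exploration: independent Gaussian noise at every step.
step_actions = rng.normal(loc=0.0, scale=0.3, size=(T, D))

# Episodic exploration: sample basis weights once, decode a full trajectory.
t = np.linspace(0.0, 1.0, T)
centers = np.linspace(0.0, 1.0, K)
phi = np.exp(-0.5 * ((t[:, None] - centers[None, :]) / 0.1) ** 2)
phi /= phi.sum(axis=1, keepdims=True)            # (T, K) normalized basis
w = rng.normal(size=(K, D))                      # episode-level parameters
episodic_actions = phi @ w                       # smooth (T, D) trajectory

# TCE-style idea, schematically: rather than scoring the episode with one
# opaque return, split it into segments and let per-segment advantage
# estimates (placeholders here) weight the update of the parameter policy.
segments = np.array_split(np.arange(T), 5)
segment_advantages = [float(rng.normal()) for _ in segments]
```

A single draw of the weight vector perturbs the whole trajectory coherently, while the segment-wise advantages stand in for the per-step information that an approach like TCE feeds back into the parameter-space update instead of a single episode return.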
Related papers
- TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning [27.93845816476777]
This work introduces Transformer-based Off-Policy Episodic Reinforcement Learning (TOP-ERL).
TOP-ERL is a novel algorithm that enables off-policy updates in the ERL framework.
arXiv Detail & Related papers (2024-10-12T13:55:26Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- CCE: Sample Efficient Sparse Reward Policy Learning for Robotic Navigation via Confidence-Controlled Exploration [72.24964965882783]
Confidence-Controlled Exploration (CCE) is designed to enhance the training sample efficiency of reinforcement learning algorithms for sparse reward settings such as robot navigation.
CCE is based on a novel relationship we provide between gradient estimation and policy entropy.
We demonstrate through simulated and real-world experiments that CCE outperforms conventional methods that employ constant trajectory lengths and entropy regularization.
arXiv Detail & Related papers (2023-06-09T18:45:15Z)
- Bridging the Gap Between Offline and Online Reinforcement Learning Evaluation Methodologies [6.303272140868826]
Reinforcement learning (RL) has shown great promise with algorithms learning in environments with large state and action spaces.
Current deep RL algorithms require a tremendous number of environment interactions for learning.
Offline RL algorithms try to address this issue by bootstrapping the learning process from existing logged data.
arXiv Detail & Related papers (2022-12-15T20:36:10Z)
- Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
Offline reinforcement learning (RL) is challenged by the distributional shift between the learning policy and the dataset.
We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time.
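A minimal sketch of this return-based resampling idea is given below; the dataset layout and every name in it are assumptions for illustration, not the paper's code.

```python
# Hedged sketch of return-based data rebalancing for offline RL:
# resample transitions with probability proportional to their episode
# return, which reweights the data without changing its support.
# Dataset layout and names below are assumptions, not the paper's code.
import numpy as np

def rebalance_indices(episode_returns, episode_of_transition, rng, size):
    """Sample transition indices, weighting each by its episode's return."""
    returns = np.asarray(episode_returns, dtype=np.float64)
    # Shift so weights stay positive, then normalize over transitions.
    weights = returns - returns.min() + 1e-6
    per_transition = weights[episode_of_transition]
    probs = per_transition / per_transition.sum()
    return rng.choice(len(episode_of_transition), size=size, p=probs)

rng = np.random.default_rng(0)
episode_returns = [3.0, -1.0, 10.0]               # toy returns for 3 episodes
episode_of_transition = np.repeat([0, 1, 2], 5)   # 5 transitions per episode
batch_idx = rebalance_indices(episode_returns, episode_of_transition, rng, 8)
```

Because every transition keeps a non-zero sampling probability, the support of the data distribution is unchanged; only the sampling frequencies shift toward high-return episodes.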
arXiv Detail & Related papers (2022-10-17T16:34:01Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization [10.243908145832394]
We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks.
This problem is still not fully understood, and two major challenges need to be addressed.
We provide analysis and insight showing that some simple design choices can yield substantial improvements over recent approaches.
arXiv Detail & Related papers (2020-10-02T17:13:39Z)
- Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
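The variance-control idea in the last entry can be illustrated with a generic expected-baseline estimator for a softmax policy over discrete actions. This is a hedged sketch under assumed shapes and names, not the exact estimator proposed in that paper.

```python
# Hedged sketch of critic-aided variance reduction for discrete actions:
# subtract the policy-weighted mean of critic values (an expected baseline)
# from the sampled action's value before forming the score-function gradient.
# Shapes and names are illustrative, not taken from the paper.
import numpy as np

def policy_gradient_term(logits, q_values, action):
    """One-sample gradient estimate w.r.t. the logits of a softmax policy."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    baseline = probs @ q_values              # E_{a~pi}[Q(s, a)]
    advantage = q_values[action] - baseline  # centered critic estimate
    grad_logp = -probs                       # d log pi(a|s) / d logits ...
    grad_logp[action] += 1.0                 # ... for a softmax policy
    return advantage * grad_logp

logits = np.zeros(4)                         # uniform initial policy, 4 actions
q_vals = np.array([0.1, 0.5, 0.2, 0.9])      # critic estimates for one state
grad = policy_gradient_term(logits, q_vals, action=3)
```

Subtracting a baseline that does not depend on the sampled action leaves the score-function gradient unbiased while typically reducing its variance.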