CTD4 -- A Deep Continuous Distributional Actor-Critic Agent with a Kalman Fusion of Multiple Critics
- URL: http://arxiv.org/abs/2405.02576v2
- Date: Mon, 20 May 2024 04:26:46 GMT
- Title: CTD4 -- A Deep Continuous Distributional Actor-Critic Agent with a Kalman Fusion of Multiple Critics
- Authors: David Valencia, Henry Williams, Trevor Gee, Bruce A MacDonald, Minas Liarokapis,
- Abstract summary: Categorical Distributional Reinforcement Learning (CDRL) has demonstrated superior sample efficiency in learning complex tasks.
This paper introduces a pioneering Continuous Distributional Model-Free RL algorithm tailored for continuous action spaces.
- Score: 2.229467987498053
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Categorical Distributional Reinforcement Learning (CDRL) has demonstrated superior sample efficiency in learning complex tasks compared to conventional Reinforcement Learning (RL) approaches. However, the practical application of CDRL is encumbered by challenging projection steps, detailed parameter tuning, and domain knowledge. This paper addresses these challenges by introducing a pioneering Continuous Distributional Model-Free RL algorithm tailored for continuous action spaces. The proposed algorithm simplifies the implementation of distributional RL, adopting an actor-critic architecture wherein the critic outputs a continuous probability distribution. Additionally, we propose an ensemble of multiple critics fused through a Kalman fusion mechanism to mitigate overestimation bias. Through a series of experiments, we validate that our proposed method is easy to train and serves as a sample-efficient solution for executing complex continuous-control tasks.
Related papers
- Continuous Control with Coarse-to-fine Reinforcement Learning [15.585706638252441]
We present a framework that trains RL agents to zoom-into a continuous action space in a coarse-to-fine manner.
We introduce a concrete, value-based algorithm within the framework called Coarse-to-fine Q-Network (CQN)
CQN robustly learns to solve real-world manipulation tasks within a few minutes of online training.
arXiv Detail & Related papers (2024-07-10T16:04:08Z) - Distributionally Robust Constrained Reinforcement Learning under Strong Duality [37.76993170360821]
We study the problem of Distributionally Robust Constrained RL (DRC-RL)
The goal is to maximize the expected reward subject to environmental distribution shifts and constraints.
We develop an algorithmic framework based on strong duality that enables the first efficient and provable solution.
arXiv Detail & Related papers (2024-06-22T08:51:57Z) - Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level
Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z) - A Connection between One-Step Regularization and Critic Regularization
in Reinforcement Learning [163.44116192806922]
One-step methods perform regularization by doing just a single step of policy improvement.
critic regularization methods do many steps of policy improvement with a regularized objective.
Applying a multi-step critic regularization method with a regularization coefficient of 1 iteration yields the same policy as one-step RL.
arXiv Detail & Related papers (2023-07-24T17:46:32Z) - One-Step Distributional Reinforcement Learning [10.64435582017292]
We present the simpler one-step distributional reinforcement learning (OS-DistrRL) framework.
We show that our approach comes with a unified theory for both policy evaluation and control.
We propose two OS-DistrRL algorithms for which we provide an almost sure convergence analysis.
arXiv Detail & Related papers (2023-04-27T06:57:00Z) - Efficient Deep Reinforcement Learning Requires Regulating Overfitting [91.88004732618381]
We show that high temporal-difference (TD) error on the validation set of transitions is the main culprit that severely affects the performance of deep RL algorithms.
We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks.
arXiv Detail & Related papers (2023-04-20T17:11:05Z) - Single-Trajectory Distributionally Robust Reinforcement Learning [13.013268095049236]
Reinforcement Learning (RL) has been regarded as an essential component leading to Artificial General Intelligence (AGI)
However, RL is often criticized for having the same training environment as the test one, which also hinders its application in the real world.
To mitigate this problem, Distributionally Robust RL (DRRL) is proposed to improve the worst performance in a set of environments that may contain the unknown test environment.
arXiv Detail & Related papers (2023-01-27T14:08:09Z) - Solving Continuous Control via Q-learning [54.05120662838286]
We show that a simple modification of deep Q-learning largely alleviates issues with actor-critic methods.
By combining bang-bang action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches performance of state-of-the-art continuous actor-critic methods.
arXiv Detail & Related papers (2022-10-22T22:55:50Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z) - Exploration with Multi-Sample Target Values for Distributional
Reinforcement Learning [20.680417111485305]
We introduce multi-sample target values (MTV) for distributional RL, as a principled replacement for single-sample target value estimation.
The improved distributional estimates lend themselves to UCB-based exploration.
We evaluate our approach on a range of continuous control tasks and demonstrate state-of-the-art model-free performance on difficult tasks such as Humanoid control.
arXiv Detail & Related papers (2022-02-06T03:27:05Z) - Parallelized Reverse Curriculum Generation [62.25453821794469]
For reinforcement learning, it is challenging for an agent to master a task that requires a specific series of actions due to sparse rewards.
reverse curriculum generation (RCG) provides a reverse expansion approach that automatically generates a curriculum for the agent to learn.
We propose a parallelized approach that simultaneously trains multiple AC pairs and periodically exchanges their critics.
arXiv Detail & Related papers (2021-08-04T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.