Related papers: CTD4 -- A Deep Continuous Distributional Actor-Critic Agent with a Kalman Fusion of Multiple Critics

URL: http://arxiv.org/abs/2405.02576v2
Date: Mon, 20 May 2024 04:26:46 GMT
Title: CTD4 -- A Deep Continuous Distributional Actor-Critic Agent with a Kalman Fusion of Multiple Critics
Authors: David Valencia, Henry Williams, Trevor Gee, Bruce A MacDonald, Minas Liarokapis
Abstract summary: Categorical Distributional Reinforcement Learning (CDRL) has demonstrated superior sample efficiency in learning complex tasks. This paper introduces a pioneering Continuous Distributional Model-Free RL algorithm tailored for continuous action spaces.
Score: 2.229467987498053
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Categorical Distributional Reinforcement Learning (CDRL) has demonstrated superior sample efficiency in learning complex tasks compared to conventional Reinforcement Learning (RL) approaches. However, the practical application of CDRL is encumbered by challenging projection steps, detailed parameter tuning, and domain knowledge. This paper addresses these challenges by introducing a pioneering Continuous Distributional Model-Free RL algorithm tailored for continuous action spaces. The proposed algorithm simplifies the implementation of distributional RL, adopting an actor-critic architecture wherein the critic outputs a continuous probability distribution. Additionally, we propose an ensemble of multiple critics fused through a Kalman fusion mechanism to mitigate overestimation bias. Through a series of experiments, we validate that our proposed method is easy to train and serves as a sample-efficient solution for executing complex continuous-control tasks.
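The abstract does not give the fusion equations, so the following is only a minimal sketch of what a Kalman-style fusion of several critics could look like. It assumes each critic outputs the mean and standard deviation of a Gaussian over the return, and it applies the standard scalar Kalman update sequentially; the function name `kalman_fuse` and the example numbers are hypothetical and not taken from the paper.

```python
import numpy as np


def kalman_fuse(means, stds):
    """Fuse Gaussian estimates N(mean_i, std_i^2) of the same quantity.

    Assumption: each critic reports an independent Gaussian estimate with a
    strictly positive standard deviation. This is the textbook scalar Kalman
    update, not necessarily the exact rule used by CTD4.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(stds, dtype=float) ** 2

    # Start from the first critic and fold in the others one at a time.
    fused_mean, fused_var = means[0], variances[0]
    for m, v in zip(means[1:], variances[1:]):
        gain = fused_var / (fused_var + v)                  # Kalman gain
        fused_mean = fused_mean + gain * (m - fused_mean)   # corrected mean
        fused_var = (1.0 - gain) * fused_var                # reduced variance
    return float(fused_mean), float(np.sqrt(fused_var))


# Hypothetical outputs of three critics for one state-action pair.
fused_mean, fused_std = kalman_fuse([1.0, 3.0, 2.0], [1.0, 1.0, 0.5])
print(fused_mean, fused_std)
```

Under these assumptions the result is equivalent to inverse-variance weighting: critics that report lower uncertainty pull the fused mean more strongly, and the fused variance is never larger than that of any single critic.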

Related papers

Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo [55.452453947359736]
We introduce a novel verification method based on Twisted Sequential Monte Carlo (TSMC). We apply TSMC to Large Language Models by estimating the expected future rewards at partial solutions. This approach results in a more straightforward training target that eliminates the need for step-wise human annotations.
arXiv Detail & Related papers (2024-10-02T18:17:54Z)
Continuous Control with Coarse-to-fine Reinforcement Learning [15.585706638252441]
We present a framework that trains RL agents to zoom into a continuous action space in a coarse-to-fine manner. We introduce a concrete, value-based algorithm within the framework called Coarse-to-fine Q-Network (CQN). CQN robustly learns to solve real-world manipulation tasks within a few minutes of online training.
arXiv Detail & Related papers (2024-07-10T16:04:08Z)
Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs [63.36637269634553]
We introduce a novel approach where LLMs are fine-tuned to generate a sequence of Diverse Chains of Thought (DCoT) within a single inference step. We show that fine-tuning on DCoT improves performance over the CoT baseline across model families and scales. Our work is also significant because both quantitative analyses and manual evaluations reveal the observed gains stem from the models' ability to refine an initial reasoning chain.
arXiv Detail & Related papers (2024-07-03T15:01:18Z)
Model-Free Robust Reinforcement Learning with Sample Complexity Analysis [16.477827600825428]
This paper proposes a model-free DR-RL algorithm leveraging the Multi-level Monte Carlo technique. We develop algorithms for uncertainty sets defined by total variation, Chi-square divergence, and KL divergence. Remarkably, our algorithms represent the first model-free DR-RL approach featuring finite sample complexity.
arXiv Detail & Related papers (2024-06-24T19:35:26Z)
Distributionally Robust Constrained Reinforcement Learning under Strong Duality [37.76993170360821]
We study the problem of Distributionally Robust Constrained RL (DRC-RL). The goal is to maximize the expected reward subject to environmental distribution shifts and constraints. We develop an algorithmic framework based on strong duality that enables the first efficient and provable solution.
arXiv Detail & Related papers (2024-06-22T08:51:57Z)
Exploiting Estimation Bias in Clipped Double Q-Learning for Continous Control Reinforcement Learning Tasks [5.968716050740402]
This paper focuses on addressing and exploiting estimation biases in actor-critic methods for continuous control tasks. We design a Bias Exploiting (BE) mechanism to dynamically select the most advantageous estimation bias during training of the RL agent. Most state-of-the-art deep RL algorithms can be equipped with the BE mechanism without hindering performance or computational complexity.
arXiv Detail & Related papers (2024-02-14T10:44:03Z)
Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling. We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning [163.44116192806922]
One-step methods perform regularization by doing just a single step of policy improvement. Critic regularization methods do many steps of policy improvement with a regularized objective. Applying a multi-step critic regularization method with a regularization coefficient of 1 yields the same policy as one-step RL.
arXiv Detail & Related papers (2023-07-24T17:46:32Z)
Efficient Deep Reinforcement Learning Requires Regulating Overfitting [91.88004732618381]
We show that high temporal-difference (TD) error on the validation set of transitions is the main culprit that severely affects the performance of deep RL algorithms. We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks.
arXiv Detail & Related papers (2023-04-20T17:11:05Z)
Single-Trajectory Distributionally Robust Reinforcement Learning [21.955807398493334]
We propose Distributionally Robust RL (DRRL) to enhance performance across a range of environments. Existing DRRL algorithms are either model-based or fail to learn from a single sample trajectory. We design a first fully model-free DRRL algorithm, called distributionally robust Q-learning with single trajectory (DRQ).
arXiv Detail & Related papers (2023-01-27T14:08:09Z)
Solving Continuous Control via Q-learning [54.05120662838286]
We show that a simple modification of deep Q-learning largely alleviates issues with actor-critic methods. By combining bang-bang action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches performance of state-of-the-art continuous actor-critic methods.
arXiv Detail & Related papers (2022-10-22T22:55:50Z)
Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios. We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
Exploration with Multi-Sample Target Values for Distributional Reinforcement Learning [20.680417111485305]
We introduce multi-sample target values (MTV) for distributional RL, as a principled replacement for single-sample target value estimation. The improved distributional estimates lend themselves to UCB-based exploration. We evaluate our approach on a range of continuous control tasks and demonstrate state-of-the-art model-free performance on difficult tasks such as Humanoid control.
arXiv Detail & Related papers (2022-02-06T03:27:05Z)
Parallelized Reverse Curriculum Generation [62.25453821794469]
In reinforcement learning, it is challenging for an agent to master a task that requires a specific series of actions because rewards are sparse. Reverse curriculum generation (RCG) provides a reverse expansion approach that automatically generates a curriculum for the agent to learn. We propose a parallelized approach that simultaneously trains multiple actor-critic (AC) pairs and periodically exchanges their critics.
arXiv Detail & Related papers (2021-08-04T15:58:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.