Parallelized Reverse Curriculum Generation
- URL: http://arxiv.org/abs/2108.02128v1
- Date: Wed, 4 Aug 2021 15:58:35 GMT
- Title: Parallelized Reverse Curriculum Generation
- Authors: Zih-Yun Chiu, Yi-Lin Tuan, Hung-yi Lee, Li-Chen Fu
- Abstract summary: For reinforcement learning, it is challenging for an agent to master a task that requires a specific series of actions due to sparse rewards.
Reverse curriculum generation (RCG) provides a reverse expansion approach that automatically generates a curriculum for the agent to learn.
We propose a parallelized approach that simultaneously trains multiple AC pairs and periodically exchanges their critics.
- Score: 62.25453821794469
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For reinforcement learning (RL), it is challenging for an agent to master a
task that requires a specific series of actions due to sparse rewards. To solve
this problem, reverse curriculum generation (RCG) provides a reverse expansion
approach that automatically generates a curriculum for the agent to learn. More
specifically, RCG gradually adapts the initial state distribution, expanding it
from the neighborhood of the goal toward increasingly distant states as training
proceeds. However, the initial state distribution generated at each iteration
might be biased, causing the policy to overfit or slowing down the reverse
expansion rate. When RCG is trained with actor-critic (AC) based RL algorithms,
this poor generalization and slow convergence may be induced by the tight
coupling within an AC pair.
Therefore, we propose a parallelized approach that simultaneously trains
multiple AC pairs and periodically exchanges their critics. We empirically
demonstrate that the proposed approach improves the performance and convergence
of RCG, and that it can also be applied to other AC based RL algorithms that
adapt the initial state distribution.
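A minimal Python sketch of the training loop described above, under stated assumptions: the ACPair class, the reverse_expand heuristic (toy 1-D states perturbed at random), and all hyperparameters are illustrative placeholders, not the authors' implementation. It only shows how several AC pairs can share one expanding start-state distribution while their critics are periodically shuffled, which is the exchange step the abstract proposes to loosen the coupling within each pair.

```python
import random

class ACPair:
    """Hypothetical actor-critic pair with a swappable critic (illustrative only)."""
    def __init__(self, make_actor, make_critic):
        self.actor = make_actor()
        self.critic = make_critic()

    def train_step(self, env, start_states):
        # Placeholder: a real implementation would roll out the actor from the
        # sampled start states and update both actor and critic here.
        pass

def reverse_expand(start_states, noise=0.1, max_states=64):
    """Toy reverse expansion over 1-D states: perturb the current start states
    to obtain slightly harder ones, and keep a bounded pool."""
    harder = [s + noise * (random.random() - 0.5) for s in start_states]
    pool = start_states + harder
    return random.sample(pool, min(len(pool), max_states))

def exchange_critics(pairs):
    """Periodically permute the critics among the AC pairs, so each actor is
    also evaluated by critics trained alongside other actors."""
    critics = [p.critic for p in pairs]
    random.shuffle(critics)
    for pair, critic in zip(pairs, critics):
        pair.critic = critic

def parallel_rcg(pairs, env, goal_state, iterations=100, exchange_every=10):
    start_states = [goal_state]               # the curriculum begins at the goal
    for it in range(iterations):
        start_states = reverse_expand(start_states)
        for pair in pairs:                    # could run in parallel processes
            pair.train_step(env, start_states)
        if (it + 1) % exchange_every == 0:
            exchange_critics(pairs)           # loosen the actor-critic coupling
```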
Related papers
- CTD4 -- A Deep Continuous Distributional Actor-Critic Agent with a Kalman Fusion of Multiple Critics [2.229467987498053]
Categorical Distributional Reinforcement Learning (CDRL) has demonstrated superior sample efficiency in learning complex tasks.
This paper introduces a pioneering Continuous Distributional Model-Free RL algorithm tailored for continuous action spaces.
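The "Kalman fusion" of multiple critics can be illustrated with the standard inverse-variance fusion of independent Gaussian estimates; the sketch below is a generic toy (the critic means and variances are made up), not the CTD4 algorithm itself.

```python
def kalman_fuse(means, variances):
    """Fuse independent Gaussian value estimates (one per critic) by
    inverse-variance weighting, i.e. the scalar Kalman update rule."""
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    fused_mean = sum(m * p for m, p in zip(means, precisions)) / total_precision
    fused_variance = 1.0 / total_precision
    return fused_mean, fused_variance

# Toy example: three critics disagree on Q(s, a); the less certain ones
# (larger variance) contribute less to the fused estimate.
q_means = [10.0, 12.0, 9.0]
q_vars = [1.0, 4.0, 2.0]
print(kalman_fuse(q_means, q_vars))  # fused mean is pulled toward the confident critic
```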
arXiv Detail & Related papers (2024-05-04T05:38:38Z)
- One-Step Distributional Reinforcement Learning [10.64435582017292]
We present the simpler one-step distributional reinforcement learning (OS-DistrRL) framework.
We show that our approach comes with a unified theory for both policy evaluation and control.
We propose two OS-DistrRL algorithms for which we provide an almost sure convergence analysis.
arXiv Detail & Related papers (2023-04-27T06:57:00Z)
- Offline Policy Optimization in RL with Variance Regularization [142.87345258222942]
We propose variance regularization for offline RL algorithms, using stationary distribution corrections.
We show that by using Fenchel duality, we can avoid double sampling issues for computing the gradient of the variance regularizer.
The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithms.
arXiv Detail & Related papers (2022-12-29T18:25:01Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Conjugated Discrete Distributions for Distributional Reinforcement Learning [0.0]
We show that one of the most successful methods may not yield an optimal policy if we have a non-deterministic process.
We argue that distributional reinforcement learning lends itself to remedy this situation completely.
arXiv Detail & Related papers (2021-12-14T14:14:49Z)
- Cross-Trajectory Representation Learning for Zero-Shot Generalization in RL [21.550201956884532]
A key difficulty in deep RL is to generalize policies learned on a few tasks over a high-dimensional observation space to similar tasks not seen during training.
Many promising approaches to this challenge consider RL as a process of training two functions simultaneously.
We propose Cross-Trajectory Representation Learning (CTRL), a method that runs within an RL agent and conditions its encoder to recognize behavioral similarity in observations.
arXiv Detail & Related papers (2021-06-04T00:43:10Z)
- Gradient Coding with Dynamic Clustering for Straggler-Tolerant Distributed Learning [55.052517095437]
Distributed gradient descent (GD) is widely employed to parallelize the learning task by distributing the dataset across multiple workers.
A significant performance bottleneck for the per-iteration completion time in distributed synchronous GD is straggling workers.
Coded distributed techniques have been introduced recently to mitigate stragglers and to speed up GD iterations by assigning redundant computations to workers.
We propose a novel dynamic GC scheme, which assigns redundant data to workers to acquire the flexibility to choose from among a set of possible codes depending on the past straggling behavior.
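A simplified replication-based sketch of the general gradient-coding idea, assuming a toy master/worker setup: each data partition is assigned to several workers so the master can recover the full gradient from the fastest responders. The dynamic, history-dependent code selection of the proposed scheme is not reproduced here.

```python
import numpy as np

def assign_partitions(num_partitions, num_workers, redundancy):
    """Cyclic replication: each worker holds `redundancy` consecutive partitions."""
    return {w: [(w + j) % num_partitions for j in range(redundancy)]
            for w in range(num_workers)}

def aggregate(worker_grads, assignment, responded, num_partitions):
    """Sum one gradient per partition using only workers that responded;
    returns None if some partition is covered by stragglers only."""
    per_partition = {}
    for w in responded:
        for p, g in zip(assignment[w], worker_grads[w]):
            per_partition.setdefault(p, g)
    if len(per_partition) < num_partitions:
        return None                      # too many stragglers this iteration
    return sum(per_partition.values())

# Toy run: 4 partitions, 4 workers, redundancy 2; worker 3 straggles.
parts = [np.array([1.0]), np.array([2.0]), np.array([3.0]), np.array([4.0])]
assignment = assign_partitions(4, 4, 2)
worker_grads = {w: [parts[p] for p in assignment[w]] for w in range(4)}
print(aggregate(worker_grads, assignment, responded=[0, 1, 2], num_partitions=4))
# -> [10.] : full gradient recovered without waiting for the straggler
```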
arXiv Detail & Related papers (2021-03-01T18:51:29Z)
- Phase Retrieval using Expectation Consistent Signal Recovery Algorithm based on Hypernetwork [73.94896986868146]
Phase retrieval (PR) is an important component in modern computational imaging systems.
Recent advances in deep learning have opened up a new possibility for robust and fast PR.
We develop a novel framework for deep unfolding to overcome the existing limitations.
arXiv Detail & Related papers (2021-01-12T08:36:23Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
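A toy NumPy sketch of those two ingredients under stated assumptions: the sigmoid-shaped weight, the constants, and the two-member ensemble of dummy Q-functions below are illustrative choices, not SUNRISE's exact formulation.

```python
import numpy as np

def ensemble_stats(q_ensemble, state, actions):
    """q_ensemble: list of Q-functions; returns per-action mean and std."""
    q = np.array([[qf(state, a) for a in actions] for qf in q_ensemble])
    return q.mean(axis=0), q.std(axis=0)

def ucb_action(q_ensemble, state, actions, beta=1.0):
    """Exploration: pick the action with the highest mean + beta * std."""
    mean, std = ensemble_stats(q_ensemble, state, actions)
    return actions[int(np.argmax(mean + beta * std))]

def backup_weight(std_next, temperature=10.0):
    """Down-weight Bellman targets where the ensemble is uncertain
    (a sigmoid-shaped weight in [0.5, 1.0], an illustrative choice)."""
    return 1.0 / (1.0 + np.exp(std_next * temperature)) + 0.5

# Toy ensemble of two Q-functions over two discrete actions.
q_ensemble = [lambda s, a: float(a), lambda s, a: 2.0 * a]
print(ucb_action(q_ensemble, state=None, actions=[0, 1]))  # prefers action 1
print(backup_weight(std_next=0.5))                         # target weight < 1
```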
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- Self-Paced Deep Reinforcement Learning [42.467323141301826]
Curriculum reinforcement learning (CRL) improves the learning speed and stability of an agent by exposing it to a tailored series of tasks throughout learning.
Despite empirical successes, an open question in CRL is how to automatically generate a curriculum for a given reinforcement learning (RL) agent, avoiding manual design.
We propose an answer by interpreting the curriculum generation as an inference problem, where distributions over tasks are progressively learned to approach the target task.
This approach yields automatic curriculum generation whose pace is controlled by the agent; it has solid theoretical motivation and integrates easily with deep RL algorithms.
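A heavily simplified illustration of the self-paced idea, assuming a 1-D Gaussian task distribution and a stubbed agent: the curriculum mean moves toward the target task only when the measured return clears a threshold. The paper derives this pacing from an inference objective; the update rule, threshold, and stubbed return below are placeholders.

```python
import random

def self_paced_step(mu, target_mu, avg_return, return_threshold=0.0, step=0.1):
    """Move the mean of the task distribution toward the target task only
    when the agent's current performance clears a threshold."""
    if avg_return >= return_threshold:
        mu = mu + step * (target_mu - mu)
    return mu

def sample_task(mu, sigma=0.2):
    """Sample a task/context from the current curriculum distribution."""
    return random.gauss(mu, sigma)

# Toy loop: the curriculum pace depends on the (stubbed) agent's return.
mu, target_mu = 0.0, 1.0
for _ in range(20):
    task = sample_task(mu)
    avg_return = 1.0 - abs(task - mu)     # stub: tasks near the current mean are easier
    mu = self_paced_step(mu, target_mu, avg_return, return_threshold=0.5)
print(round(mu, 3))                       # mu has drifted toward the target task
```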
arXiv Detail & Related papers (2020-04-24T15:48:07Z)