Self-Paced Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2004.11812v5
- Date: Fri, 23 Oct 2020 09:42:00 GMT
- Title: Self-Paced Deep Reinforcement Learning
- Authors: Pascal Klink, Carlo D'Eramo, Jan Peters, Joni Pajarinen
- Abstract summary: Curriculum reinforcement learning (CRL) improves the learning speed and stability of an agent by exposing it to a tailored series of tasks throughout learning.
Despite empirical successes, an open question in CRL is how to automatically generate a curriculum for a given reinforcement learning (RL) agent, avoiding manual design.
We propose an answer by interpreting curriculum generation as an inference problem, where distributions over tasks are progressively learned to approach the target task.
This approach leads to automatic curriculum generation whose pace is controlled by the agent, rests on a solid theoretical motivation, and is easily integrated with deep RL algorithms.
- Score: 42.467323141301826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Curriculum reinforcement learning (CRL) improves the learning speed and
stability of an agent by exposing it to a tailored series of tasks throughout
learning. Despite empirical successes, an open question in CRL is how to
automatically generate a curriculum for a given reinforcement learning (RL)
agent, avoiding manual design. In this paper, we propose an answer by
interpreting curriculum generation as an inference problem, where
distributions over tasks are progressively learned to approach the target task.
This approach leads to automatic curriculum generation whose pace is
controlled by the agent, rests on a solid theoretical motivation, and is
easily integrated with deep RL algorithms. In our experiments, the curricula
generated with the proposed algorithm significantly improve learning
performance across several environments and deep RL algorithms, matching or
outperforming existing state-of-the-art CRL algorithms.
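To make the inference view concrete, here is a minimal sketch of one such self-paced update, assuming a 1-D Gaussian task distribution ("context") and episode returns collected under the current distribution; the function names, the trust-region constant, and the target task are illustrative, not the authors' implementation:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Assumed 1-D task parameter, e.g. a goal distance, with a Gaussian
# target task distribution. All names here are illustrative.
TARGET_MEAN, TARGET_STD = 5.0, 0.1

def kl_gauss(m0, s0, m1, s1):
    """KL(N(m0, s0^2) || N(m1, s1^2)) for univariate Gaussians."""
    return np.log(s1 / s0) + (s0**2 + (m0 - m1)**2) / (2 * s1**2) - 0.5

def update_context_dist(m, s, contexts, returns, alpha, kl_step=0.25):
    """One self-paced update of the Gaussian task distribution N(m, s^2):
    maximize importance-weighted expected return minus alpha times the
    KL divergence to the target, within a KL trust region of the old
    distribution. Increasing alpha over training pulls the curriculum
    toward the target task, so the agent's performance sets the pace."""
    def neg_objective(x):
        m_new, s_new = x[0], np.exp(x[1])
        # Importance weights of old rollouts under the candidate dist.
        w = norm.pdf(contexts, m_new, s_new) / norm.pdf(contexts, m, s)
        value = np.mean(w * returns)
        return -(value - alpha * kl_gauss(m_new, s_new, TARGET_MEAN, TARGET_STD))

    trust_region = {"type": "ineq",
                    "fun": lambda x: kl_step - kl_gauss(x[0], np.exp(x[1]), m, s)}
    res = minimize(neg_objective, np.array([m, np.log(s)]),
                   method="SLSQP", constraints=[trust_region])
    return res.x[0], np.exp(res.x[1])
```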
Related papers
- Tracking Control for a Spherical Pendulum via Curriculum Reinforcement
Learning [27.73555826776087]
Reinforcement Learning (RL) allows learning non-trivial robot control laws purely from data.
In this paper, we pair a recent algorithm for automatically building curricula with RL on massively parallelized simulations.
We demonstrate the potential of curriculum RL to jointly learn state estimation and control for non-linear tracking tasks.
arXiv Detail & Related papers (2023-09-25T12:48:47Z) - On the Benefit of Optimal Transport for Curriculum Reinforcement Learning [32.59609255906321]
We focus on framing curricula as interpolations between task distributions.
We frame the generation of a curriculum as a constrained optimal transport problem.
Benchmarks show that this approach to curriculum generation can improve upon existing CRL methods.
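As a toy illustration of interpolating between task distributions (not the paper's constrained formulation), the snippet below walks along the Wasserstein-2 geodesic between two assumed univariate Gaussian task distributions, which has a closed form: mean and standard deviation are interpolated linearly.

```python
import numpy as np

def w2_interpolate(m0, s0, m1, s1, t):
    """Point at time t on the Wasserstein-2 geodesic between the
    univariate Gaussians N(m0, s0^2) and N(m1, s1^2); the geodesic
    stays Gaussian with linearly interpolated mean and std."""
    return (1 - t) * m0 + t * m1, (1 - t) * s0 + t * s1

# Hypothetical curriculum: move from an easy task distribution to the
# target in equal steps along the geodesic.
easy, target = (0.0, 1.0), (5.0, 0.1)
for k, t in enumerate(np.linspace(0.0, 1.0, 6)):
    m, s = w2_interpolate(*easy, *target, t)
    print(f"stage {k}: contexts ~ N({m:.2f}, {s:.2f}^2)")
```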
arXiv Detail & Related papers (2023-09-25T12:31:37Z) - Reward-Machine-Guided, Self-Paced Reinforcement Learning [30.42334205249944]
We develop a self-paced reinforcement learning algorithm guided by reward machines.
The proposed algorithm achieves optimal behavior reliably even in cases in which existing baselines cannot make any meaningful progress.
It also decreases the curriculum length and reduces the variance in the curriculum generation process by up to one-fourth and four orders of magnitude, respectively.
arXiv Detail & Related papers (2023-05-25T22:13:37Z) - MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion
Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms.
Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return.
We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
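For reference, a minimal sketch of the Soft Actor-Critic policy objective that MARLIN builds on (the `policy.sample` and twin-critic interfaces are assumed, not MARLIN's code): minimizing `alpha * log_pi - Q` is equivalent to maximizing expected return plus alpha-weighted policy entropy.

```python
import torch

def sac_actor_loss(policy, q1, q2, states, alpha):
    """SAC policy loss: minimizing alpha*log_pi - min(Q1, Q2) maximizes
    return plus alpha-weighted entropy. In a CC setting, states could
    encode link statistics and actions congestion-window adjustments."""
    actions, log_probs = policy.sample(states)  # reparameterized sample
    q = torch.min(q1(states, actions), q2(states, actions))
    return (alpha * log_probs - q).mean()
```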
arXiv Detail & Related papers (2023-02-02T18:27:20Z) - CLUTR: Curriculum Learning via Unsupervised Task Representation Learning [130.79246770546413]
CLUTR is a novel curriculum learning algorithm that decouples task representation and curriculum learning into a two-stage optimization.
We show CLUTR outperforms PAIRED, a principled and popular unsupervised environment design (UED) method, in terms of generalization and sample efficiency in the challenging CarRacing and navigation environments.
arXiv Detail & Related papers (2022-10-19T01:45:29Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta-algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
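As a sketch of the mechanism (assuming a classic gym-style environment API; this is not the authors' code), a Jump-Start rollout lets the guide policy act for the first h steps before handing control to the learner, and shrinking h over training moves the learner's effective start state backward toward the true initial state.

```python
def jsrl_episode(env, guide_policy, explore_policy, h):
    """One Jump-Start rollout: the guide acts for the first h steps,
    then the exploration (learner) policy takes over. A curriculum
    shrinks h, e.g. h = 40, 30, ..., 0, advancing whenever the
    learner's success rate at the current h passes a threshold."""
    obs, done, t, transitions = env.reset(), False, 0, []
    while not done:
        policy = guide_policy if t < h else explore_policy
        action = policy(obs)
        next_obs, reward, done, _ = env.step(action)  # classic gym API
        transitions.append((obs, action, reward, next_obs, done))
        obs, t = next_obs, t + 1
    return transitions
```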
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution than standard supervised training by allowing users to plug in arbitrary task metrics as rewards.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
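For orientation, a minimal sketch of the soft Q-learning quantities involved, where each partial sequence is a state and each vocabulary token an action (tensor shapes and names are assumptions, not the paper's code):

```python
import torch
import torch.nn.functional as F

def soft_q_target(q_next, rewards, gamma=0.99, tau=1.0):
    """Soft Bellman target: the soft value of the next state is
    tau * logsumexp(Q / tau) over the vocabulary dimension."""
    v_next = tau * torch.logsumexp(q_next / tau, dim=-1)  # [batch]
    return rewards + gamma * v_next

def sample_token(q_values, tau=1.0):
    """Policy induced by the Q-function: a softmax over the Q-values
    of all vocabulary tokens."""
    return torch.multinomial(F.softmax(q_values / tau, dim=-1), 1)
```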
arXiv Detail & Related papers (2021-06-14T18:48:40Z) - Combining Pessimism with Optimism for Robust and Efficient Model-Based
Deep Reinforcement Learning [56.17667147101263]
In real-world tasks, reinforcement learning agents encounter situations that are not present during training.
To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations.
We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
arXiv Detail & Related papers (2021-03-18T16:50:17Z) - A Probabilistic Interpretation of Self-Paced Learning with Applications
to Reinforcement Learning [30.69129405392038]
We present an approach for automated curriculum generation in reinforcement learning.
We formalize the well-known self-paced learning paradigm as inducing a distribution over training tasks.
Experiments show that training on this induced distribution helps to avoid poor local optima across RL algorithms.
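For context, the classic self-paced rule being formalized can be stated in a few lines (a sketch of the original hard-threshold selection, not the paper's distributional generalization):

```python
import numpy as np

def self_paced_weights(losses, threshold):
    """Minimizing sum_i v_i * l_i - threshold * sum_i v_i over binary
    weights v keeps exactly the examples with loss below the threshold;
    raising the threshold over training admits harder examples, so the
    learner sets its own pace."""
    return (losses < threshold).astype(float)

losses = np.array([0.2, 1.5, 0.7, 3.0])
for thr in (0.5, 1.0, 2.0, 4.0):
    print(f"threshold {thr}: include {self_paced_weights(losses, thr)}")
```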
arXiv Detail & Related papers (2021-02-25T21:06:56Z) - Deep Reinforcement Learning for Autonomous Driving: A Survey [0.3694429692322631]
This review summarises deep reinforcement learning (DRL) algorithms and provides a taxonomy of automated driving tasks.
It also delineates adjacent domains such as behavior cloning, imitation learning, and inverse reinforcement learning, which are related to but are not classical RL algorithms.
The review also discusses the role of simulators in training agents and methods to validate, test, and robustify existing RL solutions.
arXiv Detail & Related papers (2020-02-02T18:21:22Z)