Dense Dynamics-Aware Reward Synthesis: Integrating Prior Experience with Demonstrations
- URL: http://arxiv.org/abs/2412.01114v1
- Date: Mon, 02 Dec 2024 04:37:12 GMT
- Title: Dense Dynamics-Aware Reward Synthesis: Integrating Prior Experience with Demonstrations
- Authors: Cevahir Koprulu, Po-han Li, Tianyu Qiu, Ruihan Zhao, Tyler Westenbroek, David Fridovich-Keil, Sandeep Chinchali, Ufuk Topcu,
- Abstract summary: Continuous control problems can be formulated as sparse-reward reinforcement learning (RL) tasks.
Online RL methods can automatically explore the state space to solve each new task.
However, discovering sequences of actions that lead to a non-zero reward becomes exponentially more difficult as the task horizon increases.
We introduce a systematic reward-shaping framework that distills the information contained in 1) a task-agnostic prior data set and 2) a small number of task-specific expert demonstrations.
- Score: 24.041217922654738
- License:
- Abstract: Many continuous control problems can be formulated as sparse-reward reinforcement learning (RL) tasks. In principle, online RL methods can automatically explore the state space to solve each new task. However, discovering sequences of actions that lead to a non-zero reward becomes exponentially more difficult as the task horizon increases. Manually shaping rewards can accelerate learning for a fixed task, but it is an arduous process that must be repeated for each new environment. We introduce a systematic reward-shaping framework that distills the information contained in 1) a task-agnostic prior data set and 2) a small number of task-specific expert demonstrations, and then uses these priors to synthesize dense dynamics-aware rewards for the given task. This supervision substantially accelerates learning in our experiments, and we provide analysis demonstrating how the approach can effectively guide online learning agents to faraway goals.
Related papers
- Continual Deep Reinforcement Learning with Task-Agnostic Policy Distillation [0.0]
The Task-Agnostic Policy Distillation (TAPD) framework is introduced.
This paper addresses the problem of continual learning.
By utilizing task-agnostic distilled knowledge, the agent can solve downstream tasks more efficiently.
arXiv Detail & Related papers (2024-11-25T16:18:39Z) - Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal [54.93261535899478]
In real-world applications, such as robotic control of reinforcement learning, the tasks are changing, and new tasks arise in a sequential order.
This situation poses the new challenge of plasticity-stability trade-off for training an agent who can adapt to task changes and retain acquired knowledge.
We propose a rehearsal-based continual diffusion model, called Continual diffuser (CoD), to endow the diffuser with the capabilities of quick adaptation (plasticity) and lasting retention (stability)
arXiv Detail & Related papers (2024-09-04T08:21:47Z) - Reinforcement Learning via Auxiliary Task Distillation [24.87090247662755]
We present Reinforcement Learning via Auxiliary Task Distillation (AuxDistill)
AuxDistill enables reinforcement learning to perform long-horizon robot control problems by distilling behaviors from auxiliary tasks.
We demonstrate that AuxDistill can learn a pixels-to-actions policy for a challenging multi-stage embodied object rearrangement task without demonstrations, a learning curriculum, or pre-trained skills.
arXiv Detail & Related papers (2024-06-24T23:02:18Z) - Accelerating Exploration with Unlabeled Prior Data [66.43995032226466]
We study how prior data without reward labels may be used to guide and accelerate exploration for an agent solving a new sparse reward task.
We propose a simple approach that learns a reward model from online experience, labels the unlabeled prior data with optimistic rewards, and then uses it concurrently alongside the online data for downstream policy and critic optimization.
arXiv Detail & Related papers (2023-11-09T00:05:17Z) - Reinforcement learning with Demonstrations from Mismatched Task under
Sparse Reward [7.51772160511614]
Reinforcement learning often suffer from the sparse reward issue in real-world robotics problems.
Prior works often assume that the learning agent and the expert aim to accomplish the same task, which requires collecting new data for every new task.
In this paper, we consider the case where the target task is mismatched from but similar with that of the expert.
Existing LfD methods can not effectively guide learning in mismatched new tasks with sparse rewards.
arXiv Detail & Related papers (2022-12-03T02:24:59Z) - Skill-based Meta-Reinforcement Learning [65.31995608339962]
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
arXiv Detail & Related papers (2022-04-25T17:58:19Z) - Learning from Guided Play: A Scheduled Hierarchical Approach for
Improving Exploration in Adversarial Imitation Learning [7.51557557629519]
We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of, in addition to a main task, multiple auxiliary tasks.
This affords many benefits: learning efficiency is improved for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning through the reuse of learned auxiliary task models becomes possible.
arXiv Detail & Related papers (2021-12-16T14:58:08Z) - Multi-Task Learning with Sequence-Conditioned Transporter Networks [67.57293592529517]
We aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling.
We propose a new suite of benchmark aimed at compositional tasks, MultiRavens, which allows defining custom task combinations.
Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling.
arXiv Detail & Related papers (2021-09-15T21:19:11Z) - Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z) - Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent.
We present a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.