Reinforcement Learning via Auxiliary Task Distillation
- URL: http://arxiv.org/abs/2406.17168v1
- Date: Mon, 24 Jun 2024 23:02:18 GMT
- Title: Reinforcement Learning via Auxiliary Task Distillation
- Authors: Abhinav Narayan Harish, Larry Heck, Josiah P. Hanna, Zsolt Kira, Andrew Szot,
- Abstract summary: We present Reinforcement Learning via Auxiliary Task Distillation (AuxDistill).
AuxDistill enables reinforcement learning to solve long-horizon robot control problems by distilling behaviors from auxiliary tasks.
We demonstrate that AuxDistill can learn a pixels-to-actions policy for a challenging multi-stage embodied object rearrangement task without demonstrations, a learning curriculum, or pre-trained skills.
- Score: 24.87090247662755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Reinforcement Learning via Auxiliary Task Distillation (AuxDistill), a new method that enables reinforcement learning (RL) to perform long-horizon robot control problems by distilling behaviors from auxiliary RL tasks. AuxDistill achieves this by concurrently carrying out multi-task RL with auxiliary tasks, which are easier to learn and relevant to the main task. A weighted distillation loss transfers behaviors from these auxiliary tasks to solve the main task. We demonstrate that AuxDistill can learn a pixels-to-actions policy for a challenging multi-stage embodied object rearrangement task from the environment reward without demonstrations, a learning curriculum, or pre-trained skills. AuxDistill achieves $2.3 \times$ higher success than the previous state-of-the-art baseline in the Habitat Object Rearrangement benchmark and outperforms methods that use pre-trained skills and expert demonstrations.
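The abstract does not give the exact form of the weighted distillation loss. The sketch below is only a rough illustration under stated assumptions. It assumes discrete action distributions and treats each auxiliary-task policy as a teacher. It penalizes the main-task policy with a weighted sum of KL divergences, KL(teacher || student), which is a common choice in policy distillation. The paper may use a different divergence or weighting. The task names, the weights, the weight normalization, and `distill_coef` are hypothetical and are not taken from the paper.

```python
import math
from typing import Mapping, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two discrete distributions over the same action set."""
    # Terms with p_i == 0 contribute nothing; eps guards against log(0).
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def weighted_distillation_loss(
    main_probs: Sequence[float],
    aux_probs: Mapping[str, Sequence[float]],
    weights: Mapping[str, float],
) -> float:
    """Hypothetical weighted distillation loss: sum_k w_k * KL(pi_aux_k || pi_main).

    Assumptions (not from the paper): weights are normalized to sum to 1 over
    the auxiliary tasks present, and tasks missing from `weights` get weight 0.
    In a real training loop the auxiliary (teacher) distributions would
    typically be treated as constants so that only the main-task policy is updated.
    """
    total = sum(weights.get(name, 0.0) for name in aux_probs)
    if total <= 0.0:
        return 0.0
    return sum(
        (weights.get(name, 0.0) / total) * kl_divergence(teacher, main_probs)
        for name, teacher in aux_probs.items()
    )


def total_loss(rl_loss: float, distill_loss: float, distill_coef: float = 0.1) -> float:
    """Combine a multi-task RL loss with the distillation term (hypothetical coefficient)."""
    return rl_loss + distill_coef * distill_loss


# Toy example: two actions and two hypothetical auxiliary tasks.
main = [0.5, 0.5]
aux = {"pick": [1.0, 0.0], "navigate": [0.5, 0.5]}
w = {"pick": 3.0, "navigate": 1.0}
loss = weighted_distillation_loss(main, aux, w)
print(round(loss, 4))  # 0.75 * ln 2, printed as 0.5199
```

In this toy case the "navigate" teacher already matches the main-task policy and contributes nothing. The loss therefore comes entirely from the "pick" teacher, scaled by its normalized weight of 0.75.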
Related papers
- Unprejudiced Training Auxiliary Tasks Makes Primary Better: A Multi-Task Learning Perspective [55.531894882776726]
Multi-task learning methods suggest using auxiliary tasks to enhance a neural network's performance on a specific primary task.
Previous methods often select auxiliary tasks carefully but treat them as secondary during training.
We propose an uncertainty-based impartial learning method that ensures balanced training across all tasks.
Date: 2024-12-27T09:27:18Z
- Dense Dynamics-Aware Reward Synthesis: Integrating Prior Experience with Demonstrations [24.041217922654738]
Continuous control problems can be formulated as sparse-reward reinforcement learning (RL) tasks.
Online RL methods can automatically explore the state space to solve each new task.
However, discovering sequences of actions that lead to a non-zero reward becomes exponentially more difficult as the task horizon increases.
We introduce a systematic reward-shaping framework that distills the information contained in 1) a task-agnostic prior data set and 2) a small number of task-specific expert demonstrations.
Date: 2024-12-02T04:37:12Z
- Auxiliary Learning as an Asymmetric Bargaining Game [50.826710465264505]
We propose a novel approach, named AuxiNash, for balancing tasks in auxiliary learning.
We describe an efficient procedure for learning the bargaining power of tasks based on their contribution to the performance of the main task.
We evaluate AuxiNash on multiple multi-task benchmarks and find that it consistently outperforms competing methods.
Date: 2023-01-31T09:41:39Z
- DL-DRL: A double-level deep reinforcement learning approach for large-scale task scheduling of multi-UAV [65.07776277630228]
We propose a double-level deep reinforcement learning (DL-DRL) approach based on a divide-and-conquer framework (DCF).
In particular, we design an encoder-decoder structured policy network in our upper-level DRL model to allocate the tasks to different UAVs.
We also exploit another attention-based policy network in our lower-level DRL model to construct the route for each UAV, with the objective of maximizing the number of executed tasks.
Date: 2022-08-04T04:35:53Z
- Abstract Demonstrations and Adaptive Exploration for Efficient and Stable Multi-step Sparse Reward Reinforcement Learning [44.968170318777105]
This paper proposes a DRL exploration technique, termed A2, which integrates two components inspired by human experience: abstract demonstrations and adaptive exploration.
A2 starts by decomposing a complex task into subtasks and then provides the correct order in which to learn the subtasks.
We demonstrate that A2 can help popular DRL algorithms learn more efficiently and stably in the evaluated environments.
Date: 2022-07-19T12:56:41Z
- Skill-based Meta-Reinforcement Learning [65.31995608339962]
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
Date: 2022-04-25T17:58:19Z
- Transfer Learning in Conversational Analysis through Reusing Preprocessing Data as Supervisors [52.37504333689262]
Using noisy labels in single-task learning increases the risk of over-fitting.
Auxiliary tasks trained alongside the primary task could improve its performance.
Date: 2021-12-02T08:40:42Z
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
Date: 2020-11-19T18:47:40Z
- Learning Context-aware Task Reasoning for Efficient Meta-reinforcement Learning [29.125234093368732]
We propose a novel meta-RL strategy to achieve human-level efficiency in learning novel tasks.
We decompose the meta-RL problem into three sub-tasks: task exploration, task inference, and task fulfillment.
Our algorithm effectively performs exploration for task inference, improves sample efficiency during both training and testing, and mitigates the meta-overfitting problem.
Date: 2020-03-03T07:38:53Z
This list is automatically generated from the titles and abstracts of the papers on this site.