Weakly-Supervised Reinforcement Learning for Controllable Behavior
- URL: http://arxiv.org/abs/2004.02860v2
- Date: Wed, 18 Nov 2020 02:03:28 GMT
- Title: Weakly-Supervised Reinforcement Learning for Controllable Behavior
- Authors: Lisa Lee, Benjamin Eysenbach, Ruslan Salakhutdinov, Shixiang Shane Gu,
Chelsea Finn
- Abstract summary: Reinforcement learning (RL) is a powerful framework for learning to take actions to solve tasks.
In many settings, an agent must winnow down the inconceivably large space of all possible tasks to the single task that it is currently being asked to solve.
We introduce a framework for using weak supervision to automatically disentangle this semantically meaningful subspace of tasks from the enormous space of nonsensical "chaff" tasks.
- Score: 126.04932929741538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) is a powerful framework for learning to take
actions to solve tasks. However, in many settings, an agent must winnow down
the inconceivably large space of all possible tasks to the single task that it
is currently being asked to solve. Can we instead constrain the space of tasks
to those that are semantically meaningful? In this work, we introduce a
framework for using weak supervision to automatically disentangle this
semantically meaningful subspace of tasks from the enormous space of
nonsensical "chaff" tasks. We show that this learned subspace enables efficient
exploration and provides a representation that captures distance between
states. On a variety of challenging, vision-based continuous control problems,
our approach leads to substantial performance gains, particularly as the
complexity of the environment grows.
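To make the idea of weak supervision concrete, below is a minimal sketch of the kind of pairwise weak labels the abstract alludes to: an annotator compares two observations per factor of variation, and a ranking loss pushes each latent dimension to agree with those comparisons, encouraging a semantically aligned latent space. The encoder, shapes, and loss here are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(obs, W):
    """Toy linear encoder mapping an observation to a vector of latent factors."""
    return obs @ W

def weak_label_loss(z1, z2, labels):
    """Pairwise ranking loss on weakly labelled observation pairs.

    labels[k] = 1 if factor k is judged larger in the first observation, else 0.
    Each latent dimension is trained to respect the comparison, which is one
    generic way to obtain a disentangled, semantically meaningful subspace.
    """
    logits = z1 - z2                       # per-factor comparison scores
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
    return -np.mean(labels * np.log(probs + 1e-8)
                    + (1 - labels) * np.log(1 - probs + 1e-8))

# Hypothetical shapes: observations of dim 16, 3 weakly annotated factors.
W = rng.normal(size=(16, 3))
o1, o2 = rng.normal(size=16), rng.normal(size=16)
labels = np.array([1.0, 0.0, 1.0])
print(weak_label_loss(encode(o1, W), encode(o2, W), labels))
```

A representation trained this way would then serve as the goal space in which a goal-conditioned agent explores, which is the role the learned subspace plays in the abstract above.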
Related papers
- Continual Deep Reinforcement Learning with Task-Agnostic Policy Distillation [0.0]
This paper addresses continual learning by introducing the Task-Agnostic Policy Distillation (TAPD) framework.
By utilizing task-agnostic distilled knowledge, the agent can solve downstream tasks more efficiently.
arXiv Detail & Related papers (2024-11-25T16:18:39Z)
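As a rough illustration of what "distilled knowledge" means in the entry above, the sketch below shows a generic policy-distillation objective: a student policy matches a teacher's action distribution through a KL term. The logits and shapes are assumed for illustration; this is a textbook formulation, not necessarily TAPD's exact loss.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits):
    """KL(teacher || student) averaged over states: the standard
    policy-distillation objective, where the student is trained to
    reproduce the teacher's action distribution."""
    p_t = softmax(teacher_logits)
    log_p_s = np.log(softmax(student_logits) + 1e-8)
    log_p_t = np.log(p_t + 1e-8)
    return np.mean(np.sum(p_t * (log_p_t - log_p_s), axis=-1))

# toy batch: 32 states, 4 discrete actions
rng = np.random.default_rng(0)
print(distillation_loss(rng.normal(size=(32, 4)),
                        rng.normal(size=(32, 4))))
```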
- Hierarchical reinforcement learning with natural language subgoals [26.725710518119044]
We use data from humans solving tasks to softly supervise the goal space for a set of long-range tasks in a 3D embodied environment.
This has two advantages: first, it is easy to generate this data from naive human participants; second, it is flexible enough to represent a vast range of sub-goals in human-relevant tasks.
Our approach outperforms agents that clone expert behavior on these tasks, as well as HRL from scratch without this supervised sub-goal space.
arXiv Detail & Related papers (2023-09-20T18:03:04Z)
- Towards an Interpretable Hierarchical Agent Framework using Semantic Goals [6.677083312952721]
This work introduces an interpretable hierarchical agent framework that combines planning with semantic goal-directed reinforcement learning.
We evaluate our framework on a robotic block manipulation task and show that it performs better than other methods.
arXiv Detail & Related papers (2022-10-16T02:04:13Z)
- Deep Hierarchical Planning from Pixels [86.14687388689204]
Director is a method for learning hierarchical behaviors directly from pixels by planning inside the latent space of a learned world model.
Despite operating in latent space, the decisions are interpretable because the world model can decode goals into images for visualization.
Director also learns successful behaviors across a wide range of environments, including visual control, Atari games, and DMLab levels.
arXiv Detail & Related papers (2022-06-08T18:20:15Z)
- Temporal Abstractions-Augmented Temporally Contrastive Learning: An Alternative to the Laplacian in RL [140.12803111221206]
In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting.
We propose an alternative method that is able to recover, in a non-uniform-prior setting, the expressiveness and the desired properties of the Laplacian representation.
We find that our method succeeds as an alternative to the Laplacian in the non-uniform setting and scales to challenging continuous control environments.
arXiv Detail & Related papers (2022-03-21T22:07:48Z)
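A generic temporally contrastive objective of the kind referenced in the entry above can be sketched as follows: states visited close together in time are treated as positives, other states as negatives, and an InfoNCE-style loss shapes the representation. The similarity function, temperature, and shapes are illustrative assumptions rather than the paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(0)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style temporally contrastive loss for a single anchor state.

    The positive is the embedding of a state reached a few steps after the
    anchor; negatives are embeddings of unrelated states. Minimizing the loss
    pulls temporally close states together, a contrastive stand-in for the
    smoothness that the graph Laplacian provides in the task-agnostic setting.
    """
    def sim(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

    logits = np.array([sim(anchor, positive)] +
                      [sim(anchor, n) for n in negatives]) / temperature
    logits -= logits.max()
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

d = 8  # toy embedding dimension
print(info_nce(rng.normal(size=d), rng.normal(size=d),
               [rng.normal(size=d) for _ in range(5)]))
```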
- Wish you were here: Hindsight Goal Selection for long-horizon dexterous manipulation [14.901636098553848]
Solving tasks with a sparse reward in a sample-efficient manner poses a challenge to modern reinforcement learning.
Existing strategies explore based on task-agnostic goal distributions, which can render the solution of long-horizon tasks impractical.
We extend hindsight relabelling mechanisms to guide exploration along task-specific distributions implied by a small set of successful demonstrations.
arXiv Detail & Related papers (2021-12-01T16:12:32Z)
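The entry above builds on hindsight relabelling; a plain sketch of that mechanism (in the spirit of HER) is shown below, with a hypothetical demo_goals hook marking where a demonstration-implied goal distribution could bias goal selection. The threshold, mixing rule, and shapes are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def relabel_with_hindsight(episode, demo_goals, k=4):
    """Plain hindsight relabelling of one episode's transitions.

    Each transition is rewritten with a goal actually achieved later in the
    same episode, so even unsuccessful trajectories yield reward signal.
    demo_goals is a placeholder for the demonstration-implied goal
    distribution; here we simply mix achieved goals with demonstration goals.
    """
    relabelled = []
    T = len(episode)
    for t, (obs, action, achieved) in enumerate(episode):
        # sample k future achieved states from this episode as hindsight goals
        future = [episode[i][2] for i in rng.integers(t, T, size=k)]
        # optionally bias toward goals seen in successful demonstrations
        if demo_goals and rng.random() < 0.5:
            future[0] = demo_goals[rng.integers(len(demo_goals))]
        for g in future:
            reward = float(np.allclose(achieved, g, atol=1e-3))
            relabelled.append((obs, action, g, reward))
    return relabelled

# toy episode: (observation, action, achieved_goal) triples
episode = [(rng.normal(size=3), rng.normal(size=2), rng.normal(size=3))
           for _ in range(10)]
demo_goals = [rng.normal(size=3) for _ in range(3)]
print(len(relabel_with_hindsight(episode, demo_goals)))
```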
- Multi-Task Learning with Sequence-Conditioned Transporter Networks [67.57293592529517]
We aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling.
First, we propose a new benchmark suite aimed at compositional tasks, MultiRavens, which allows defining custom task combinations.
Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling.
arXiv Detail & Related papers (2021-09-15T21:19:11Z)
- Solving Compositional Reinforcement Learning Problems via Task Reduction [18.120631058025406]
We propose a novel learning paradigm, Self-Imitation via Reduction (SIR) for solving compositional reinforcement learning problems.
SIR is based on two core ideas: task reduction and self-imitation.
Experiment results show that SIR can significantly accelerate and improve learning on a variety of challenging sparse-reward continuous-control problems.
arXiv Detail & Related papers (2021-03-13T03:26:33Z)
- Continual Learning of Control Primitives: Skill Discovery via Reset-Games [128.36174682118488]
We show how a single method can allow an agent to acquire skills with minimal supervision.
We do this by exploiting the insight that the need to "reset" an agent to a broad set of initial states for a learning task provides a natural setting to learn a diverse set of "reset-skills".
arXiv Detail & Related papers (2020-11-10T18:07:44Z)
- Gradient Surgery for Multi-Task Learning [119.675492088251]
Multi-task learning has emerged as a promising approach for sharing structure across multiple tasks.
The reasons why multi-task learning is so challenging compared to single-task learning are not fully understood.
We propose a form of gradient surgery that projects a task's gradient onto the normal plane of the gradient of any other task that has a conflicting gradient.
arXiv Detail & Related papers (2020-01-19T06:33:47Z)
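The projection step described in the entry above can be sketched in a few lines: whenever two task gradients conflict (negative inner product), one is projected onto the normal plane of the other before the gradients are combined. The toy setup below is illustrative and omits the random task ordering and optimizer details.

```python
import numpy as np

rng = np.random.default_rng(0)

def gradient_surgery(grads):
    """For each task gradient, remove its component along any other task
    gradient it conflicts with (negative inner product), then sum the
    deconflicted gradients into a single update direction."""
    projected = [g.copy() for g in grads]
    for i, g_i in enumerate(projected):
        for j, g_j in enumerate(grads):
            if i == j:
                continue
            dot = g_i @ g_j
            if dot < 0:  # conflicting gradient: project onto the normal plane of g_j
                g_i -= dot / (g_j @ g_j + 1e-12) * g_j
    return sum(projected)

# toy example: three task gradients over 5 shared parameters
grads = [rng.normal(size=5) for _ in range(3)]
print(gradient_surgery(grads))
```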