Weakly-Supervised Reinforcement Learning for Controllable Behavior
- URL: http://arxiv.org/abs/2004.02860v2
- Date: Wed, 18 Nov 2020 02:03:28 GMT
- Title: Weakly-Supervised Reinforcement Learning for Controllable Behavior
- Authors: Lisa Lee, Benjamin Eysenbach, Ruslan Salakhutdinov, Shixiang Shane Gu,
Chelsea Finn
- Abstract summary: Reinforcement learning (RL) is a powerful framework for learning to take actions to solve tasks.
In many settings, an agent must winnow down the inconceivably large space of all possible tasks to the single task that it is currently being asked to solve.
We introduce a framework for using weak supervision to automatically disentangle this semantically meaningful subspace of tasks from the enormous space of nonsensical "chaff" tasks.
- Score: 126.04932929741538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) is a powerful framework for learning to take
actions to solve tasks. However, in many settings, an agent must winnow down
the inconceivably large space of all possible tasks to the single task that it
is currently being asked to solve. Can we instead constrain the space of tasks
to those that are semantically meaningful? In this work, we introduce a
framework for using weak supervision to automatically disentangle this
semantically meaningful subspace of tasks from the enormous space of
nonsensical "chaff" tasks. We show that this learned subspace enables efficient
exploration and provides a representation that captures distance between
states. On a variety of challenging, vision-based continuous control problems,
our approach leads to substantial performance gains, particularly as the
complexity of the environment grows.
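The abstract leaves the mechanism implicit, so here is a minimal sketch of the general recipe, under the assumption (common in weakly-supervised disentanglement, and not confirmed by the text above) that the weak labels are pairwise comparisons indicating which of two observations has the larger value of each labeled factor. The encoder, loss, and names are illustrative, not the authors' exact method:

    # Hedged sketch: shaping a latent task subspace from weak pairwise labels.
    # Assumption: y[:, k] = 1.0 if obs1 has the larger value of factor k, else 0.0.
    import torch
    import torch.nn as nn

    class FactorEncoder(nn.Module):
        def __init__(self, obs_dim, n_factors):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, 256), nn.ReLU(),
                nn.Linear(256, n_factors),  # one latent coordinate per labeled factor
            )

        def forward(self, obs):
            return self.net(obs)

    def weak_label_loss(enc, obs1, obs2, y):
        # Logistic loss on per-factor differences: latent axis k learns to
        # order states by factor k, carving out a meaningful goal subspace.
        logits = enc(obs1) - enc(obs2)
        return nn.functional.binary_cross_entropy_with_logits(logits, y)

Goals for a downstream goal-conditioned policy can then be sampled from this learned subspace rather than from raw observation space.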
Related papers
- Hierarchical reinforcement learning with natural language subgoals [26.725710518119044]
We use data from humans solving tasks to softly supervise the goal space for a set of long-range tasks in a 3D embodied environment.
This has two advantages: first, it is easy to generate this data from naive human participants; second, it is flexible enough to represent a vast range of sub-goals in human-relevant tasks.
Our approach outperforms agents that clone expert behavior on these tasks, as well as HRL from scratch without this supervised sub-goal space.
arXiv Detail & Related papers (2023-09-20T18:03:04Z)
- Learning Good Features to Transfer Across Tasks and Domains [16.05821129333396]
We first show that knowledge learned for one task can be shared across tasks by learning a mapping between task-specific deep features in a given domain.
Then, we show that this mapping function, implemented by a neural network, is able to generalize to novel unseen domains (sketched below).
arXiv Detail & Related papers (2023-01-26T18:49:39Z)
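A minimal sketch of the feature-mapping idea in the entry above, assuming the two frozen task-specific encoders are given; the architecture, loss, and names are assumptions for illustration:

    # Hedged sketch: learn g that translates task-A features into task-B
    # features using paired inputs from one domain; both encoders are frozen.
    import torch
    import torch.nn as nn

    feat_dim = 128  # assumed feature size
    g = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim))
    opt = torch.optim.Adam(g.parameters(), lr=1e-3)

    def train_step(feats_task_a, feats_task_b):
        # feats_task_*: (batch, feat_dim) features of the *same* inputs under
        # the two task-specific encoders.
        opt.zero_grad()
        loss = nn.functional.mse_loss(g(feats_task_a), feats_task_b)
        loss.backward()
        opt.step()
        return loss.item()

At test time, g would be applied to task-A features computed on a new, unseen domain to approximate the task-B features there.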
- Towards an Interpretable Hierarchical Agent Framework using Semantic Goals [6.677083312952721]
This work introduces an interpretable hierarchical agent framework by combining planning with semantic goal-directed reinforcement learning.
We evaluate our framework on a robotic block manipulation task and show that it performs better than other methods.
arXiv Detail & Related papers (2022-10-16T02:04:13Z)
- Deep Hierarchical Planning from Pixels [86.14687388689204]
Director is a method for learning hierarchical behaviors directly from pixels by planning inside the latent space of a learned world model.
Despite operating in latent space, the decisions are interpretable because the world model can decode goals into images for visualization.
Director also learns successful behaviors across a wide range of environments, including visual control, Atari games, and DMLab levels.
arXiv Detail & Related papers (2022-06-08T18:20:15Z)
- Temporal Abstractions-Augmented Temporally Contrastive Learning: An Alternative to the Laplacian in RL [140.12803111221206]
In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting.
We propose an alternative method that recovers, in a non-uniform-prior setting, the expressiveness and desired properties of the Laplacian representation.
We find that our method succeeds as an alternative to the Laplacian in the non-uniform setting and scales to challenging continuous control environments (sketched below).
arXiv Detail & Related papers (2022-03-21T22:07:48Z)
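A hedged sketch of temporally contrastive learning in general, not this paper's exact objective: states occurring a few steps apart on the same trajectory are treated as positive pairs under an InfoNCE-style loss; the encoder and hyperparameters are assumptions:

    # Hedged sketch: InfoNCE over temporally close state pairs.
    import torch
    import torch.nn as nn

    def temporal_contrastive_loss(phi, states_t, states_tk, temperature=0.1):
        # states_tk[i] occurs a few steps after states_t[i] on the same trajectory.
        z1 = nn.functional.normalize(phi(states_t), dim=1)
        z2 = nn.functional.normalize(phi(states_tk), dim=1)
        logits = z1 @ z2.t() / temperature      # (B, B) similarity matrix
        labels = torch.arange(z1.shape[0])      # positives sit on the diagonal
        return nn.functional.cross_entropy(logits, labels)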
- Wish you were here: Hindsight Goal Selection for long-horizon dexterous manipulation [14.901636098553848]
Solving tasks with a sparse reward in a sample-efficient manner poses a challenge to modern reinforcement learning.
Existing strategies explore based on task-agnostic goal distributions, which can render the solution of long-horizon tasks impractical.
We extend hindsight relabelling mechanisms to guide exploration along task-specific distributions implied by a small set of successful demonstrations (sketched below).
arXiv Detail & Related papers (2021-12-01T16:12:32Z)
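A rough sketch of demonstration-guided goal selection as the entry describes it: rather than relabelling with uniformly sampled achieved states (as in standard hindsight experience replay), candidate goals are weighted toward states near a few successful demonstrations. The weighting rule below is an assumption:

    # Hedged sketch: pick a hindsight goal biased toward the demo distribution.
    import numpy as np

    def select_hindsight_goal(future_states, demo_states, sigma=1.0, rng=np.random):
        # future_states: (N, D) states achieved later in the episode
        # demo_states:   (M, D) states visited in successful demonstrations
        d = np.min(np.linalg.norm(
            future_states[:, None, :] - demo_states[None, :, :], axis=-1), axis=1)
        w = np.exp(-d / sigma)                  # nearer to demos => more likely
        return future_states[rng.choice(len(future_states), p=w / w.sum())]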
- Multi-Task Learning with Sequence-Conditioned Transporter Networks [67.57293592529517]
We aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling.
First, we propose a new benchmark suite aimed at compositional tasks, MultiRavens, which allows defining custom task combinations.
Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling.
arXiv Detail & Related papers (2021-09-15T21:19:11Z)
- Solving Compositional Reinforcement Learning Problems via Task Reduction [18.120631058025406]
We propose a novel learning paradigm, Self-Imitation via Reduction (SIR) for solving compositional reinforcement learning problems.
SIR is based on two core ideas: task reduction and self-imitation.
Experimental results show that SIR can significantly accelerate and improve learning on a variety of challenging sparse-reward continuous-control problems (the self-imitation component is sketched below).
arXiv Detail & Related papers (2021-03-13T03:26:33Z)
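A hedged sketch of the self-imitation half of SIR only: successful trajectories, including those obtained by first reducing a hard task to an easier, already-solved one, are stored and imitated with a behavior-cloning loss. The buffer structure and loss are illustrative assumptions:

    # Hedged sketch: imitate the agent's own successful trajectories.
    import torch
    import torch.nn as nn

    class SelfImitationBuffer:
        def __init__(self):
            self.data = []  # (state, action) pairs from successful episodes

        def add_episode(self, states, actions, success):
            if success:
                self.data.extend(zip(states, actions))

    def self_imitation_loss(policy, batch):
        # batch: list of (state, action) tensors; continuous actions assumed.
        states = torch.stack([s for s, _ in batch])
        actions = torch.stack([a for _, a in batch])
        return nn.functional.mse_loss(policy(states), actions)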
- Continual Learning of Control Primitives: Skill Discovery via Reset-Games [128.36174682118488]
We show how a single method can allow an agent to acquire skills with minimal supervision.
We do this by exploiting the insight that the need to "reset" an agent to a broad set of initial states for a learning task provides a natural setting to learn a diverse set of "reset-skills."
arXiv Detail & Related papers (2020-11-10T18:07:44Z)
- Gradient Surgery for Multi-Task Learning [119.675492088251]
Multi-task learning has emerged as a promising approach for sharing structure across multiple tasks.
The reasons why multi-task learning is so challenging compared to single-task learning are not fully understood.
We propose a form of gradient surgery that projects a task's gradient onto the normal plane of the gradient of any other task that has a conflicting gradient (sketched below).
arXiv Detail & Related papers (2020-01-19T06:33:47Z)
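The projection described in the entry above is concrete enough to sketch: when two task gradients conflict (negative inner product), one gradient's component along the other is removed. A minimal illustration (variable names are mine):

    # Sketch of the conflicting-gradient projection described above.
    import numpy as np

    def project_conflicting(g_i, g_j):
        # Remove from g_i its component along g_j when the two conflict.
        dot = np.dot(g_i, g_j)
        if dot < 0:  # negative inner product => conflicting gradients
            g_i = g_i - dot / (np.dot(g_j, g_j) + 1e-12) * g_j
        return g_i

    g1, g2 = np.array([1.0, 0.0]), np.array([-1.0, 1.0])
    print(project_conflicting(g1, g2))  # [0.5 0.5]; now orthogonal to g2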