Behavior Priors for Efficient Reinforcement Learning
- URL: http://arxiv.org/abs/2010.14274v1
- Date: Tue, 27 Oct 2020 13:17:18 GMT
- Title: Behavior Priors for Efficient Reinforcement Learning
- Authors: Dhruva Tirumala, Alexandre Galashov, Hyeonwoo Noh, Leonard
Hasenclever, Razvan Pascanu, Jonathan Schwarz, Guillaume Desjardins, Wojciech
Marian Czarnecki, Arun Ahuja, Yee Whye Teh, Nicolas Heess
- Abstract summary: We consider how information and architectural constraints can be combined with ideas from the probabilistic modeling literature to learn behavior priors.
We discuss how such latent variable formulations connect to related work on hierarchical reinforcement learning (HRL) and mutual information and curiosity based objectives.
We demonstrate the effectiveness of our framework by applying it to a range of simulated continuous control domains.
- Score: 97.81587970962232
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As we deploy reinforcement learning agents to solve increasingly challenging
problems, methods that allow us to inject prior knowledge about the structure
of the world and effective solution strategies becomes increasingly important.
In this work we consider how information and architectural constraints can be
combined with ideas from the probabilistic modeling literature to learn
behavior priors that capture the common movement and interaction patterns that
are shared across a set of related tasks or contexts. For example the day-to
day behavior of humans comprises distinctive locomotion and manipulation
patterns that recur across many different situations and goals. We discuss how
such behavior patterns can be captured using probabilistic trajectory models
and how these can be integrated effectively into reinforcement learning
schemes, e.g.\ to facilitate multi-task and transfer learning. We then extend
these ideas to latent variable models and consider a formulation to learn
hierarchical priors that capture different aspects of the behavior in reusable
modules. We discuss how such latent variable formulations connect to related
work on hierarchical reinforcement learning (HRL) and mutual information and
curiosity based objectives, thereby offering an alternative perspective on
existing ideas. We demonstrate the effectiveness of our framework by applying
it to a range of simulated continuous control domains.
Related papers
- Leveraging Hierarchical Taxonomies in Prompt-based Continual Learning [42.03016266965012]
We find that applying human habits of organizing and connecting information can serve as an efficient strategy when training deep learning models.
We propose a novel regularization loss function that encourages models to focus more on challenging knowledge areas.
arXiv Detail & Related papers (2024-10-06T01:30:40Z) - Generative Intrinsic Optimization: Intrinsic Control with Model Learning [5.439020425819001]
Future sequence represents the outcome after executing the action into the environment.
Explicit outcomes may vary across state, return, or trajectory serving different purposes such as credit assignment or imitation learning.
We propose a policy scheme that seamlessly incorporates the mutual information, ensuring convergence to the optimal policy.
arXiv Detail & Related papers (2023-10-12T07:50:37Z) - Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z) - Learning Generalizable Representations for Reinforcement Learning via
Adaptive Meta-learner of Behavioral Similarities [43.327357653393015]
We propose a novel meta-learner-based framework for representation learning regarding behavioral similarities for reinforcement learning.
We empirically demonstrate that our proposed framework outperforms state-of-the-art baselines on several benchmarks.
arXiv Detail & Related papers (2022-12-26T11:11:23Z) - Hierarchically Structured Task-Agnostic Continual Learning [0.0]
We take a task-agnostic view of continual learning and develop a hierarchical information-theoretic optimality principle.
We propose a neural network layer, called the Mixture-of-Variational-Experts layer, that alleviates forgetting by creating a set of information processing paths.
Our approach can operate in a task-agnostic way, i.e., it does not require task-specific knowledge, as is the case with many existing continual learning algorithms.
arXiv Detail & Related papers (2022-11-14T19:53:15Z) - Procedure Planning in Instructional Videosvia Contextual Modeling and
Model-based Policy Learning [114.1830997893756]
This work focuses on learning a model to plan goal-directed actions in real-life videos.
We propose novel algorithms to model human behaviors through Bayesian Inference and model-based Imitation Learning.
arXiv Detail & Related papers (2021-10-05T01:06:53Z) - Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
arXiv Detail & Related papers (2020-09-10T14:16:58Z) - Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z) - Task-Feature Collaborative Learning with Application to Personalized
Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL)
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.