Estimating Disentangled Belief about Hidden State and Hidden Task for
Meta-RL
- URL: http://arxiv.org/abs/2105.06660v1
- Date: Fri, 14 May 2021 06:11:36 GMT
- Title: Estimating Disentangled Belief about Hidden State and Hidden Task for
Meta-RL
- Authors: Kei Akuzawa, Yusuke Iwasawa, Yutaka Matsuo
- Abstract summary: Meta-reinforcement learning (meta-RL) algorithms enable autonomous agents to adapt to new tasks from a small amount of experience.
In meta-RL, the specification (such as the reward function) of the current task is hidden from the agent.
We propose estimating disentangled belief about task and states, leveraging an inductive bias that the task and states can be regarded as global and local features of each task.
- Score: 27.78147889149745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is considerable interest in designing meta-reinforcement learning
(meta-RL) algorithms, which enable autonomous agents to adapt to new tasks from a
small amount of experience. In meta-RL, the specification (such as the reward
function) of the current task is hidden from the agent. In addition, states are
hidden within each task owing to sensor noise or limitations in realistic
environments. Therefore, the meta-RL agent faces the challenge of inferring
both the hidden task and states from a small amount of experience. To address
this, we propose estimating a disentangled belief about the task and states,
leveraging an inductive bias that the task and states can be regarded as global
and local features of each task. Specifically, we train a hierarchical
state-space model (HSSM) parameterized by deep neural networks as an
environment model, whose global and local latent variables correspond to the task
and states, respectively. Because the HSSM does not allow analytical
computation of the posterior distribution, i.e., the belief, we employ amortized
inference to approximate it. Once the belief is obtained, we can augment the
observations of a model-free policy with the belief to train the policy
efficiently. Moreover, because the task and state information are factorized and
interpretable, downstream policy training is easier than with prior methods
that did not consider this hierarchical structure. Empirical validation on a
GridWorld environment confirms that the HSSM can separate the hidden task and
state information. We then compare the meta-RL agent with the HSSM to prior
meta-RL methods in MuJoCo environments, and confirm that our agent requires
less training data and reaches higher final performance.
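To make the architecture described above concrete, the following PyTorch-style sketch illustrates one way the pieces could fit together: a recurrent amortized-inference network that outputs Gaussian belief parameters over a global task latent and per-step state latents, and a model-free policy that consumes the observation concatenated with those belief parameters. This is a minimal illustrative assumption, not the authors' implementation; all module names, dimensions, and the GRU-based encoder are hypothetical choices.

```python
import torch
import torch.nn as nn


class BeliefEncoder(nn.Module):
    """Amortized inference network (illustrative): maps a trajectory of
    (obs, action, reward) tuples to Gaussian belief parameters over a global
    task latent (one per trajectory) and local state latents (one per step)."""

    def __init__(self, obs_dim, act_dim, task_dim=8, state_dim=16, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim + 1, hidden, batch_first=True)
        # Global (task) belief is read from the final hidden state of the RNN.
        self.task_head = nn.Linear(hidden, 2 * task_dim)
        # Local (state) belief is read from every step's hidden state.
        self.state_head = nn.Linear(hidden, 2 * state_dim)

    def forward(self, obs, act, rew):
        # obs: (B, T, obs_dim), act: (B, T, act_dim), rew: (B, T, 1)
        h, h_last = self.rnn(torch.cat([obs, act, rew], dim=-1))
        task_mu, task_logvar = self.task_head(h_last[-1]).chunk(2, dim=-1)
        state_mu, state_logvar = self.state_head(h).chunk(2, dim=-1)
        return (task_mu, task_logvar), (state_mu, state_logvar)


class BeliefAugmentedPolicy(nn.Module):
    """Model-free policy whose input is the current observation concatenated
    with the belief parameters over the task and the current state."""

    def __init__(self, obs_dim, act_dim, task_dim=8, state_dim=16, hidden=128):
        super().__init__()
        in_dim = obs_dim + 2 * task_dim + 2 * state_dim
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs_t, task_belief, state_belief_t):
        # task_belief and state_belief_t are (mean, log-variance) pairs.
        return self.net(torch.cat([obs_t, *task_belief, *state_belief_t], dim=-1))
```

In the abstract's setup, such an encoder would be trained as the inference network of the HSSM (e.g., with a variational ELBO-style objective over observation and reward reconstruction), while the policy is trained with a standard model-free RL algorithm on the belief-augmented input; the specific loss terms and hyperparameters are not given in the source and are therefore omitted here.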
Related papers
- Meta-Reinforcement Learning Based on Self-Supervised Task Representation
Learning [23.45043290237396]
MoSS is a context-based meta-reinforcement learning algorithm built on self-supervised task representation learning.
On MuJoCo and Meta-World benchmarks, MoSS outperforms prior methods in terms of performance, sample efficiency (3-50x faster), adaptation efficiency, and generalization.
arXiv Detail & Related papers (2023-04-29T15:46:19Z)
- Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks [56.63855534940827]
This work introduces a novel objective function to learn an action translator among training tasks.
We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy.
We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training.
arXiv Detail & Related papers (2022-07-19T04:58:06Z)
- Denoised MDPs: Learning World Models Better Than the World Itself [94.74665254213588]
This work categorizes information out in the wild into four types based on controllability and relation with reward, and formulates useful information as that which is both controllable and reward-relevant.
Experiments on variants of DeepMind Control Suite and RoboDesk demonstrate superior performance of our denoised world model over using raw observations alone.
arXiv Detail & Related papers (2022-06-30T17:59:49Z)
- Provable Benefits of Representational Transfer in Reinforcement Learning [59.712501044999875]
We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation.
We show that given generative access to source tasks, we can discover a representation, using which subsequent linear RL techniques quickly converge to a near-optimal policy.
arXiv Detail & Related papers (2022-05-29T04:31:29Z)
- A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning [61.406020873047794]
A major hurdle to real-world application arises from the development of algorithms in an episodic setting.
We propose a new method, MEDAL, that trains the backward policy to match the state distribution in the provided demonstrations.
Our experiments show that MEDAL matches or outperforms prior methods on three sparse-reward continuous control tasks.
arXiv Detail & Related papers (2022-05-11T00:06:29Z)
- An Information-Theoretic Analysis of the Impact of Task Similarity on Meta-Learning [44.320945743871285]
We present novel information-theoretic bounds on the average absolute value of the meta-generalization gap.
Our bounds explicitly capture the impact of task relatedness, the number of tasks, and the number of data samples per task on the meta-generalization gap.
arXiv Detail & Related papers (2021-01-21T01:38:16Z)
- Transfer Meta-Learning: Information-Theoretic Bounds and Information Meta-Risk Minimization [47.7605527786164]
Meta-learning automatically infers an inductive bias by observing data from a number of related tasks.
We introduce the problem of transfer meta-learning, in which tasks are drawn from a target task environment during meta-testing.
arXiv Detail & Related papers (2020-11-04T12:55:43Z)
- Dif-MAML: Decentralized Multi-Agent Meta-Learning [54.39661018886268]
We propose a cooperative multi-agent meta-learning algorithm, referred to as Dif-MAML.
We show that the proposed strategy allows a collection of agents to attain agreement at a linear rate and to converge to a stationary point of the aggregate MAML objective.
Simulation results illustrate the theoretical findings and the superior performance relative to the traditional non-cooperative setting.
arXiv Detail & Related papers (2020-10-06T16:51:09Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experimental results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)