Denoised MDPs: Learning World Models Better Than the World Itself
- URL: http://arxiv.org/abs/2206.15477v6
- Date: Thu, 6 Apr 2023 23:56:38 GMT
- Title: Denoised MDPs: Learning World Models Better Than the World Itself
- Authors: Tongzhou Wang, Simon S. Du, Antonio Torralba, Phillip Isola, Amy
Zhang, Yuandong Tian
- Abstract summary: This work categorizes information out in the wild into four types based on controllability and relation with reward, and formulates useful information as that which is both controllable and reward-relevant.
Experiments on variants of DeepMind Control Suite and RoboDesk demonstrate superior performance of our denoised world model over using raw observations alone.
- Score: 94.74665254213588
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The ability to separate signal from noise, and reason with clean
abstractions, is critical to intelligence. With this ability, humans can
efficiently perform real-world tasks without considering all possible nuisance
factors. How can artificial agents do the same? What kind of information can
agents safely discard as noise?
In this work, we categorize information out in the wild into four types based
on controllability and relation with reward, and formulate useful information
as that which is both controllable and reward-relevant. This framework
clarifies the kinds of information removed by various prior works on representation
learning in reinforcement learning (RL), and leads to our proposed approach of
learning a Denoised MDP that explicitly factors out certain noise distractors.
Extensive experiments on variants of DeepMind Control Suite and RoboDesk
demonstrate superior performance of our denoised world model over using raw
observations alone, and over prior works, across policy optimization control
tasks as well as the non-control task of joint position regression.
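As a rough illustration of the 2x2 taxonomy described in the abstract (not code from the paper; the factor names, flags, and examples below are our own hypothetical stand-ins), each latent factor can be tagged by whether it is controllable and whether it is reward-relevant, and only factors satisfying both are kept as signal:

```python
from dataclasses import dataclass

@dataclass
class Factor:
    """A hypothetical latent factor of the environment, tagged with the two
    attributes the abstract uses to categorize information."""
    name: str
    controllable: bool       # can the agent's actions influence it?
    reward_relevant: bool    # does it affect the reward?

def is_signal(f: Factor) -> bool:
    # Useful information = controllable AND reward-relevant;
    # everything else is treated as a noise distractor to factor out.
    return f.controllable and f.reward_relevant

# Illustrative factors for a robot-manipulation task (examples are ours, not the paper's):
factors = [
    Factor("arm joint angles",    controllable=True,  reward_relevant=True),
    Factor("agent's own shadow",  controllable=True,  reward_relevant=False),
    Factor("ambient room light",  controllable=False, reward_relevant=True),
    Factor("camera sensor noise", controllable=False, reward_relevant=False),
]

print([f.name for f in factors if is_signal(f)])  # ['arm joint angles']
```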
Related papers
- Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning [62.3886343725955]
We introduce a novel RL algorithm that learns a critic network that outputs Q-values over a sequence of actions.
By explicitly training the value functions to learn the consequence of executing a series of current and future actions, our algorithm allows for learning useful value functions from noisy trajectories.
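A rough sketch of the general idea of a critic that scores a state together with a short sequence of actions (our own PyTorch-style illustration; the architecture, dimensions, and names are assumptions, not the paper's actual model):

```python
import torch
import torch.nn as nn

class ActionSequenceCritic(nn.Module):
    """Hypothetical critic: maps (state, action sequence) -> scalar Q-value."""
    def __init__(self, state_dim: int, action_dim: int, seq_len: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim * seq_len, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # one Q-value for the whole action sequence
        )

    def forward(self, state: torch.Tensor, action_seq: torch.Tensor) -> torch.Tensor:
        # state: (batch, state_dim); action_seq: (batch, seq_len, action_dim)
        flat_actions = action_seq.flatten(start_dim=1)
        return self.net(torch.cat([state, flat_actions], dim=-1))

# Usage: score a batch of 4-step action sequences for a 12-dim state, 3-dim action space.
critic = ActionSequenceCritic(state_dim=12, action_dim=3, seq_len=4)
q = critic(torch.randn(8, 12), torch.randn(8, 4, 3))  # shape (8, 1)
```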
arXiv Detail & Related papers (2024-11-19T01:23:52Z)
- Causal Coordinated Concurrent Reinforcement Learning [8.654978787096807]
We propose a novel algorithmic framework for data sharing and coordinated exploration that learns more data-efficient and better-performing policies in a concurrent reinforcement learning setting.
Our algorithm leverages a causal inference algorithm in the form of Additive Noise Model - Mixture Model (ANM-MM) in extracting model parameters governing individual differentials via independence enforcement.
We propose a new data sharing scheme based on a similarity measure of the extracted model parameters and demonstrate superior learning speeds on a set of autoregressive, pendulum and cart-pole swing-up tasks.
arXiv Detail & Related papers (2024-01-31T17:20:28Z)
- Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning [63.58935783293342]
Causal Bisimulation Modeling (CBM) is a method that learns the causal relationships in the dynamics and reward functions for each task to derive a minimal, task-specific abstraction.
CBM's learned implicit dynamics models identify the underlying causal relationships and state abstractions more accurately than explicit ones.
arXiv Detail & Related papers (2024-01-23T05:43:15Z)
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise.
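A minimal sketch of the general idea of tuning a learnable affine map over frozen pre-trained features (our own assumption of how such a black-box head could look; the class, parameters, and training setup are illustrative, not NMTune's actual method):

```python
import torch
import torch.nn as nn

class AffineFeatureTune(nn.Module):
    """Hypothetical black-box tuning head: the pre-trained encoder stays frozen,
    and only a per-dimension affine map (scale and shift) plus a linear
    classifier on top of its features are trained."""
    def __init__(self, encoder: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.encoder = encoder.eval()          # frozen, treated as a black box
        for p in self.encoder.parameters():
            p.requires_grad_(False)
        self.scale = nn.Parameter(torch.ones(feat_dim))
        self.shift = nn.Parameter(torch.zeros(feat_dim))
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            feats = self.encoder(x)            # features from the noisy pre-trained model
        return self.head(feats * self.scale + self.shift)

# Usage with a stand-in "encoder" (a frozen linear layer) on 32-dim inputs, 10 classes.
model = AffineFeatureTune(nn.Linear(32, 64), feat_dim=64, num_classes=10)
logits = model(torch.randn(8, 32))  # shape (8, 10)
```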
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
- Optimal Interpretability-Performance Trade-off of Classification Trees with Black-Box Reinforcement Learning [0.0]
Interpretability of AI models allows for user safety checks to build trust in these models.
Decision trees (DTs) provide a global view of the learned model and clearly outline the role of the features that are critical for classifying a given data point.
To learn compact trees, a Reinforcement Learning framework has recently been proposed to explore the space of DTs.
arXiv Detail & Related papers (2023-04-11T09:43:23Z)
- Information Maximizing Curriculum: A Curriculum-Based Approach for Imitating Diverse Skills [14.685043874797742]
We propose a curriculum-based approach that assigns a weight to each data point and encourages the model to specialize in the data it can represent.
To cover all modes and thus enable diverse behavior, we extend our approach to a mixture of experts (MoE) policy, where each mixture component selects its own subset of the training data for learning.
A novel, maximum entropy-based objective is proposed to achieve full coverage of the dataset, thereby enabling the policy to encompass all modes within the data distribution.
arXiv Detail & Related papers (2023-03-27T16:02:50Z)
- Ignorance is Bliss: Robust Control via Information Gating [60.17644038829572]
Informational parsimony provides a useful inductive bias for learning representations that achieve better generalization by being robust to noise and spurious correlations.
We propose information gating as a way to learn parsimonious representations that identify the minimal information required for a task.
arXiv Detail & Related papers (2023-03-10T18:31:50Z)
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)