A Relational Intervention Approach for Unsupervised Dynamics
Generalization in Model-Based Reinforcement Learning
- URL: http://arxiv.org/abs/2206.04551v1
- Date: Thu, 9 Jun 2022 15:01:36 GMT
- Title: A Relational Intervention Approach for Unsupervised Dynamics
Generalization in Model-Based Reinforcement Learning
- Authors: Jixian Guo, Mingming Gong, Dacheng Tao
- Abstract summary: We introduce an interventional prediction module to estimate the probability of two estimated $\hat{z}_i, \hat{z}_j$ belonging to the same environment.
We empirically show that the $\hat{Z}$ estimated by our method enjoys less redundant information than previous methods.
- Score: 113.75991721607174
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The generalization of model-based reinforcement learning (MBRL) methods to
environments with unseen transition dynamics is an important yet challenging
problem. Existing methods try to extract environment-specified information $Z$
from past transition segments to make the dynamics prediction model
generalizable to different dynamics. However, because environments are not
labelled, the extracted information inevitably contains redundant information
unrelated to the dynamics in transition segments and thus fails to maintain a
crucial property of $Z$: $Z$ should be similar in the same environment and
dissimilar in different ones. As a result, the learned dynamics prediction
function will deviate from the true one, which undermines the generalization
ability. To tackle this problem, we introduce an interventional prediction
module to estimate the probability of two estimated $\hat{z}_i, \hat{z}_j$
belonging to the same environment. Furthermore, by utilizing the $Z$'s
invariance within a single environment, a relational head is proposed to
enforce the similarity between $\hat{Z}$ from the same environment. As a
result, the redundant information will be reduced in $\hat{Z}$. We empirically
show that the $\hat{Z}$ estimated by our method enjoys less redundant information
than previous methods, and such $\hat{Z}$ can significantly reduce dynamics
prediction errors and improve the performance of model-based RL methods in
zero-shot evaluation on new environments with unseen dynamics. The code for
this method is available at \url{https://github.com/CR-Gjx/RIA}.
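The relational head described in the abstract can be sketched as a pairwise same-environment classifier over estimated context embeddings. The bilinear scoring form, the function names, and the binary cross-entropy pairing below are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def relational_similarity(z_i, z_j, W):
    """Score p(same environment) for two estimated context embeddings
    via a bilinear relational head followed by a sigmoid (hypothetical form)."""
    s = z_i @ W @ z_j
    return 1.0 / (1.0 + np.exp(-s))

def relational_loss(Z, env_ids, W):
    """Binary cross-entropy over all embedding pairs: pairs drawn from the
    same environment are positives, pairs from different environments are
    negatives. Minimizing this pulls same-environment Z together, which is
    the invariance property the relational head is meant to enforce."""
    n = len(env_ids)
    loss, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            p = relational_similarity(Z[i], Z[j], W)
            y = 1.0 if env_ids[i] == env_ids[j] else 0.0
            loss -= y * np.log(p + 1e-8) + (1.0 - y) * np.log(1.0 - p + 1e-8)
            pairs += 1
    return loss / pairs
```

In practice the embeddings and the head's parameters would be trained jointly with the dynamics model; this sketch only shows the shape of the pairwise objective.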
Related papers
- Near-Optimal Dynamic Regret for Adversarial Linear Mixture MDPs [63.47351876442425]
We study episodic linear mixture MDPs with the unknown transition and adversarial rewards under full-information feedback.
We propose a novel algorithm that combines the benefits of two popular methods: occupancy-measure-based and policy-based.
Our algorithm enjoys an $\widetilde{\mathcal{O}}(d \sqrt{H^3 K} + \sqrt{HK(H + \bar{P}_K)})$ dynamic regret, where $d$ is the feature dimension.
arXiv Detail & Related papers (2024-11-05T13:55:52Z) - Invariant Risk Minimization Is A Total Variation Model [3.000494957386027]
Invariant risk minimization (IRM) is an arising approach to generalize invariant features to different environments in machine learning.
We show that IRM is essentially a total variation based on the $L^2$ norm (TV-$\ell_2$) of the learning risk.
We propose a novel IRM framework based on the TV-$\ell_1$ model.
arXiv Detail & Related papers (2024-05-02T15:34:14Z) - COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically
for Model-Based RL [50.385005413810084]
Dyna-style model-based reinforcement learning contains two phases: model rollouts to generate samples for policy learning and real-environment exploration.
$\texttt{COPlanner}$ is a planning-driven framework for model-based methods that addresses the problem of an inaccurately learned dynamics model.
arXiv Detail & Related papers (2023-10-11T06:10:07Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative
Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - Out-of-Variable Generalization for Discriminative Models [13.075802230332298]
In machine learning, the ability of an agent to do well in new environments is a critical aspect of intelligence.
We investigate \textit{out-of-variable} generalization, which pertains to environments with variables that were never jointly observed before.
We propose a method that exhibits non-trivial out-of-variable generalization performance when facing an overlapping, yet distinct, set of causal predictors.
arXiv Detail & Related papers (2023-04-16T21:29:54Z) - Learning Optimal Features via Partial Invariance [18.552839725370383]
Invariant Risk Minimization (IRM) is a popular framework that aims to learn robust models from multiple environments.
We show that IRM can over-constrain the predictor and, to remedy this, we propose a relaxation via \textit{partial invariance}.
Several experiments, conducted both in linear settings as well as with deep neural networks on tasks over both language and image data, allow us to verify our conclusions.
arXiv Detail & Related papers (2023-01-28T02:48:14Z) - Provable Domain Generalization via Invariant-Feature Subspace Recovery [18.25619572103648]
In this paper, we propose to achieve domain generalization with Invariant-feature Subspace Recovery (ISR).
Unlike IRM training, our algorithms bypass non-convexity issues and enjoy global convergence.
In addition, on three real-world image datasets, we show that ISR can be used as a simple yet effective post-processing method.
arXiv Detail & Related papers (2022-01-30T21:22:47Z) - Iterative Feature Matching: Toward Provable Domain Generalization with
Logarithmic Environments [55.24895403089543]
Domain generalization aims at performing well on unseen test environments with data from a limited number of training environments.
We present a new algorithm based on performing iterative feature matching that is guaranteed with high probability to yield a predictor that generalizes after seeing only $O(\log d_s)$ environments.
arXiv Detail & Related papers (2021-06-18T04:39:19Z) - Generative Temporal Difference Learning for Infinite-Horizon Prediction [101.59882753763888]
We introduce the $\gamma$-model, a predictive model of environment dynamics with an infinite probabilistic horizon.
We discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors.
arXiv Detail & Related papers (2020-10-27T17:54:12Z)
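The $\gamma$-model in the last related paper predicts over a geometrically discounted horizon rather than a single step. As a rough tabular illustration (with a hypothetical transition matrix, not the paper's generative-model implementation), the discounted state-occupancy distribution it targets can be sampled like this:

```python
import numpy as np

def gamma_model_sample(P, s0, gamma, rng):
    """Sample a state from the gamma-discounted occupancy distribution
    of a tabular Markov chain with row-stochastic transition matrix P.
    Each rollout step terminates with probability 1 - gamma, so the
    returned state lies t steps ahead with probability
    (1 - gamma) * gamma**(t - 1); gamma = 0 recovers a one-step model."""
    s = s0
    while True:
        s = rng.choice(len(P), p=P[s])
        if rng.random() >= gamma:
            return s
```

The $\gamma$-model learns to produce such samples directly, without rolling the single-step model forward, which is where the training-time vs. testing-time compounding-error tradeoff mentioned above arises.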
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.