Provable RL with Exogenous Distractors via Multistep Inverse Dynamics
- URL: http://arxiv.org/abs/2110.08847v1
- Date: Sun, 17 Oct 2021 15:21:27 GMT
- Title: Provable RL with Exogenous Distractors via Multistep Inverse Dynamics
- Authors: Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy, Alekh Agarwal, John Langford
- Abstract summary: Real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera.
Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations.
However, such approaches can fail in the presence of temporally correlated noise in the observations.
- Score: 85.52408288789164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many real-world applications of reinforcement learning (RL) require the agent
to deal with high-dimensional observations such as those generated from a
megapixel camera. Prior work has addressed such problems with representation
learning, through which the agent can provably extract endogenous, latent state
information from raw observations and subsequently plan efficiently. However,
such approaches can fail in the presence of temporally correlated noise in the
observations, a phenomenon that is common in practice. We initiate the formal
study of latent state discovery in the presence of such exogenous noise sources
by proposing a new model, the Exogenous Block MDP (EX-BMDP), for rich
observation RL. We start by establishing several negative results that
highlight failure cases of prior representation-learning-based approaches.
Then, we introduce the Predictive Path Elimination (PPE) algorithm, which learns
a generalization of inverse dynamics and is provably sample and computationally
efficient in EX-BMDPs when the endogenous state dynamics are near
deterministic. The sample complexity of PPE depends polynomially on the size of
the latent endogenous state space while not directly depending on the size of
the observation space, nor the exogenous state space. We provide experiments on
challenging exploration problems which show that our approach works
empirically.
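To make the endogenous/exogenous split in the abstract concrete, here is a minimal sketch of a toy model in the spirit of an EX-BMDP. It is not the PPE algorithm and not the authors' construction: the cycle-shaped endogenous state, the slowly flipping exogenous bit, and every name (`step`, `collect`, `inverse_dynamics_accuracy`, `NUM_ENDO_STATES`, `FLIP_PROB`) are assumptions made for illustration. It also fits only a one-step, tabular inverse dynamics predictor, whereas the abstract describes a multistep generalization. The point it illustrates is that the action is recoverable from the endogenous part of consecutive observations but not from the temporally correlated exogenous part.

```python
import random
from collections import Counter, defaultdict

NUM_ENDO_STATES = 5  # hypothetical endogenous state space: positions on a cycle
FLIP_PROB = 0.1      # hypothetical exogenous noise: a bit that rarely flips


def step(endo, exo, action, rng):
    """One transition of the toy model; the action affects only the endogenous part."""
    next_endo = (endo + action) % NUM_ENDO_STATES  # deterministic, action-controlled
    next_exo = exo ^ (rng.random() < FLIP_PROB)    # temporally correlated, action-independent
    return next_endo, next_exo


def collect(num_steps, seed=0):
    """Roll out a uniformly random policy and record (obs, action, next_obs) triples."""
    rng = random.Random(seed)
    endo, exo = 0, 0
    data = []
    for _ in range(num_steps):
        action = rng.randrange(2)  # 0 = stay, 1 = move one position forward
        next_endo, next_exo = step(endo, exo, action, rng)
        data.append(((endo, exo), action, (next_endo, next_exo)))
        endo, exo = next_endo, next_exo
    return data


def inverse_dynamics_accuracy(data, feature):
    """In-sample accuracy of a tabular majority-vote predictor of the action
    from the chosen feature of two consecutive observations."""
    counts = defaultdict(Counter)
    for obs, action, next_obs in data:
        counts[(feature(obs), feature(next_obs))][action] += 1
    correct = sum(c.most_common(1)[0][1] for c in counts.values())
    return correct / len(data)


data = collect(5000)
endo_acc = inverse_dynamics_accuracy(data, lambda obs: obs[0])
exo_acc = inverse_dynamics_accuracy(data, lambda obs: obs[1])
print(f"endogenous features: {endo_acc:.3f}, exogenous features: {exo_acc:.3f}")
```

In this toy setting the endogenous features determine the action exactly, while the exogenous bit predicts it at roughly chance level, which is the intuition behind using inverse dynamics to filter out exogenous distractors.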
Related papers
- Diffusion Spectral Representation for Reinforcement Learning [17.701625371409644]
We propose to leverage the flexibility of diffusion models for reinforcement learning from a representation learning perspective.
By exploiting the connection between diffusion models and energy-based models, we develop Diffusion Spectral Representation (Diff-SR).
We show how Diff-SR facilitates efficient policy optimization and practical algorithms while explicitly bypassing the difficulty and inference cost of sampling from the diffusion model.
arXiv Detail & Related papers (2024-06-23T14:24:14Z)
- ODE-based Recurrent Model-free Reinforcement Learning for POMDPs [15.030970899252601]
We present a novel ODE-based recurrent model combined with a model-free reinforcement learning framework to solve POMDPs.
We experimentally demonstrate the efficacy of our methods across various partially observable (PO) continuous control and meta-RL tasks.
Our experiments illustrate that our method is robust against irregular observations, owing to the ability of ODEs to model irregularly-sampled time series.
arXiv Detail & Related papers (2023-09-25T12:13:56Z)
- PAC Reinforcement Learning for Predictive State Representations [60.00237613646686]
We study online Reinforcement Learning (RL) in partially observable dynamical systems.
We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models.
We develop a novel model-based algorithm for PSRs that can learn a near-optimal policy with sample complexity scaling polynomially in the relevant parameters of the system.
arXiv Detail & Related papers (2022-07-12T17:57:17Z)
- Causality-Based Multivariate Time Series Anomaly Detection [63.799474860969156]
We formulate the anomaly detection problem from a causal perspective and view anomalies as instances that do not follow the regular causal mechanism to generate the multivariate data.
We then propose a causality-based anomaly detection approach, which first learns the causal structure from data and then infers whether an instance is an anomaly relative to the local causal mechanism.
We evaluate our approach with both simulated and public datasets as well as a case study on real-world AIOps applications.
arXiv Detail & Related papers (2022-06-30T06:00:13Z)
- MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models [78.72682320019737]
We develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations.
MissDAG maximizes the expected likelihood of the visible part of observations under the expectation-maximization framework.
We demonstrate the flexibility of MissDAG for incorporating various causal discovery algorithms and its efficacy through extensive simulations and real data experiments.
arXiv Detail & Related papers (2022-05-27T09:59:46Z)
- Causal Discovery from Conditionally Stationary Time Series [18.645887749731923]
State-Dependent Causal Inference (SDCI) is able to recover the underlying causal dependencies, provably with fully-observed states and empirically with hidden states.
Improved results over non-causal RNNs on modeling NBA player movements demonstrate the potential of our method.
arXiv Detail & Related papers (2021-10-12T18:12:57Z)
- Exploratory State Representation Learning [63.942632088208505]
We propose a new approach called XSRL (eXploratory State Representation Learning) to solve the problems of exploration and SRL in parallel.
On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations.
On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the objective of a discovery policy.
arXiv Detail & Related papers (2021-09-28T10:11:07Z)
- Leveraging Global Parameters for Flow-based Neural Posterior Estimation [90.21090932619695]
Inferring the parameters of a model based on experimental observations is central to the scientific method.
A particularly challenging setting is when the model is strongly indeterminate, i.e., when distinct sets of parameters yield identical observations.
We present a method for cracking such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters.
arXiv Detail & Related papers (2021-02-12T12:23:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.