Related papers: Provable RL with Exogenous Distractors via Multistep Inverse Dynamics

Provable RL with Exogenous Distractors via Multistep Inverse Dynamics

URL: http://arxiv.org/abs/2110.08847v1
Date: Sun, 17 Oct 2021 15:21:27 GMT
Title: Provable RL with Exogenous Distractors via Multistep Inverse Dynamics
Authors: Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy, Alekh Agarwal, John Langford
Abstract summary: Real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera. Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations. However, such approaches can fail in the presence of temporally correlated noise in the observations.
Score: 85.52408288789164
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Many real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera. Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations and subsequently plan efficiently. However, such approaches can fail in the presence of temporally correlated noise in the observations, a phenomenon that is common in practice. We initiate the formal study of latent state discovery in the presence of such exogenous noise sources by proposing a new model, the Exogenous Block MDP (EX-BMDP), for rich observation RL. We start by establishing several negative results, by highlighting failure cases of prior representation learning based approaches. Then, we introduce the Predictive Path Elimination (PPE) algorithm, that learns a generalization of inverse dynamics and is provably sample and computationally efficient in EX-BMDPs when the endogenous state dynamics are near deterministic. The sample complexity of PPE depends polynomially on the size of the latent endogenous state space while not directly depending on the size of the observation space, nor the exogenous state space. We provide experiments on challenging exploration problems which show that our approach works empirically.

Related papers

Offline Action-Free Learning of Ex-BMDPs by Comparing Diverse Datasets [87.62730694973696]
This paper introduces CRAFT, a sample-efficient algorithm leveraging differences in controllable feature dynamics across agents to learn representations. We provide theoretical guarantees for CRAFT's performance and demonstrate its feasibility on a toy example.
arXiv Detail & Related papers (2025-03-26T22:05:57Z)
Overcoming the Curse of Dimensionality in Reinforcement Learning Through Approximate Factorization [15.898378661128334]
Reinforcement Learning (RL) algorithms are known to suffer from the curse of dimensionality. We propose overcoming the curse of dimensionality by approximately factorizing the original Markov decision processes (MDPs) into smaller, independently evolving MDPs. We provide improved sample complexity guarantees for both proposed algorithms.
arXiv Detail & Related papers (2024-11-12T07:08:00Z)
Diffusion Spectral Representation for Reinforcement Learning [17.701625371409644]
We propose to leverage the flexibility of diffusion models for reinforcement learning from a representation learning perspective. By exploiting the connection between diffusion models and energy-based models, we develop Diffusion Spectral Representation (Diff-SR) We show how Diff-SR facilitates efficient policy optimization and practical algorithms while explicitly bypassing the difficulty and inference cost of sampling from the diffusion model.
arXiv Detail & Related papers (2024-06-23T14:24:14Z)
ODE-based Recurrent Model-free Reinforcement Learning for POMDPs [15.030970899252601]
We present a novel ODE-based recurrent model combines with model-free reinforcement learning framework to solve POMDPs. We experimentally demonstrate the efficacy of our methods across various PO continuous control and meta-RL tasks. Our experiments illustrate that our method is robust against irregular observations, owing to the ability of ODEs to model irregularly-sampled time series.
arXiv Detail & Related papers (2023-09-25T12:13:56Z)
PAC Reinforcement Learning for Predictive State Representations [60.00237613646686]
We study online Reinforcement Learning (RL) in partially observable dynamical systems. We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models. We develop a novel model-based algorithm for PSRs that can learn a near optimal policy in sample complexity scalingly.
arXiv Detail & Related papers (2022-07-12T17:57:17Z)
Causality-Based Multivariate Time Series Anomaly Detection [63.799474860969156]
We formulate the anomaly detection problem from a causal perspective and view anomalies as instances that do not follow the regular causal mechanism to generate the multivariate data. We then propose a causality-based anomaly detection approach, which first learns the causal structure from data and then infers whether an instance is an anomaly relative to the local causal mechanism. We evaluate our approach with both simulated and public datasets as well as a case study on real-world AIOps applications.
arXiv Detail & Related papers (2022-06-30T06:00:13Z)
MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models [78.72682320019737]
We develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations. MissDAG maximizes the expected likelihood of the visible part of observations under the expectation-maximization framework. We demonstrate the flexibility of MissDAG for incorporating various causal discovery algorithms and its efficacy through extensive simulations and real data experiments.
arXiv Detail & Related papers (2022-05-27T09:59:46Z)
Causal Discovery from Conditionally Stationary Time Series [18.645887749731923]
State-Dependent Causal Inference (SDCI) is able to recover the underlying causal dependencies, provably with fully-observed states and empirically with hidden states. improved results over non-causal RNNs on modeling NBA player movements demonstrate the potential of our method.
arXiv Detail & Related papers (2021-10-12T18:12:57Z)
Exploratory State Representation Learning [63.942632088208505]
We propose a new approach called XSRL (eXploratory State Representation Learning) to solve the problems of exploration and SRL in parallel. On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations. On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the objective of a discovery policy.
arXiv Detail & Related papers (2021-09-28T10:11:07Z)
Leveraging Global Parameters for Flow-based Neural Posterior Estimation [90.21090932619695]
Inferring the parameters of a model based on experimental observations is central to the scientific method. A particularly challenging setting is when the model is strongly indeterminate, i.e., when distinct sets of parameters yield identical observations. We present a method for cracking such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters.
arXiv Detail & Related papers (2021-02-12T12:23:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.