Using Forwards-Backwards Models to Approximate MDP Homomorphisms
- URL: http://arxiv.org/abs/2209.06356v3
- Date: Sat, 2 Mar 2024 17:02:40 GMT
- Title: Using Forwards-Backwards Models to Approximate MDP Homomorphisms
- Authors: Augustine N. Mavor-Parker, Matthew J. Sargent, Christian Pehle, Andrea
Banino, Lewis D. Griffin, Caswell Barry
- Abstract summary: We propose a novel approach to constructing homomorphisms in discrete action spaces.
We use a learnt model of environment dynamics to infer which state-action pairs lead to the same state.
In MinAtar, we report an almost 4x improvement over a value-based off-policy baseline in the low sample limit.
- Score: 11.020094184644789
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning agents must painstakingly learn through trial and
error what sets of state-action pairs are value equivalent -- requiring an
often prohibitively large amount of environment experience. MDP homomorphisms
have been proposed that reduce the MDP of an environment to an abstract MDP,
enabling better sample efficiency. Consequently, impressive improvements have
been achieved when a suitable homomorphism can be constructed a priori --
usually by exploiting a practitioner's knowledge of environment symmetries. We
propose a novel approach to constructing homomorphisms in discrete action
spaces, which uses a learnt model of environment dynamics to infer which
state-action pairs lead to the same state -- which can reduce the size of the
state-action space by a factor as large as the cardinality of the original
action space. In MinAtar, we report an almost 4x improvement over a value-based
off-policy baseline in the low sample limit, when averaging over all games and
optimizers.
Related papers
- A Likelihood Based Approach to Distribution Regression Using Conditional Deep Generative Models [6.647819824559201]
We study the large-sample properties of a likelihood-based approach for estimating conditional deep generative models.
Our results lead to the convergence rate of a sieve maximum likelihood estimator for estimating the conditional distribution.
arXiv Detail & Related papers (2024-10-02T20:46:21Z) - Sample Complexity Characterization for Linear Contextual MDPs [67.79455646673762]
Contextual decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable.
CMDPs serve as an important framework to model many real-world applications with time-varying environments.
We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
arXiv Detail & Related papers (2024-02-05T03:25:04Z) - Boosting Adversarial Transferability by Achieving Flat Local Maxima [23.91315978193527]
Recently, various adversarial attacks have emerged to boost adversarial transferability from different perspectives.
In this work, we assume and empirically validate that adversarial examples at a flat local region tend to have good transferability.
We propose an approximation optimization method to simplify the gradient update of the objective function.
arXiv Detail & Related papers (2023-06-08T14:21:02Z) - Domain-Specific Risk Minimization for Out-of-Distribution Generalization [104.17683265084757]
We first establish a generalization bound that explicitly considers the adaptivity gap.
We propose effective gap estimation methods for guiding the selection of a better hypothesis for the target.
The other method is minimizing the gap directly by adapting model parameters using online target samples.
arXiv Detail & Related papers (2022-08-18T06:42:49Z) - Provable RL with Exogenous Distractors via Multistep Inverse Dynamics [85.52408288789164]
Real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera.
Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations.
However, such approaches can fail in the presence of temporally correlated noise in the observations.
arXiv Detail & Related papers (2021-10-17T15:21:27Z) - Identifiable Energy-based Representations: An Application to Estimating
Heterogeneous Causal Effects [83.66276516095665]
Conditional average treatment effects (CATEs) allow us to understand the effect heterogeneity across a large population of individuals.
Typical CATE learners assume all confounding variables are measured in order for the CATE to be identifiable.
We propose an energy-based model (EBM) that learns a low-dimensional representation of the variables by employing a noise contrastive loss function.
arXiv Detail & Related papers (2021-08-06T10:39:49Z) - Posterior-Aided Regularization for Likelihood-Free Inference [23.708122045184698]
Posterior-Aided Regularization (PAR) is applicable to learning the density estimator, regardless of the model structure.
We provide a unified estimation method of PAR to estimate both reverse KL term and mutual information term with a single neural network.
arXiv Detail & Related papers (2021-02-15T16:59:30Z) - Invariant Causal Prediction for Block MDPs [106.63346115341862]
Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges.
We propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting.
arXiv Detail & Related papers (2020-03-12T21:03:01Z) - Plannable Approximations to MDP Homomorphisms: Equivariance under
Actions [72.30921397899684]
We introduce a contrastive loss function that enforces action equivariance on the learned representations.
We prove that when our loss is zero, we have a homomorphism of a deterministic Markov Decision Process.
We show experimentally that for deterministic MDPs, the optimal policy in the abstract MDP can be successfully lifted to the original MDP.
arXiv Detail & Related papers (2020-02-27T08:29:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.