Related papers: Using Forwards-Backwards Models to Approximate MDP Homomorphisms

URL: http://arxiv.org/abs/2209.06356v3
Date: Sat, 2 Mar 2024 17:02:40 GMT
Title: Using Forwards-Backwards Models to Approximate MDP Homomorphisms
Authors: Augustine N. Mavor-Parker, Matthew J. Sargent, Christian Pehle, Andrea Banino, Lewis D. Griffin, Caswell Barry
Abstract summary: We propose a novel approach to constructing homomorphisms in discrete action spaces. We use a learnt model of environment dynamics to infer which state-action pairs lead to the same state. In MinAtar, we report an almost 4x improvement over a value-based off-policy baseline in the low sample limit.
Score: 11.020094184644789
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reinforcement learning agents must painstakingly learn through trial and error what sets of state-action pairs are value equivalent -- requiring an often prohibitively large amount of environment experience. MDP homomorphisms have been proposed that reduce the MDP of an environment to an abstract MDP, enabling better sample efficiency. Consequently, impressive improvements have been achieved when a suitable homomorphism can be constructed a priori -- usually by exploiting a practitioner's knowledge of environment symmetries. We propose a novel approach to constructing homomorphisms in discrete action spaces, which uses a learnt model of environment dynamics to infer which state-action pairs lead to the same state -- which can reduce the size of the state-action space by a factor as large as the cardinality of the original action space. In MinAtar, we report an almost 4x improvement over a value-based off-policy baseline in the low sample limit, when averaging over all games and optimizers.
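The idea of grouping state-action pairs that lead to the same state can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: `forward_model`, the five-state chain, and the three actions are all hypothetical, and a known deterministic transition rule stands in for the learnt dynamics model. It also assumes that the reward depends only on the next state, so that pairs reaching the same state can share a value.

```python
from collections import defaultdict

# Hypothetical toy environment: a chain of 5 positions with 3 actions.
N_STATES = 5
ACTIONS = {"left": -1, "stay": 0, "right": +1}


def forward_model(state, action):
    """Stand-in for a learnt dynamics model (assumption: known and deterministic).

    Moves along the chain and clips at both ends.
    """
    return min(max(state + ACTIONS[action], 0), N_STATES - 1)


def equivalence_classes():
    """Group every (state, action) pair by the next state the model predicts."""
    classes = defaultdict(list)
    for state in range(N_STATES):
        for action in ACTIONS:
            classes[forward_model(state, action)].append((state, action))
    return dict(classes)


classes = equivalence_classes()
n_pairs = N_STATES * len(ACTIONS)   # size of the original state-action space
reduction = n_pairs / len(classes)  # pairs per abstract class, on average
```

In this toy chain the 15 state-action pairs collapse into 5 classes, one per reachable next state, so the reduction factor equals the number of actions, which matches the upper bound stated in the abstract. In a real environment the dynamics model is learnt and imperfect, so the grouping would only be approximate.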

Related papers

Monotone Optimisation with Learned Projections [0.0]
Monotone optimisation problems admit specialised global solvers such as the Polyblock Outer Approximation (POA) algorithm. We introduce an algorithm-aware learning approach that integrates learned models into POA by directly predicting its projection primitive via the radial inverse.
arXiv Detail & Related papers (2026-01-28T19:32:04Z)
Heterogeneous User Modeling for LLM-based Recommendation [70.52873882470328]
A key challenge to advancing open-domain recommendation is effectively modeling user preferences from users' heterogeneous behaviors. Existing approaches, including ID-based and semantic-based modeling, struggle with poor generalization. We propose a Heterogeneous User Modeling (HUM) method, which incorporates a compression enhancer and a robustness enhancer for LLM-based recommendation.
arXiv Detail & Related papers (2025-07-07T03:08:28Z)
Bi-directional Recurrence Improves Transformer in Partially Observable Markov Decision Processes [5.220940151628735]
This work introduces a novel bi-recurrent model architecture that improves sample efficiency and reduces model parameter count in POMDP scenarios. The proposed model architecture outperforms existing transformer-based, attention-based, and recurrence-based methods by a margin ranging from 87.39% to 482.04% on average.
arXiv Detail & Related papers (2025-05-16T11:54:48Z)
Towards Causal Model-Based Policy Optimization [0.24578723416255752]
We introduce Causal Model-Based Policy Optimization (C-MBPO), a novel framework that integrates causal learning into the Model-Based Reinforcement Learning pipeline. We show that C-MBPO is robust to a class of distributional shifts that affect spurious, non-causal relationships in the dynamics.
arXiv Detail & Related papers (2025-03-12T18:09:02Z)
A Likelihood Based Approach to Distribution Regression Using Conditional Deep Generative Models [6.647819824559201]
We study the large-sample properties of a likelihood-based approach for estimating conditional deep generative models. Our results lead to the convergence rate of a sieve maximum likelihood estimator for estimating the conditional distribution.
arXiv Detail & Related papers (2024-10-02T20:46:21Z)
Sample Complexity Characterization for Linear Contextual MDPs [67.79455646673762]
Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable. CMDPs serve as an important framework to model many real-world applications with time-varying environments. We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
arXiv Detail & Related papers (2024-02-05T03:25:04Z)
Boosting Adversarial Transferability by Achieving Flat Local Maxima [23.91315978193527]
Recently, various adversarial attacks have emerged to boost adversarial transferability from different perspectives. In this work, we assume and empirically validate that adversarial examples at a flat local region tend to have good transferability. We propose an approximation optimization method to simplify the gradient update of the objective function.
arXiv Detail & Related papers (2023-06-08T14:21:02Z)
Domain-Specific Risk Minimization for Out-of-Distribution Generalization [104.17683265084757]
We first establish a generalization bound that explicitly considers the adaptivity gap. We propose effective gap estimation methods for guiding the selection of a better hypothesis for the target. A second method minimizes the gap directly by adapting model parameters using online target samples.
arXiv Detail & Related papers (2022-08-18T06:42:49Z)
Provable RL with Exogenous Distractors via Multistep Inverse Dynamics [85.52408288789164]
Real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera. Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations. However, such approaches can fail in the presence of temporally correlated noise in the observations.
arXiv Detail & Related papers (2021-10-17T15:21:27Z)
Identifiable Energy-based Representations: An Application to Estimating Heterogeneous Causal Effects [83.66276516095665]
Conditional average treatment effects (CATEs) allow us to understand the effect heterogeneity across a large population of individuals. Typical CATE learners assume all confounding variables are measured in order for the CATE to be identifiable. We propose an energy-based model (EBM) that learns a low-dimensional representation of the variables by employing a noise contrastive loss function.
arXiv Detail & Related papers (2021-08-06T10:39:49Z)
Posterior-Aided Regularization for Likelihood-Free Inference [23.708122045184698]
Posterior-Aided Regularization (PAR) is applicable to learning the density estimator, regardless of the model structure. We provide a unified estimation method of PAR to estimate both the reverse KL term and the mutual information term with a single neural network.
arXiv Detail & Related papers (2021-02-15T16:59:30Z)
Invariant Causal Prediction for Block MDPs [106.63346115341862]
Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges. We propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting.
arXiv Detail & Related papers (2020-03-12T21:03:01Z)
Plannable Approximations to MDP Homomorphisms: Equivariance under Actions [72.30921397899684]
We introduce a contrastive loss function that enforces action equivariance on the learned representations. We prove that when our loss is zero, we have a homomorphism of a deterministic Markov Decision Process. We show experimentally that for deterministic MDPs, the optimal policy in the abstract MDP can be successfully lifted to the original MDP.
arXiv Detail & Related papers (2020-02-27T08:29:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.