Reinforcement Learning under Partial Observability Guided by Learned
Environment Models
- URL: http://arxiv.org/abs/2206.11708v1
- Date: Thu, 23 Jun 2022 13:55:13 GMT
- Title: Reinforcement Learning under Partial Observability Guided by Learned
Environment Models
- Authors: Edi Muskardin, Martin Tappler, Bernhard K. Aichernig, Ingo Pill
- Abstract summary: We propose an approach for reinforcement learning (RL) in partially observable environments.
Our approach combines Q-learning with IoAlergia, a method for learning Markov decision processes.
We report on the validity of our approach and its promising performance in comparison to six state-of-the-art deep RL techniques.
- Score: 1.1470070927586016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In practical applications, we can rarely assume full observability of a
system's environment, despite such knowledge being important for determining a
reactive control system's precise interaction with its environment. Therefore,
we propose an approach for reinforcement learning (RL) in partially observable
environments. We assume that the environment behaves like a partially
observable Markov decision process (POMDP) with known discrete actions, but we
assume no knowledge of its structure or transition probabilities.
Our approach combines Q-learning with IoAlergia, a method for learning Markov
decision processes (MDPs). By learning MDP models of the environment from
episodes of the RL agent, we enable RL in partially observable domains without
explicit, additional memory to track previous interactions for dealing with
ambiguities stemming from partial observability. We instead provide RL with
additional observations in the form of abstract environment states by
simulating new experiences on learned environment models to track the explored
states. In our evaluation, we report on the validity of our approach and its
promising performance in comparison to six state-of-the-art deep RL techniques
with recurrent neural networks and fixed memory.
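
To make the described approach concrete, the sketch below shows a tabular Q-learner whose state is the pair (observation, abstract model state), where the abstract state is tracked by replaying each step on an MDP model that is periodically relearned from the collected episodes. This is a minimal illustration under assumptions: the Gym-style environment interface, the `learn_mdp` callable (a stand-in for IoAlergia), and all other names are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch only: Q-learning over (observation, abstract model state).
# `learn_mdp` is a caller-supplied stand-in for IoAlergia; the Gym-style
# environment interface and all names are assumptions, not the authors' code.
import random
from collections import defaultdict

def q_learning_with_learned_mdp(env, learn_mdp, n_episodes=500,
                                relearn_every=50, alpha=0.1, gamma=0.95,
                                epsilon=0.1):
    n_actions = env.action_space.n
    q = defaultdict(float)   # Q[((obs, model_state), action)]
    episodes = []            # traces used to (re)learn the abstract MDP
    mdp = None               # learned environment model, None until first fit

    for ep in range(n_episodes):
        obs = env.reset()
        model_state = mdp.initial_state() if mdp else None
        trace, done = [], False
        while not done:
            state = (obs, model_state)
            # epsilon-greedy over the augmented (observation, model state) pair
            if random.random() < epsilon:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[(state, a)])
            next_obs, reward, done, _ = env.step(action)
            trace.append((action, next_obs))
            # track the explored abstract state by replaying the step on the model
            next_model_state = (mdp.step(model_state, action, next_obs)
                                if mdp else None)
            next_state = (next_obs, next_model_state)
            best_next = 0.0 if done else max(q[(next_state, a)]
                                             for a in range(n_actions))
            q[(state, action)] += alpha * (reward + gamma * best_next
                                           - q[(state, action)])
            obs, model_state = next_obs, next_model_state
        episodes.append(trace)
        # periodically refit the abstract MDP from all collected episodes
        if (ep + 1) % relearn_every == 0:
            mdp = learn_mdp(episodes)
    return q
```

Observations are assumed discrete and hashable; in the paper, the abstract states come from an IoAlergia-learned MDP over abstracted observations, which is reduced here to the `learn_mdp`/`mdp.step` placeholders.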
Related papers
- OCMDP: Observation-Constrained Markov Decision Process [9.13947446878397]
We tackle the challenge of simultaneously learning observation and control strategies in cost-sensitive environments.
We develop an iterative, model-free deep reinforcement learning algorithm that separates the sensing and control components of the policy.
We validate our approach on a simulated diagnostic task and a realistic healthcare environment using HeartPole.
arXiv Detail & Related papers (2024-11-11T16:04:49Z) - ODE-based Recurrent Model-free Reinforcement Learning for POMDPs [15.030970899252601]
We present a novel ODE-based recurrent model combined with a model-free reinforcement learning framework to solve POMDPs.
We experimentally demonstrate the efficacy of our methods across various PO continuous control and meta-RL tasks.
Our experiments illustrate that our method is robust against irregular observations, owing to the ability of ODEs to model irregularly-sampled time series.
arXiv Detail & Related papers (2023-09-25T12:13:56Z) - Conditional Kernel Imitation Learning for Continuous State Environments [9.750698192309978]
We introduce a novel conditional kernel density estimation-based imitation learning framework.
We show consistently superior empirical performance over many state-of-the-art IL algorithms.
arXiv Detail & Related papers (2023-08-24T05:26:42Z) - Learning Environment Models with Continuous Stochastic Dynamics [0.0]
We aim to provide insights into the decisions faced by the agent by learning an automaton model of environmental behavior under the control of the agent.
In this work, we raise the capabilities of automata learning such that it is possible to learn models for environments that have complex and continuous dynamics.
We apply our automata learning framework on popular RL benchmarking environments in the OpenAI Gym, including LunarLander, CartPole, Mountain Car, and Acrobot.
arXiv Detail & Related papers (2023-06-29T12:47:28Z) - Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z) - Active Inference and Reinforcement Learning: A unified inference on continuous state and action spaces under partial observability [19.56438470022024]
Many real-world problems involve partial observations, formulated as partially observable Markov decision processes (POMDPs).
Previous studies have tackled RL in POMDPs by either incorporating the memory of past actions and observations or by inferring the true state of the environment.
We propose a unified principle that establishes a theoretical connection between active inference (AIF) and reinforcement learning (RL).
Experimental results demonstrate the superior learning capabilities of our method in solving continuous space partially observable tasks.
arXiv Detail & Related papers (2022-12-15T16:28:06Z) - Provably Efficient Reinforcement Learning in Partially Observable
Dynamical Systems [97.12538243736705]
We study Reinforcement Learning for partially observable dynamical systems using function approximation.
We propose a new Partially Observable Bilinear Actor-Critic framework that is general enough to include models such as POMDPs, observable Linear-Quadratic-Gaussian (LQG) systems, Predictive State Representations (PSRs), as well as a newly introduced model, Hilbert Space Embeddings of POMDPs, and observable POMDPs with latent low-rank transitions.
arXiv Detail & Related papers (2022-06-24T00:27:42Z) - Efficient Model-based Multi-agent Reinforcement Learning via Optimistic
Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z) - Provable RL with Exogenous Distractors via Multistep Inverse Dynamics [85.52408288789164]
Real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera.
Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations.
However, such approaches can fail in the presence of temporally correlated noise in the observations.
arXiv Detail & Related papers (2021-10-17T15:21:27Z) - Exploratory State Representation Learning [63.942632088208505]
We propose a new approach called XSRL (eXploratory State Representation Learning) to solve the problems of exploration and SRL in parallel.
On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations.
On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the objective of a discovery policy.
arXiv Detail & Related papers (2021-09-28T10:11:07Z) - Invariant Causal Prediction for Block MDPs [106.63346115341862]
Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges.
We propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting.
arXiv Detail & Related papers (2020-03-12T21:03:01Z)