Reinforcement Learning under Partial Observability Guided by Learned
Environment Models
- URL: http://arxiv.org/abs/2206.11708v1
- Date: Thu, 23 Jun 2022 13:55:13 GMT
- Title: Reinforcement Learning under Partial Observability Guided by Learned
Environment Models
- Authors: Edi Muskardin, Martin Tappler, Bernhard K. Aichernig, Ingo Pill
- Abstract summary: We propose an approach for reinforcement learning (RL) in partially observable environments.
Our approach combines Q-learning with IoAlergia, a method for learning Markov decision processes.
We report on the validity of our approach and its promising performance in comparison to six state-of-the-art deep RL techniques.
- Score: 1.1470070927586016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In practical applications, we can rarely assume full observability of a
system's environment, despite such knowledge being important for determining a
reactive control system's precise interaction with its environment. Therefore,
we propose an approach for reinforcement learning (RL) in partially observable
environments. We assume that the environment behaves like a partially
observable Markov decision process (POMDP) with known discrete actions, but we
assume no knowledge of its structure or transition probabilities.
Our approach combines Q-learning with IoAlergia, a method for learning Markov
decision processes (MDPs). By learning MDP models of the environment from
episodes of the RL agent, we enable RL in partially observable domains without
explicit, additional memory to track previous interactions for dealing with
ambiguities stemming from partial observability. We instead provide RL with
additional observations in the form of abstract environment states by
simulating new experiences on learned environment models to track the explored
states. In our evaluation, we report on the validity of our approach and its
promising performance in comparison to six state-of-the-art deep RL techniques
with recurrent neural networks and fixed memory.
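
To make the described approach concrete, the sketch below shows a tabular Q-learner whose state is the pair (observation, abstract model state), where the abstract state is tracked by replaying each step on an MDP model that is periodically relearned from the collected episodes. This is a minimal illustration under assumptions: the Gym-style environment interface, the `learn_mdp` callable (a stand-in for IoAlergia), and all other names are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch only: Q-learning over (observation, abstract model state).
# `learn_mdp` is a caller-supplied stand-in for IoAlergia; the Gym-style
# environment interface and all names are assumptions, not the authors' code.
import random
from collections import defaultdict

def q_learning_with_learned_mdp(env, learn_mdp, n_episodes=500,
                                relearn_every=50, alpha=0.1, gamma=0.95,
                                epsilon=0.1):
    n_actions = env.action_space.n
    q = defaultdict(float)   # Q[((obs, model_state), action)]
    episodes = []            # traces used to (re)learn the abstract MDP
    mdp = None               # learned environment model, None until first fit

    for ep in range(n_episodes):
        obs = env.reset()
        model_state = mdp.initial_state() if mdp else None
        trace, done = [], False
        while not done:
            state = (obs, model_state)
            # epsilon-greedy over the augmented (observation, model state) pair
            if random.random() < epsilon:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[(state, a)])
            next_obs, reward, done, _ = env.step(action)
            trace.append((action, next_obs))
            # track the explored abstract state by replaying the step on the model
            next_model_state = (mdp.step(model_state, action, next_obs)
                                if mdp else None)
            next_state = (next_obs, next_model_state)
            best_next = 0.0 if done else max(q[(next_state, a)]
                                             for a in range(n_actions))
            q[(state, action)] += alpha * (reward + gamma * best_next
                                           - q[(state, action)])
            obs, model_state = next_obs, next_model_state
        episodes.append(trace)
        # periodically refit the abstract MDP from all collected episodes
        if (ep + 1) % relearn_every == 0:
            mdp = learn_mdp(episodes)
    return q
```

Observations are assumed discrete and hashable; in the paper, the abstract states come from an IoAlergia-learned MDP over abstracted observations, which is reduced here to the `learn_mdp`/`mdp.step` placeholders.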
Related papers
- OCMDP: Observation-Constrained Markov Decision Process [9.13947446878397]
We tackle the challenge of simultaneously learning observation and control strategies in cost-sensitive environments.
We develop an iterative, model-free deep reinforcement learning algorithm that separates the sensing and control components of the policy.
We validate our approach on a simulated diagnostic task and a realistic healthcare environment using HeartPole.
arXiv Detail & Related papers (2024-11-11T16:04:49Z) - ODE-based Recurrent Model-free Reinforcement Learning for POMDPs [15.030970899252601]
We present a novel ODE-based recurrent model combined with a model-free reinforcement learning framework to solve POMDPs.
We experimentally demonstrate the efficacy of our methods across various PO continuous control and meta-RL tasks.
Our experiments illustrate that our method is robust against irregular observations, owing to the ability of ODEs to model irregularly-sampled time series.
arXiv Detail & Related papers (2023-09-25T12:13:56Z) - Conditional Kernel Imitation Learning for Continuous State Environments [9.750698192309978]
We introduce a novel conditional kernel density estimation-based imitation learning framework.
We show consistently superior empirical performance over many state-of-the-art IL algorithms.
arXiv Detail & Related papers (2023-08-24T05:26:42Z) - Learning Environment Models with Continuous Stochastic Dynamics [0.0]
We aim to provide insights into the decisions faced by the agent by learning an automaton model of environmental behavior under the control of the agent.
In this work, we raise the capabilities of automata learning such that it is possible to learn models for environments that have complex and continuous dynamics.
We apply our automata learning framework on popular RL benchmarking environments in the OpenAI Gym, including LunarLander, CartPole, Mountain Car, and Acrobot.
arXiv Detail & Related papers (2023-06-29T12:47:28Z) - Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z) - Active Inference and Reinforcement Learning: A unified inference on continuous state and action spaces under partial observability [19.56438470022024]
Many real-world problems involve partial observations, formulated as partially observable Markov decision processes (POMDPs).
Previous studies have tackled RL in POMDPs by either incorporating the memory of past actions and observations or by inferring the true state of the environment.
We propose a unified principle that establishes a theoretical connection between active inference (AIF) and reinforcement learning (RL).
Experimental results demonstrate the superior learning capabilities of our method in solving continuous space partially observable tasks.
arXiv Detail & Related papers (2022-12-15T16:28:06Z) - Provably Efficient Reinforcement Learning in Partially Observable
Dynamical Systems [97.12538243736705]
We study Reinforcement Learning for partially observable dynamical systems using function approximation.
We propose a new Partially Observable Bilinear Actor-Critic framework that is general enough to include models such as POMDPs, observable Linear-Quadratic-Gaussian (LQG) systems, Predictive State Representations (PSRs), as well as a newly introduced model, Hilbert Space Embeddings of POMDPs, and observable POMDPs with latent low-rank transitions.
arXiv Detail & Related papers (2022-06-24T00:27:42Z) - Efficient Model-based Multi-agent Reinforcement Learning via Optimistic
Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z) - Provable RL with Exogenous Distractors via Multistep Inverse Dynamics [85.52408288789164]
Real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera.
Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations.
However, such approaches can fail in the presence of temporally correlated noise in the observations.
arXiv Detail & Related papers (2021-10-17T15:21:27Z) - Exploratory State Representation Learning [63.942632088208505]
We propose a new approach called XSRL (eXploratory State Representation Learning) to solve the problems of exploration and SRL in parallel.
On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations.
On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the objective of a discovery policy.
arXiv Detail & Related papers (2021-09-28T10:11:07Z) - Invariant Causal Prediction for Block MDPs [106.63346115341862]
Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges.
We propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting.
arXiv Detail & Related papers (2020-03-12T21:03:01Z)