Approximate information state for approximate planning and reinforcement
learning in partially observed systems
- URL: http://arxiv.org/abs/2010.08843v2
- Date: Fri, 3 Sep 2021 18:54:23 GMT
- Title: Approximate information state for approximate planning and reinforcement
learning in partially observed systems
- Authors: Jayakumar Subramanian, Amit Sinha, Raihan Seraj and Aditya Mahajan
- Abstract summary: We show that if a function of the history (called approximate information state (AIS)) approximately satisfies the properties of the information state, then there is a corresponding approximate dynamic program.
We show that several approximations in state, observation and action spaces in literature can be viewed as instances of AIS.
A salient feature of AIS is that it can be learnt from data.
- Score: 0.7646713951724009
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a theoretical framework for approximate planning and learning in
partially observed systems. Our framework is based on the fundamental notion of
information state. We provide two equivalent definitions of information state
-- i) a function of history which is sufficient to compute the expected reward
and predict its next value; ii) equivalently, a function of the history which
can be recursively updated and is sufficient to compute the expected reward and
predict the next observation. An information state always leads to a dynamic
programming decomposition. Our key result is to show that if a function of the
history (called approximate information state (AIS)) approximately satisfies
the properties of the information state, then there is a corresponding
approximate dynamic program. We show that the policy computed using this is
approximately optimal with bounded loss of optimality. We show that several
approximations in state, observation and action spaces in literature can be
viewed as instances of AIS. In some of these cases, we obtain tighter bounds. A
salient feature of AIS is that it can be learnt from data. We present AIS-based
multi-time scale policy gradient algorithms and detailed numerical experiments
with low-, moderate- and high-dimensional environments.
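For concreteness, the two properties referred to in the abstract can be written as a minimal sketch; the notation $\hat{\sigma}$, $\hat{z}_t$, $\hat{r}$, $\hat{P}$, $\epsilon$, $\delta$ and the metric $d$ below are illustrative assumptions rather than the paper's exact symbols. A compression $\hat{z}_t = \hat{\sigma}_t(h_t)$ of the history $h_t$ is an approximate information state if, for small $\epsilon, \delta \ge 0$:
i) $|\mathbb{E}[R_t \mid h_t, a_t] - \hat{r}(\hat{z}_t, a_t)| \le \epsilon$, i.e., it approximately determines the expected reward; and
ii) $d\big(\mathbb{P}(\hat{z}_{t+1} \in \cdot \mid h_t, a_t), \hat{P}(\cdot \mid \hat{z}_t, a_t)\big) \le \delta$ for a suitable distance $d$, i.e., it approximately predicts its own next value (equivalently, the next observation), with $\hat{z}_{t+1}$ computable recursively from $(\hat{z}_t, a_t, y_{t+1})$.
The corresponding approximate dynamic program is $\hat{V}(\hat{z}) = \max_a \big[ \hat{r}(\hat{z}, a) + \gamma \int \hat{V}(\hat{z}')\, \hat{P}(d\hat{z}' \mid \hat{z}, a) \big]$, and a policy that is greedy with respect to $\hat{V}$ is approximately optimal, with a loss of optimality bounded in terms of $\epsilon$ and $\delta$.
Since the abstract notes that an AIS can be learnt from data via multi-time scale policy gradient algorithms, the following is a hypothetical Python/PyTorch sketch of such a learner, not the authors' released code; all module names, shapes and learning rates are assumptions. A recurrent encoder updates the AIS recursively, auxiliary heads predict the reward and the next observation (surrogates for the two conditions above), and the actor is updated on a slower time scale.

    import torch
    import torch.nn as nn

    class AISAgent(nn.Module):
        def __init__(self, obs_dim, act_dim, ais_dim):
            super().__init__()
            # Recursive update: z_t = f(z_{t-1}, y_t, a_{t-1})
            self.encoder = nn.GRUCell(obs_dim + act_dim, ais_dim)
            # Reward prediction head, a surrogate for condition i)
            self.reward_head = nn.Linear(ais_dim + act_dim, 1)
            # Next-observation prediction head, a surrogate for condition ii)
            self.obs_head = nn.Linear(ais_dim + act_dim, obs_dim)
            # Actor that acts on the learnt AIS
            self.policy = nn.Sequential(nn.Linear(ais_dim, 64), nn.Tanh(),
                                        nn.Linear(64, act_dim))

        def update_ais(self, z, obs, prev_action):
            return self.encoder(torch.cat([obs, prev_action], dim=-1), z)

    agent = AISAgent(obs_dim=8, act_dim=4, ais_dim=32)
    # Two-time-scale optimization: the AIS components learn fast, the actor slowly.
    ais_params = (list(agent.encoder.parameters())
                  + list(agent.reward_head.parameters())
                  + list(agent.obs_head.parameters()))
    ais_opt = torch.optim.Adam(ais_params, lr=1e-3)
    policy_opt = torch.optim.Adam(agent.policy.parameters(), lr=1e-4)
    # Per batch of trajectories, one would minimize the reward- and next-observation
    # prediction errors with ais_opt, and take a policy-gradient step with policy_opt
    # using the learnt AIS as the policy input.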
Related papers
- State Sequences Prediction via Fourier Transform for Representation
Learning [111.82376793413746]
We propose State Sequences Prediction via Fourier Transform (SPF), a novel method for learning expressive representations efficiently.
We theoretically analyze the existence of structural information in state sequences, which is closely related to policy performance and signal regularity.
Experiments demonstrate that the proposed method outperforms several state-of-the-art algorithms in terms of both sample efficiency and performance.
arXiv Detail & Related papers (2023-10-24T14:47:02Z) - Provably Efficient Representation Learning with Tractable Planning in
Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov Decision Processes (POMDPs)
We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU)
We then show how to adapt this algorithm to also work in the broader class of $\gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z) - Learning in POMDPs is Sample-Efficient with Hindsight Observability [36.66596305441365]
POMDPs capture a broad class of decision making problems, but hardness results suggest that learning is intractable even in simple settings due to the inherent partial observability.
In many realistic problems, more information is either revealed or can be computed at some point during the learning process.
We formulate a setting as a POMDP where the latent states are revealed to the learner in hindsight, but only during training.
arXiv Detail & Related papers (2023-01-31T18:54:36Z) - Approximate Information States for Worst-Case Control and Learning in Uncertain Systems [2.7282382992043885]
We consider a non-stochastic model, where disturbances acting on the system take values in bounded sets with unknown distributions.
We present a general framework for decision-making in such problems by using the notion of the information state and approximate information state.
We illustrate the application of our results in control and reinforcement learning using numerical examples.
arXiv Detail & Related papers (2023-01-12T15:36:36Z) - Graph state-space models [19.88814714919019]
State-space models are used to describe time series and operate by maintaining an updated representation of the system state from which predictions are made.
The manuscript aims, for the first time, at filling this gap with graph state-space models in which the functional graph capturing latent dependencies is learned directly from data and is allowed to change over time.
An encoder-decoder architecture is proposed to learn the state-space model end-to-end on a downstream task.
arXiv Detail & Related papers (2023-01-04T18:15:07Z) - Task-Guided IRL in POMDPs that Scales [22.594913269327353]
In inverse reinforcement learning (IRL), a learning agent infers a reward function encoding the underlying task using demonstrations from experts.
Most IRL techniques require repeatedly solving the computationally hard forward problem -- computing an optimal policy given a reward function -- in POMDPs.
We develop an algorithm that reduces the required information while increasing data efficiency.
arXiv Detail & Related papers (2022-12-30T21:08:57Z) - Nearly Optimal Latent State Decoding in Block MDPs [74.51224067640717]
In episodic Block MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states.
We are first interested in estimating the latent state decoding function based on data generated under a fixed behavior policy.
We then study the problem of learning near-optimal policies in the reward-free framework.
arXiv Detail & Related papers (2022-08-17T18:49:53Z) - Value-Consistent Representation Learning for Data-Efficient
Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning a model-imagined future state with the real state returned by the environment, VCR applies a $Q$-value head on both states and obtains two distributions of action values.
Experiments demonstrate that the method achieves new state-of-the-art performance among search-free RL algorithms.
arXiv Detail & Related papers (2022-06-25T03:02:25Z) - Robust and Adaptive Temporal-Difference Learning Using An Ensemble of
Gaussian Processes [70.80716221080118]
The paper takes a generative perspective on policy evaluation via temporal-difference (TD) learning.
The OS-GPTD approach is developed to estimate the value function for a given policy by observing a sequence of state-reward pairs.
To alleviate the limited expressiveness associated with a single fixed kernel, a weighted ensemble (E) of GP priors is employed to yield an alternative scheme.
arXiv Detail & Related papers (2021-12-01T23:15:09Z) - Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
arXiv Detail & Related papers (2020-02-19T18:10:20Z) - Tractable Reinforcement Learning of Signal Temporal Logic Objectives [0.0]
Signal temporal logic (STL) is an expressive language to specify time-bound real-world robotic tasks and safety specifications.
Learning to satisfy STL specifications often needs a sufficient length of state history to compute reward and the next action.
We propose a compact means to capture state history in a new augmented state-space representation.
arXiv Detail & Related papers (2020-01-26T15:23:54Z)