Intermittently Observable Markov Decision Processes
- URL: http://arxiv.org/abs/2302.11761v1
- Date: Thu, 23 Feb 2023 03:38:03 GMT
- Title: Intermittently Observable Markov Decision Processes
- Authors: Gongpu Chen and Soung-Chang Liew
- Abstract summary: We consider a scenario where the controller perceives the state information of the process via an unreliable communication channel.
The transmissions of state information over the whole time horizon are modeled as a Bernoulli lossy process.
We develop two finite-state approximations to the tree MDP to find near-optimal policies efficiently.
- Score: 26.118176084782842
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates MDPs with intermittent state information. We consider
a scenario where the controller perceives the state information of the process
via an unreliable communication channel. The transmissions of state information
over the whole time horizon are modeled as a Bernoulli lossy process. Hence,
the problem is finding an optimal policy for selecting actions in the presence
of state information losses. We first formulate the problem as a belief MDP to
establish structural results. The effect of state information losses on the
expected total discounted reward is studied systematically. Then, we
reformulate the problem as a tree MDP whose state space is organized in a tree
structure. Two finite-state approximations to the tree MDP are developed to
find near-optimal policies efficiently. Finally, we put forth a nested value
iteration algorithm for the finite-state approximations, which is proved to be
faster than standard value iteration. Numerical results demonstrate the
effectiveness of our methods.
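To make the setting concrete, below is a minimal Python sketch of the belief update that underlies the belief-MDP formulation: after each action, the new state either reaches the controller over the channel (probability 1 - p_loss) or the report is lost (probability p_loss), in which case the belief can only be propagated through the known transition matrix. The names (P, p_loss, belief_step, simulate_report) and the random MDP are illustrative assumptions made for this example, not the authors' code, and the sketch omits the paper's tree MDP, finite-state approximations, and nested value iteration.

```python
import numpy as np

# Illustrative sketch (not the authors' implementation): a finite MDP with
# known transition tensor P[a, s, s'] whose state reports reach the controller
# through a Bernoulli lossy channel with loss probability p_loss.
rng = np.random.default_rng(0)
n_states, n_actions, p_loss = 4, 2, 0.3

P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)  # make each P[a, s, :] a probability row


def simulate_report(next_state):
    """Bernoulli lossy channel: deliver the state with prob. 1 - p_loss, else lose it."""
    return next_state if rng.random() > p_loss else None


def belief_step(b, a, report):
    """Update the controller's belief b after taking action a.

    If the report arrives, the belief collapses to a point mass at the reported
    state; if it is lost, the controller can only predict b @ P[a].
    """
    if report is not None:
        b_next = np.zeros(n_states)
        b_next[report] = 1.0
        return b_next
    return b @ P[a]


# Example rollout: the true state evolves while the controller tracks a belief.
s = 2                        # true state
b = np.eye(n_states)[s]      # controller initially knows the state
for a in (1, 0, 1):
    s = rng.choice(n_states, p=P[a, s])   # environment transition
    report = simulate_report(s)           # report may be lost over the channel
    b = belief_step(b, a, report)
    print(f"action {a}, report {report}, belief {np.round(b, 3)}")
```

Note that whenever a report arrives the belief resets to a point mass, so the controller's uncertainty is determined by the last reported state and the actions taken since; this is the intuition behind organizing the state space as a tree in the paper's tree-MDP reformulation.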
Related papers
- Age-Based Scheduling for Mobile Edge Computing: A Deep Reinforcement
Learning Approach [58.911515417156174]
We propose a new definition of Age of Information (AoI) and, based on the redefined AoI, we formulate an online AoI problem for MEC systems.
We introduce Post-Decision States (PDSs) to exploit the partial knowledge of the system's dynamics.
We also combine PDSs with deep RL to further improve the algorithm's applicability, scalability, and robustness.
arXiv Detail & Related papers (2023-12-01T01:30:49Z) - State Sequences Prediction via Fourier Transform for Representation
Learning [111.82376793413746]
We propose State Sequences Prediction via Fourier Transform (SPF), a novel method for learning expressive representations efficiently.
We theoretically analyze the existence of structural information in state sequences, which is closely related to policy performance and signal regularity.
Experiments demonstrate that the proposed method outperforms several state-of-the-art algorithms in terms of both sample efficiency and performance.
arXiv Detail & Related papers (2023-10-24T14:47:02Z) - Online POMDP Planning with Anytime Deterministic Guarantees [11.157761902108692]
Planning under uncertainty can be mathematically formalized using partially observable Markov decision processes (POMDPs).
Finding an optimal plan for POMDPs can be computationally expensive and is feasible only for small tasks.
We derive a deterministic relationship between a simplified solution that is easier to obtain and the theoretically optimal one.
arXiv Detail & Related papers (2023-10-03T04:40:38Z) - Faster Approximate Dynamic Programming by Freezing Slow States [5.6928413790238865]
We consider infinite horizon Markov decision processes (MDPs) with fast-slow structure.
Such structure is common in real-world problems where sequential decisions need to be made at high frequencies.
We propose an approximate dynamic programming framework based on the idea of "freezing" the slow states.
arXiv Detail & Related papers (2023-01-03T01:35:24Z) - Nearly Optimal Latent State Decoding in Block MDPs [74.51224067640717]
In episodic Block MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states.
We are first interested in estimating the latent state decoding function based on data generated under a fixed behavior policy.
We then study the problem of learning near-optimal policies in the reward-free framework.
arXiv Detail & Related papers (2022-08-17T18:49:53Z) - Value-Consistent Representation Learning for Data-Efficient
Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning the imagined state with the real state returned by the environment, VCR applies a $Q$-value head to both states and obtains two distributions of action values.
Experiments demonstrate that VCR achieves new state-of-the-art performance among search-free RL algorithms.
arXiv Detail & Related papers (2022-06-25T03:02:25Z) - Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z) - SIDE: I Infer the State I Want to Learn [17.993973801986677]
We propose a novel value decomposition framework, named State Inference for value DEcomposition (SIDE), which eliminates the need to know the true state.
SIDE can be extended to any value decomposition method, as well as other types of multi-agent algorithms in the case of Dec-POMDP.
arXiv Detail & Related papers (2021-05-13T12:26:02Z) - Near Optimality of Finite Memory Feedback Policies in Partially Observed
Markov Decision Processes [0.0]
We study a planning problem for POMDPs where the system dynamics and measurement channel model are assumed to be known.
We find optimal policies for the approximate belief model under mild non-linear filter stability conditions.
We also establish a rate of convergence result which relates the finite window memory size and the approximation error bound.
arXiv Detail & Related papers (2020-10-15T00:37:51Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Point-Based Methods for Model Checking in Partially Observable Markov
Decision Processes [36.07746952116073]
We propose a methodology to synthesize policies that satisfy a linear temporal logic formula in a partially observable Markov decision process (POMDP).
We show how to use point-based value iteration methods to efficiently approximate the maximum probability of satisfying a desired logical formula.
We demonstrate that our method scales to large POMDP domains and provides strong bounds on the performance of the resulting policy.
arXiv Detail & Related papers (2020-01-11T23:09:25Z)