Markov Decision Processes with Noisy State Observation
- URL: http://arxiv.org/abs/2312.08536v1
- Date: Wed, 13 Dec 2023 21:50:38 GMT
- Title: Markov Decision Processes with Noisy State Observation
- Authors: Amirhossein Afsharrad, Sanjay Lall
- Abstract summary: This paper addresses the challenge of a particular class of noisy state observations in Markov Decision Processes (MDPs).
We focus on modeling this uncertainty through a confusion matrix that captures the probabilities of misidentifying the true state.
We propose two novel algorithmic approaches to estimate the inherent measurement noise.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper addresses the challenge of a particular class of noisy state
observations in Markov Decision Processes (MDPs), a common issue in various
real-world applications. We focus on modeling this uncertainty through a
confusion matrix that captures the probabilities of misidentifying the true
state. Our primary goal is to estimate the inherent measurement noise, and to
this end, we propose two novel algorithmic approaches. The first, the method of
second-order repetitive actions, is designed for efficient noise estimation
within a finite time window, providing identifiability conditions for system
analysis. The second approach comprises a family of Bayesian algorithms, which
we thoroughly analyze and compare in terms of performance and limitations. We
substantiate our theoretical findings with simulations, demonstrating the
effectiveness of our methods in different scenarios, particularly highlighting
their behavior in environments with varying stationary distributions. Our work
advances the understanding of reinforcement learning in noisy environments,
offering robust techniques for more accurate state estimation in MDPs.
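To make the observation model concrete, here is a minimal Python sketch of a confusion-matrix observation channel together with a simple Dirichlet-posterior estimator of its rows. Everything in it is an illustrative assumption rather than the paper's own code: the matrix values and names are invented, and the estimator is allowed to see the true state (as in a calibration phase), whereas the paper's two methods must estimate the noise without such ground-truth access.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_states = 3

# Hypothetical confusion matrix: C_true[i, j] = P(observe j | true state i).
# Each row sums to 1; the off-diagonal mass is the measurement noise.
C_true = np.array([
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.15, 0.80],
])

def observe(true_state: int) -> int:
    """Draw a noisy observation of the true state through the confusion matrix."""
    return int(rng.choice(n_states, p=C_true[true_state]))

# Illustrative Bayesian estimator: one Dirichlet posterior per row of C_true.
# Starting from a uniform Dirichlet(1, ..., 1) prior, each
# (true state, observation) pair increments the matching pseudo-count.
alpha = np.ones((n_states, n_states))

for _ in range(5000):
    s = int(rng.integers(n_states))   # stand-in for states visited under some policy
    o = observe(s)                    # noisy measurement of s
    alpha[s, o] += 1.0                # conjugate Dirichlet update

C_est = alpha / alpha.sum(axis=1, keepdims=True)  # posterior-mean estimate of C_true
print(np.round(C_est, 2))
```

Because the Dirichlet distribution is conjugate to the categorical observation model, each row's posterior mean converges to the corresponding row of C_true as samples accumulate; removing access to the true state s is precisely what makes the paper's estimation problem nontrivial.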
Related papers
- Rethinking State Disentanglement in Causal Reinforcement Learning [78.12976579620165]
Causality provides rigorous theoretical support for ensuring that the underlying states can be uniquely recovered through identifiability.
We revisit this line of research and find that incorporating RL-specific context can remove unnecessary assumptions made in previous identifiability analyses of latent states.
We propose a novel approach for general partially observable Markov Decision Processes (POMDPs) by replacing the complicated structural constraints in previous methods with two simple constraints for transition and reward preservation.
arXiv Detail & Related papers (2024-08-24T06:49:13Z) - Unraveling Rodeo Algorithm Through the Zeeman Model [0.0]
We unravel the Rodeo Algorithm to determine the eigenstates and eigenvalue spectrum of a general Hamiltonian for arbitrary initial states.
We exploit the resources of the PennyLane and Qiskit platforms to analyze scenarios where the Hamiltonians are described by the Zeeman model for one and two spins.
arXiv Detail & Related papers (2024-07-16T01:29:25Z) - Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives [16.101435842520473]
This paper studies the challenging yet important problem in POMDPs known as the (indefinite-horizon) Maximal Reachability Probability Problem (MRPP).
Inspired by the success of point-based methods developed for discounted problems, we study their extensions to MRPP.
We present a novel algorithm that leverages the strengths of these techniques for efficient exploration of the belief space.
arXiv Detail & Related papers (2024-06-05T02:33:50Z) - Anomaly Detection via Learning-Based Sequential Controlled Sensing [25.282033825977827]
We address the problem of detecting anomalies among a set of binary processes via learning-based controlled sensing.
To identify the anomalies, the decision-making agent is allowed to observe a subset of the processes at each time instant.
Our objective is to design a sequential selection policy that dynamically determines which processes to observe at each time.
arXiv Detail & Related papers (2023-11-30T07:49:33Z) - Provably Efficient UCB-type Algorithms For Learning Predictive State
Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, a guaranteed near-optimal policy in the last iterate, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z) - Online Multi-Agent Decentralized Byzantine-robust Gradient Estimation [62.997667081978825]
Our algorithm is based on simultaneous perturbation, secure state estimation and two-timescale approximations.
We also demonstrate the performance of our algorithm through numerical experiments.
arXiv Detail & Related papers (2022-09-30T07:29:49Z) - Sampling-Based Robust Control of Autonomous Systems with Non-Gaussian
Noise [59.47042225257565]
We present a novel planning method that does not rely on any explicit representation of the noise distributions.
First, we abstract the continuous system into a discrete-state model that captures noise by probabilistic transitions between states.
We then bound these transition probabilities and capture the bounds in the transition probability intervals of a so-called interval Markov decision process (iMDP).
arXiv Detail & Related papers (2021-10-25T06:18:55Z) - Proximal Causal Learning with Kernels: Two-Stage Estimation and Moment
Restriction [39.51144507601913]
We focus on the proximal causal learning setting, but our methods can be used to solve a wider class of inverse problems characterised by a Fredholm integral equation.
We provide consistency guarantees for each algorithm, and we demonstrate these approaches achieve competitive results on synthetic data and data simulating a real-world task.
arXiv Detail & Related papers (2021-05-10T17:52:48Z) - Identification of Unexpected Decisions in Partially Observable
Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
We propose a methodology for analyzing POMCP policies by inspecting their traces.
The proposed method explores local properties of policy behavior to identify unexpected decisions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
arXiv Detail & Related papers (2020-12-23T15:09:28Z) - Spatio-temporal Sequence Prediction with Point Processes and
Self-organizing Decision Trees [0.0]
We study the spatio-temporal prediction problem and introduce a point-process-based prediction algorithm.
Our algorithm can jointly learn the spatial event regions and the interactions between these regions through a gradient-based optimization procedure.
We compare our approach with state-of-the-art deep learning-based approaches, where we achieve significant performance improvements.
arXiv Detail & Related papers (2020-06-25T14:04:55Z) - Active Model Estimation in Markov Decision Processes [108.46146218973189]
We study the problem of efficient exploration to learn an accurate model of an environment, modeled as a Markov decision process (MDP).
We show that our Markov-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small sample regime.
arXiv Detail & Related papers (2020-03-06T16:17:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.