Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction
- URL: http://arxiv.org/abs/2210.16872v1
- Date: Sun, 30 Oct 2022 16:30:23 GMT
- Title: Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction
- Authors: Dilip Arumugam, Satinder Singh
- Abstract summary: The Bayes-Adaptive Markov Decision Process (BAMDP) formalism pursues the Bayes-optimal solution to the exploration-exploitation trade-off in reinforcement learning.
Much of the literature has focused on developing suitable approximation algorithms.
We first define, under mild structural assumptions, a complexity measure for BAMDP planning.
We then conclude by introducing a specific form of state abstraction that has the potential to reduce BAMDP complexity and to give rise to a computationally tractable, approximate planning algorithm.
- Score: 27.33232096515561
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Bayes-Adaptive Markov Decision Process (BAMDP) formalism pursues the
Bayes-optimal solution to the exploration-exploitation trade-off in
reinforcement learning. As the computation of exact solutions to Bayesian
reinforcement-learning problems is intractable, much of the literature has
focused on developing suitable approximation algorithms. In this work, before
diving into algorithm design, we first define, under mild structural
assumptions, a complexity measure for BAMDP planning. As efficient exploration
in BAMDPs hinges upon the judicious acquisition of information, our complexity
measure highlights the worst-case difficulty of gathering information and
exhausting epistemic uncertainty. To illustrate its significance, we establish a (computationally intractable) exact planning algorithm that leverages this measure to plan more efficiently. We then conclude by introducing a specific form of state abstraction that has the potential to reduce BAMDP complexity and to give rise to a computationally tractable, approximate planning algorithm.
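To make the hyperstate machinery concrete: a BAMDP plans over hyperstates $(s, b)$ that pair an environment state with the agent's posterior belief, and, under a Dirichlet posterior with known rewards, the finite-horizon Bayes-adaptive value satisfies $V_h(s, b) = \max_a [R(s,a) + \gamma \sum_{s'} P_b(s' \mid s, a) V_{h-1}(s', b')]$, where $b'$ is the posterior after observing $(s, a, s')$. The sketch below is our own illustration, not the paper's algorithm: it performs this lookahead exactly, then swaps in a deliberately crude epistemic state abstraction (capping posterior counts) so that sufficiently-resolved beliefs collapse into a single abstract hyperstate. All names and modeling choices (Dirichlet posterior over transitions, known rewards, the count cap, tiny problem sizes) are assumptions made for illustration.

```python
# Illustrative sketch only -- NOT the paper's algorithm. It shows the
# generic shape of Bayes-adaptive lookahead over hyperstates
# (environment state, posterior belief) and how a crude "epistemic
# abstraction" (capping Dirichlet counts) collapses similar beliefs.
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 3, 2, 0.9
RNG = np.random.default_rng(0)
REWARDS = RNG.uniform(size=(N_STATES, N_ACTIONS))  # rewards assumed known

def abstract_key(state, counts, cap):
    # Epistemic abstraction (assumption for illustration): beliefs whose
    # counts agree up to `cap` share one key, so the planner treats them
    # as the same abstract hyperstate.
    return state, np.minimum(counts, cap).tobytes()

def plan(state, counts, depth, cache, cap):
    """Finite-horizon Bayes-adaptive value of hyperstate (state, counts)."""
    if depth == 0:
        return 0.0
    key = (abstract_key(state, counts, cap), depth)
    if key in cache:  # reuse the value cached for the abstract hyperstate
        return cache[key]
    probs = counts / counts.sum(axis=-1, keepdims=True)  # posterior mean
    best = -np.inf
    for a in range(N_ACTIONS):
        q = REWARDS[state, a]
        for s2 in range(N_STATES):
            p = probs[state, a, s2]
            if p == 0.0:
                continue
            c2 = counts.copy()
            c2[state, a, s2] += 1.0  # Bayes update after observing (s, a, s')
            q += GAMMA * p * plan(s2, c2, depth - 1, cache, cap)
        best = max(best, q)
    cache[key] = best
    return best

if __name__ == "__main__":
    prior = np.ones((N_STATES, N_ACTIONS, N_STATES))  # uniform Dirichlet
    for cap in (99.0, 2.0):  # fine vs. coarse epistemic abstraction
        cache = {}
        v = plan(0, prior.copy(), depth=4, cache=cache, cap=cap)
        print(f"cap={cap}: V={v:.3f}, abstract hyperstates={len(cache)}")
```

Shrinking the cap reduces the number of distinct hyperstates the planner must evaluate, at the cost of approximation error from merging beliefs; that accuracy-for-tractability trade is the spirit of the epistemic state abstraction the abstract describes.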
Related papers
- Depth-Bounded Epistemic Planning [50.42592219248395]
We present a novel algorithm for planning based on dynamic epistemic logic.
The novelty is that we limit the depth of reasoning of the planning agent to an upper bound b.
We show the algorithm to be complete with respect to planning tasks that admit a solution within reasoning depth b.
arXiv Detail & Related papers (2024-06-03T09:30:28Z)
- An Improved Artificial Fish Swarm Algorithm for Solving the Problem of Investigation Path Planning [8.725702964289479]
We propose a chaotic artificial fish swarm algorithm based on multiple population differential evolution (DE-CAFSA).
We incorporate adaptive field-of-view and step-size adjustments, replace random behavior with the 2-opt operation (a minimal 2-opt sketch appears after this list), and introduce chaos theory and sub-optimal solutions.
Experimental results demonstrate that DE-CAFSA outperforms other algorithms on various public datasets of different sizes.
arXiv Detail & Related papers (2023-10-20T09:35:51Z)
- On efficient computation in active inference [1.1470070927586016]
We present a novel planning algorithm for finite temporal horizons with drastically lower computational complexity.
We also simplify the process of setting an appropriate target distribution for new and existing active inference planning schemes.
arXiv Detail & Related papers (2023-07-02T07:38:56Z)
- Simple Steps to Success: Axiomatics of Distance-Based Algorithmic Recourse [13.207786673115296]
We present Stepwise Explainable Paths (StEP), an axiomatically justified framework to compute direction-based algorithmic recourse.
StEP offers provable privacy and robustness guarantees, and outperforms the state-of-the-art on several established recourse desiderata.
arXiv Detail & Related papers (2023-06-27T15:35:22Z)
- Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov Decision Processes (POMDPs).
We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU).
We then show how to adapt this algorithm to also work in the broader class of $\gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z)
- STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning [111.75423966239092]
We propose an exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal transition model.
Based on the kernelized Stein discrepancy (KSD), we develop a novel algorithm, STEERING (STEin information dirEcted exploration for model-based Reinforcement LearnING).
arXiv Detail & Related papers (2023-01-28T00:49:28Z)
- Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z)
- Learning to Optimize Resource Assignment for Task Offloading in Mobile Edge Computing [35.69975917554333]
We propose an intelligent branch-and-bound (IBnB) approach that applies deep learning (DL) to learn the pruning strategy of the BnB method.
Under this learning scheme, the structure of the BnB approach ensures near-optimal performance, while the DL-based pruning strategy significantly reduces the complexity.
arXiv Detail & Related papers (2022-03-15T10:17:29Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- A Unifying Framework for Reinforcement Learning and Planning [2.564530030795554]
This paper presents a unifying algorithmic framework for reinforcement learning and planning (FRAP).
At the end of the paper, we compare a variety of well-known planning, model-free and model-based RL algorithms along these dimensions.
arXiv Detail & Related papers (2020-06-26T14:30:41Z)
- Sample-Efficient Reinforcement Learning of Undercomplete POMDPs [91.40308354344505]
This work shows that known hardness barriers do not preclude efficient reinforcement learning for rich and interesting subclasses of Partially Observable Markov Decision Processes (POMDPs).
We present a sample-efficient algorithm, OOM-UCB, for episodic finite undercomplete POMDPs, where the number of observations is larger than the number of latent states and where exploration is essential for learning, thus distinguishing our results from prior works.
arXiv Detail & Related papers (2020-06-22T17:58:54Z)
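As promised in the DE-CAFSA entry above: the 2-opt operation is a classical local-search move for path and tour optimization that reverses a segment of the current route whenever doing so shortens it. The sketch below is our own minimal illustration of plain 2-opt on a random tour, not the paper's DE-CAFSA implementation; all names and parameters are assumptions.

```python
# Minimal 2-opt local search on a random Euclidean tour (illustrative).
import math
import random

def tour_length(points, tour):
    """Total length of the cycle visiting `points` in `tour` order."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(points, tour):
    """Reverse the segment tour[i:j] whenever that shortens the tour."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                candidate = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(points, candidate) < tour_length(points, tour):
                    tour, improved = candidate, True
    return tour

if __name__ == "__main__":
    random.seed(1)
    pts = [(random.random(), random.random()) for _ in range(12)]
    start = list(range(12))
    best = two_opt(pts, start)
    print(f"{tour_length(pts, start):.3f} -> {tour_length(pts, best):.3f}")
```

Per the DE-CAFSA summary, this deterministic improvement step replaces the random-behavior update of the standard artificial fish swarm algorithm.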
This list is automatically generated from the titles and abstracts of the papers on this site.