Active Model Estimation in Markov Decision Processes
- URL: http://arxiv.org/abs/2003.03297v2
- Date: Mon, 22 Jun 2020 20:39:45 GMT
- Title: Active Model Estimation in Markov Decision Processes
- Authors: Jean Tarbouriech, Shubhanshu Shekhar, Matteo Pirotta, Mohammad
Ghavamzadeh, Alessandro Lazaric
- Abstract summary: We study the problem of efficient exploration in order to learn an accurate model of an environment, modeled as a Markov decision process (MDP)
We show that our heuristic-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small-sample regime.
- Score: 108.46146218973189
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of efficient exploration in order to learn an accurate
model of an environment, modeled as a Markov decision process (MDP). Efficient
exploration in this problem requires the agent to identify the regions in which
estimating the model is more difficult and then exploit this knowledge to
collect more samples there. In this paper, we formalize this problem, introduce
the first algorithm to learn an $\epsilon$-accurate estimate of the dynamics,
and provide its sample complexity analysis. While this algorithm enjoys strong
guarantees in the large-sample regime, it tends to have a poor performance in
early stages of exploration. To address this issue, we propose an algorithm
that is based on maximum weighted entropy, a heuristic that stems from common
sense and our theoretical analysis. The main idea here is to cover the entire
state-action space with the weight proportional to the noise in the
transitions. Using a number of simple domains with heterogeneous noise in their
transitions, we show that our heuristic-based algorithm outperforms both our
original algorithm and the maximum entropy algorithm in the small sample
regime, while achieving similar asymptotic performance as that of the original
algorithm.
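The maximum weighted entropy idea described in the abstract, covering the state-action space with visitation weights proportional to transition noise, can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual algorithm: it assumes a tabular MDP with per-(state, action) noise estimates already available, and the function name is hypothetical.

```python
import numpy as np

def weighted_entropy_target(noise, counts):
    """Pick the (state, action) pair whose empirical visit frequency
    lags furthest behind its noise-proportional target frequency.

    noise  : per-(s, a) transition-noise estimates (the weights)
    counts : per-(s, a) visit counts, same shape as `noise`
    """
    target = noise / noise.sum()           # desired visitation frequencies
    freq = counts / max(counts.sum(), 1)   # empirical visitation frequencies
    # largest shortfall = most under-sampled relative to its weight
    return np.unravel_index(np.argmax(target - freq), noise.shape)

# toy domain: 3 states x 2 actions with heterogeneous noise;
# the noisy pair (0, 0) is targeted first when nothing has been visited
noise = np.array([[0.9, 0.1],
                  [0.1, 0.1],
                  [0.1, 0.1]])
counts = np.zeros((3, 2))
s, a = weighted_entropy_target(noise, counts)
```

Repeatedly sampling the returned pair and updating `counts` drives the empirical visitation distribution toward the noise-weighted target, so noisier transitions receive proportionally more samples.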
Related papers
- Geometry-Aware Approaches for Balancing Performance and Theoretical
Guarantees in Linear Bandits [6.907555940790131]
Thompson sampling and Greedy demonstrate promising empirical performance, yet this contrasts with their pessimistic theoretical regret bounds.
We propose a new data-driven technique that tracks the geometric properties of the uncertainty ellipsoid.
We identify and "course-correct" problem instances in which the base algorithms perform poorly.
arXiv Detail & Related papers (2023-06-26T17:38:45Z)
- High-dimensional Contextual Bandit Problem without Sparsity [8.782204980889077]
We propose an explore-then-commit (EtC) algorithm to address this problem and examine its performance.
We derive the optimal rate of the EtC algorithm in terms of $T$ and show that this rate can be achieved by balancing exploration and exploitation.
We introduce an adaptive explore-then-commit (AEtC) algorithm that adaptively finds the optimal balance.
arXiv Detail & Related papers (2023-06-19T15:29:32Z)
- An Optimization-based Deep Equilibrium Model for Hyperspectral Image Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
arXiv Detail & Related papers (2023-06-10T08:25:16Z)
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z)
- Optimal Algorithms for the Inhomogeneous Spiked Wigner Model [89.1371983413931]
We derive an approximate message-passing algorithm (AMP) for the inhomogeneous problem.
We identify in particular the existence of a statistical-to-computational gap where known algorithms require a signal-to-noise ratio bigger than the information-theoretic threshold to perform better than random.
arXiv Detail & Related papers (2023-02-13T19:57:17Z)
- Robust recovery for stochastic block models [16.74630355427558]
We develop an efficient algorithm for weak recovery in a robust version of the block model.
Our results show that there is no price of robustness in the block model.
arXiv Detail & Related papers (2021-11-16T15:43:00Z)
- A proximal-proximal majorization-minimization algorithm for nonconvex tuning-free robust regression problems [4.261680642170457]
We introduce a proximal-proximal majorization-minimization (PPMM) algorithm for nonconvex tuning-free robust regression problems.
Our proposed algorithm outperforms the existing state-of-the-art algorithms.
arXiv Detail & Related papers (2021-06-25T15:07:13Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts.
Several works have pursued rigorous analytical justification for studying nonconvex optimization problems.
A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms to escape saddle points.
arXiv Detail & Related papers (2020-03-31T16:54:22Z)
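The saddle-escape role of perturbations mentioned in the last entry can be illustrated with a toy perturbed gradient descent. This is a generic sketch of the idea, not the cited paper's algorithm; the function name and parameters are illustrative.

```python
import numpy as np

def perturbed_gd(grad, x0, lr=0.1, noise=1e-2, steps=100, seed=0):
    """Gradient descent with small random perturbations added each step.
    The injected noise lets the iterates drift off exact saddle points,
    where the plain gradient vanishes and unperturbed descent stalls."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad(x) + noise * rng.standard_normal(x.shape)
    return x

# f(x, y) = x^2 - y^2 has a saddle at the origin: plain gradient descent
# started exactly at (0, 0) never moves, while the perturbed variant
# escapes along the negative-curvature direction y.
grad = lambda v: np.array([2 * v[0], -2 * v[1]])
x = perturbed_gd(grad, [0.0, 0.0])
```

Along the stable direction the iterate stays near zero, while along the negative-curvature direction the perturbations are amplified geometrically, which is the mechanism those analyses formalize.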
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.