Towards Minimax Optimal Reinforcement Learning in Factored Markov
Decision Processes
- URL: http://arxiv.org/abs/2006.13405v1
- Date: Wed, 24 Jun 2020 00:50:17 GMT
- Title: Towards Minimax Optimal Reinforcement Learning in Factored Markov
Decision Processes
- Authors: Yi Tian, Jian Qian, Suvrit Sra
- Abstract summary: We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs).
We propose two model-based algorithms: the first achieves minimax optimal regret guarantees for a rich class of factored structures, while the second enjoys better computational complexity at the cost of slightly worse regret.
- Score: 53.72166325215299
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study minimax optimal reinforcement learning in episodic factored Markov
decision processes (FMDPs), which are MDPs with conditionally independent
transition components. Assuming the factorization is known, we propose two
model-based algorithms. The first one achieves minimax optimal regret
guarantees for a rich class of factored structures, while the second one enjoys
better computational complexity with a slightly worse regret. A key new
ingredient of our algorithms is the design of a bonus term to guide
exploration. We complement our algorithms by presenting several
structure-dependent lower bounds on regret for FMDPs that reveal the difficulty
hiding in the intricacy of the structures.
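To make "conditionally independent transition components" concrete, here is the standard factored form of an FMDP transition kernel; the scopes Z_i (the subsets of state-action variables each component may depend on) are conventional FMDP notation assumed here, not spelled out in the abstract:

```latex
% Factored transition structure: the m components of the next state s'
% are conditionally independent given the state-action pair (s, a), and
% component i depends only on the restriction of (s, a) to its scope Z_i.
P\big(s' \mid s, a\big) \;=\; \prod_{i=1}^{m} P_i\big(s'[i] \,\big\vert\, (s,a)[Z_i]\big)
```

Estimating each low-dimensional factor P_i separately, rather than the full joint kernel, is what allows regret bounds that scale with the scope sizes instead of the full state-action space; the exact form of the paper's exploration bonus is not given in the abstract.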
Related papers
- Regret Minimization via Saddle Point Optimization [29.78262192683203]
The decision-estimation coefficient (DEC) was shown to provide nearly tight lower and upper bounds on the worst-case expected regret in structured bandits and reinforcement learning.
We derive an anytime variant of the Estimation-To-Decisions (E2D) algorithm.
Our formulation leads to a practical algorithm for finite model classes and linear feedback models.
arXiv Detail & Related papers (2024-03-15T15:09:13Z)
- Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments [48.96971760679639]
We study variance-dependent regret bounds for Markov decision processes (MDPs).
We propose two new environment norms to characterize the fine-grained variance properties of the environment.
For model-based methods, we design a variant of the MVP algorithm.
In particular, this bound is simultaneously minimax optimal for both stochastic and deterministic MDPs.
arXiv Detail & Related papers (2023-01-31T06:54:06Z)
- Opportunistic Episodic Reinforcement Learning [9.364712393700056]
Opportunistic reinforcement learning is a new variant of reinforcement learning in which the regret of selecting a suboptimal action varies with an external environmental condition known as the variation factor.
Our intuition is to exploit more when the variation factor is high, and explore more when the variation factor is low.
Our algorithms balance the exploration-exploitation trade-off by introducing variation-factor-dependent optimism to guide exploration (see the sketch after this list).
arXiv Detail & Related papers (2022-10-24T18:02:33Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network (NN)-based active learning algorithms in the non-parametric streaming setting.
We introduce two regret metrics, defined by minimizing the population loss, that are better suited to active learning than the metric used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Learning the Markov Decision Process in the Sparse Gaussian Elimination [0.0]
We propose a learning-based approach to sparse Gaussian elimination.
We propose Q-learning algorithms for the main modules of the sparse solver.
arXiv Detail & Related papers (2021-09-30T08:56:39Z)
- Regret Analysis in Deterministic Reinforcement Learning [78.31410227443102]
We study regret minimization, which is central to the analysis and design of optimal learning algorithms.
We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameter.
arXiv Detail & Related papers (2021-06-27T23:41:57Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure [57.90236104782219]
We study regret in non-episodic factored Markov decision processes (FMDPs).
All existing algorithms make the strong assumption that the factored structure of the FMDP is known to the learner in advance.
We provide the first algorithm that learns the structure of the FMDP while minimizing the regret.
arXiv Detail & Related papers (2020-09-13T12:30:35Z)
- Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and Tighter Regret Bounds for the Non-Episodic Setting [24.90164851620799]
We study reinforcement learning in non-episodic factored Markov decision processes (FMDPs).
We propose two near-optimal and oracle-efficient algorithms for FMDPs.
Our oracle-efficient algorithms outperform previously proposed near-optimal algorithms on computer network administration simulations.
arXiv Detail & Related papers (2020-02-06T15:19:53Z)
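As a reading aid for the variation-factor-dependent optimism described in the Opportunistic Episodic Reinforcement Learning entry above, here is a minimal sketch in Python. The Hoeffding-style base bonus and the 1 / (1 + variation_factor) scaling are illustrative assumptions, not the cited paper's actual algorithm:

```python
import math

def exploration_bonus(visit_count: int, total_steps: int,
                      variation_factor: float, c: float = 1.0) -> float:
    """Optimism bonus for one state-action pair.

    A generic Hoeffding-style bonus, shrunk when the external variation
    factor is high (suboptimal actions are costly, so exploit) and
    enlarged when it is low (mistakes are cheap, so explore). The
    scaling rule below is a hypothetical choice for illustration.
    """
    base = c * math.sqrt(math.log(max(total_steps, 2)) / max(visit_count, 1))
    return base / (1.0 + variation_factor)

# Low variation factor: larger bonus, so the agent explores more.
print(exploration_bonus(visit_count=10, total_steps=1000, variation_factor=0.1))
# High variation factor: smaller bonus, so the agent exploits more.
print(exploration_bonus(visit_count=10, total_steps=1000, variation_factor=5.0))
```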