Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure
- URL: http://arxiv.org/abs/2009.05986v4
- Date: Mon, 11 Oct 2021 12:23:40 GMT
- Title: Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure
- Authors: Aviv Rosenberg and Yishay Mansour
- Abstract summary: We study regret minimization in non-episodic factored Markov decision processes (FMDPs).
All existing algorithms make the strong assumption that the factored structure of the FMDP is known to the learner in advance.
We provide the first algorithm that learns the structure of the FMDP while minimizing the regret.
- Score: 57.90236104782219
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study regret minimization in non-episodic factored Markov decision
processes (FMDPs), where all existing algorithms make the strong assumption
that the factored structure of the FMDP is known to the learner in advance. In
this paper, we provide the first algorithm that learns the structure of the
FMDP while minimizing the regret. Our algorithm is based on the optimism in
the face of uncertainty principle, combined with a simple statistical method for
structure learning, and can be implemented efficiently given oracle-access to
an FMDP planner. Moreover, we give a variant of our algorithm that remains
efficient even when the oracle is limited to non-factored actions, which is the
case with almost all existing approximate planners. Finally, we leverage our
techniques to prove a novel lower bound for the known structure case, closing
the gap to the regret bound of Chen et al. [2021].
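To make the structure-learning ingredient concrete, below is a minimal Python sketch of the kind of simple statistical test the abstract alludes to: factor j is accepted as a parent of next-state factor i only if the empirical conditional distributions of factor i provably differ across values of factor j. The Hoeffding-style confidence width and the pairwise-comparison rule are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np

# Hypothetical sketch of a "simple statistical method for structure learning":
# decide whether factor j is a parent of next-state factor i by comparing the
# empirical conditionals P(x_i' | x_j = v) across values v. The L1 confidence
# width below is an assumption for illustration, not the paper's.

def is_parent(transitions, i, j, n_vals, delta=0.05):
    """transitions: list of (state, next_state) pairs over integer factors.
    Returns True if the empirical distribution of next-state factor i
    provably differs across values of current-state factor j."""
    counts = np.zeros((n_vals, n_vals))          # rows: x_j value, cols: x_i'
    for s, s_next in transitions:
        counts[s[j], s_next[i]] += 1
    n = counts.sum(axis=1, keepdims=True)        # samples per x_j value
    dists = counts / np.maximum(n, 1)            # empirical conditionals
    # Assumed Hoeffding-style L1 deviation bound per empirical conditional
    width = np.sqrt(2 * n_vals * np.log(4 * n_vals / delta)
                    / np.maximum(n, 1)).ravel()
    for a in range(n_vals):
        for b in range(a + 1, n_vals):
            gap = np.abs(dists[a] - dists[b]).sum()
            if gap > width[a] + width[b]:        # gap exceeds sampling noise
                return True
    return False
```

In the full optimistic loop, tests like this would prune each factor's candidate parent scope between epochs, after which the (assumed) oracle FMDP planner would be invoked on an optimistic model over the surviving scopes.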
Related papers
- On Sequential Fault-Intolerant Process Planning [60.66853798340345]
We propose and study a planning problem we call Sequential Fault-Intolerant Process Planning (SFIPP).
SFIPP captures a reward structure common in many sequential multi-stage decision problems, where planning is deemed successful only if all stages succeed.
We design provably tight online algorithms for settings in which, at each stage, the learner must pick among actions with unknown success chances.
arXiv Detail & Related papers (2025-02-07T15:20:35Z)
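As a toy illustration of the SFIPP setting above, the sketch below runs a per-stage UCB rule where an episode yields reward 1 only if every stage succeeds; this generic bandit-style strategy stands in for, and is not, the provably tight algorithms of the paper.

```python
import math
import random

# Illustrative only: per-stage UCB for an all-stages-must-succeed process.
# An episode stops at the first failed stage, so later stages see fewer samples.

class StageUCB:
    def __init__(self, n_stages, n_actions):
        self.pulls = [[0] * n_actions for _ in range(n_stages)]
        self.wins = [[0] * n_actions for _ in range(n_stages)]

    def choose(self, stage, t):
        scores = []
        for a, n in enumerate(self.pulls[stage]):
            if n == 0:
                return a                      # try untested actions first
            mean = self.wins[stage][a] / n
            scores.append(mean + math.sqrt(2 * math.log(t + 1) / n))
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, stage, action, success):
        self.pulls[stage][action] += 1
        self.wins[stage][action] += int(success)

def run_episode(agent, true_p, t):
    """true_p[stage][action] = unknown success probability.
    Returns 1 only if every stage succeeds (the SFIPP reward)."""
    for stage, probs in enumerate(true_p):
        a = agent.choose(stage, t)
        ok = random.random() < probs[a]
        agent.update(stage, a, ok)
        if not ok:
            return 0
    return 1
```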
- Alternating minimization for square root principal component pursuit [2.449191760736501]
We develop efficient algorithms for solving the square root principal component pursuit (SRPCP) problem.
Specifically, we propose a tuning-free alternating minimization (AltMin) algorithm, where each iteration involves subproblems enjoying closed-form optimal solutions.
We introduce techniques based on the variational formulation of the nuclear norm and Burer-Monteiro decomposition to further accelerate the AltMin method.
arXiv Detail & Related papers (2024-12-31T14:43:50Z)
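The alternating-minimization pattern above is easiest to see on the classical squared-penalty robust-PCA surrogate, where both subproblems have well-known closed forms (singular-value thresholding and entrywise soft thresholding). The sketch below shows that simplified variant, not the SRPCP updates or the Burer-Monteiro acceleration from the paper.

```python
import numpy as np

# AltMin skeleton for the classical robust-PCA surrogate
#   min_{L,S} 0.5*||X - L - S||_F^2 + tau*||L||_* + lam*||S||_1,
# where each subproblem has a closed-form proximal solution. SRPCP uses an
# unsquared data-fit term, so its closed forms differ from these.

def svt(M, tau):
    """Singular-value thresholding: prox of tau*||.||_*."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vt

def soft(M, lam):
    """Entrywise soft thresholding: prox of lam*||.||_1."""
    return np.sign(M) * np.maximum(np.abs(M) - lam, 0)

def altmin_rpca(X, tau=1.0, lam=0.1, iters=100):
    L = np.zeros_like(X)
    S = np.zeros_like(X)
    for _ in range(iters):
        L = svt(X - S, tau)      # closed-form L-update
        S = soft(X - L, lam)     # closed-form S-update
    return L, S
```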
- Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z)
- Non-stationary Reinforcement Learning under General Function Approximation [60.430936031067006]
We first propose a new complexity metric, the dynamic Bellman Eluder (DBE) dimension, for non-stationary MDPs.
Building on this metric, we design a novel confidence-set-based, model-free algorithm called SW-OPEA.
We show that SW-OPEA is provably efficient as long as the variation budget is not significantly large.
arXiv Detail & Related papers (2023-06-01T16:19:37Z)
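The sliding-window (SW) idea behind SW-OPEA, in its most stripped-down form, is to build estimates only from the most recent W samples so that stale data from a drifting environment falls away. The sketch below illustrates just that windowing-plus-optimism idea with assumed bonus constants; it is not the actual confidence-set construction over general function classes.

```python
from collections import deque

# Generic sliding-window estimate with an optimistic bonus. Under a bounded
# variation budget, discarding samples older than the window trades a little
# variance for much less bias; the bonus scale here is an assumed placeholder.

class SlidingWindowEstimate:
    def __init__(self, window):
        self.buf = deque(maxlen=window)   # old samples fall out automatically

    def update(self, value):
        self.buf.append(value)

    def optimistic(self, bonus_scale=1.0):
        n = len(self.buf)
        if n == 0:
            return float("inf")           # unexplored => maximally optimistic
        mean = sum(self.buf) / n
        return mean + bonus_scale / n ** 0.5
```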
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov decision processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes [53.72166325215299]
We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs) and propose two algorithms.
The first achieves minimax optimal regret guarantees for a rich class of factored structures.
The second enjoys better computational complexity at the cost of a slightly worse regret bound.
arXiv Detail & Related papers (2020-06-24T00:50:17Z)
- Deep unfolding of the weighted MMSE beamforming algorithm [9.518010235273783]
We propose the novel application of deep unfolding to the WMMSE algorithm for a MISO downlink channel.
Deep unfolding naturally incorporates expert knowledge, with the benefits of immediate and well-grounded architecture selection, fewer trainable parameters, and better explainability.
By means of simulations, we show that, in most settings, the unfolded WMMSE outperforms or matches the WMMSE for a fixed number of iterations.
arXiv Detail & Related papers (2020-06-15T14:51:20Z)
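Deep unfolding, as described above, turns K iterations of a classical algorithm into K network layers with a handful of trainable parameters. The toy sketch below unfolds plain gradient descent on a least-squares objective with one learnable step size per layer, purely to illustrate the construction; the paper unfolds the WMMSE update equations instead.

```python
import torch
import torch.nn as nn

# Toy deep unfolding: K unrolled gradient-descent steps on 0.5*||Ax - b||^2,
# each with its own trainable step size. Expert knowledge fixes the update
# rule; learning tunes only a few parameters, which is the appeal of unfolding.

class UnfoldedGD(nn.Module):
    def __init__(self, n_layers):
        super().__init__()
        self.steps = nn.Parameter(0.1 * torch.ones(n_layers))

    def forward(self, A, b):
        x = torch.zeros(A.shape[1], dtype=A.dtype)
        for alpha in self.steps:
            grad = A.T @ (A @ x - b)      # gradient of 0.5*||Ax - b||^2
            x = x - alpha * grad          # one "layer" = one damped iteration
        return x
```

Training the per-layer step sizes end-to-end is what yields the "fewer trainable parameters, better explainability" character the summary mentions.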
- Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and Tighter Regret Bounds for the Non-Episodic Setting [24.90164851620799]
We study reinforcement learning in non-episodic factored Markov decision processes (FMDPs).
We propose two near-optimal and oracle-efficient algorithms for FMDPs.
Our oracle-efficient algorithms outperform previously proposed near-optimal algorithms on computer network administration simulations.
arXiv Detail & Related papers (2020-02-06T15:19:53Z)