Oracle-Efficient Regret Minimization in Factored MDPs with Unknown
Structure
- URL: http://arxiv.org/abs/2009.05986v4
- Date: Mon, 11 Oct 2021 12:23:40 GMT
- Title: Oracle-Efficient Regret Minimization in Factored MDPs with Unknown
Structure
- Authors: Aviv Rosenberg and Yishay Mansour
- Abstract summary: We study regret in non-episodic factored Markov decision processes (FMDPs)
All existing algorithms make the strong assumption that the factored structure of the FMDP is known to the learner in advance.
We provide the first algorithm that learns the structure of the FMDP while minimizing the regret.
- Score: 57.90236104782219
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study regret minimization in non-episodic factored Markov decision
processes (FMDPs), where all existing algorithms make the strong assumption
that the factored structure of the FMDP is known to the learner in advance. In
this paper, we provide the first algorithm that learns the structure of the
FMDP while minimizing the regret. Our algorithm is based on the optimism in
face of uncertainty principle, combined with a simple statistical method for
structure learning, and can be implemented efficiently given oracle-access to
an FMDP planner. Moreover, we give a variant of our algorithm that remains
efficient even when the oracle is limited to non-factored actions, which is the
case with almost all existing approximate planners. Finally, we leverage our
techniques to prove a novel lower bound for the known structure case, closing
the gap to the regret bound of Chen et al. [2021].
Related papers
- Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z) - Non-stationary Reinforcement Learning under General Function
Approximation [60.430936031067006]
We first propose a new complexity metric called dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs.
Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA.
We show that SW-OPEA is provably efficient as long as the variation budget is not significantly large.
arXiv Detail & Related papers (2023-06-01T16:19:37Z) - Iterative Depth-First Search for Fully Observable Non-Deterministic
Planning [25.2935633334145]
We develop a novel iterative depth-first search algorithm that solves FOND planning tasks and produces strong cyclic policies.
Our algorithm is explicitly designed for FOND planning, addressing more directly the non-deterministic aspect of FOND planning.
arXiv Detail & Related papers (2022-04-08T23:10:30Z) - Regret Analysis in Deterministic Reinforcement Learning [78.31410227443102]
We study the problem of regret, which is central to the analysis and design of optimal learning algorithms.
We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameter.
arXiv Detail & Related papers (2021-06-27T23:41:57Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Towards Minimax Optimal Reinforcement Learning in Factored Markov
Decision Processes [53.72166325215299]
We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs)
First one achieves minimax optimal regret guarantees for a rich class of factored structures.
Second one enjoys better computational complexity with a slightly worse regret.
arXiv Detail & Related papers (2020-06-24T00:50:17Z) - Deep unfolding of the weighted MMSE beamforming algorithm [9.518010235273783]
We propose the novel application of deep unfolding to the WMMSE algorithm for a MISO downlink channel.
Deep unfolding naturally incorporates expert knowledge, with the benefits of immediate and well-grounded architecture selection, fewer trainable parameters, and better explainability.
By means of simulations, we show that, in most of the settings, the unfolded WMMSE outperforms or performs equally to the WMMSE for a fixed number of iterations.
arXiv Detail & Related papers (2020-06-15T14:51:20Z) - Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and
Tighter Regret Bounds for the Non-Episodic Setting [24.90164851620799]
We study reinforcement learning in non-episodic factored Markov decision processes (FMDPs)
We propose two near-optimal and oracle-efficient algorithms for FMDPs.
Our oracle-efficient algorithms outperform previously proposed near-optimal algorithms on computer network administration simulations.
arXiv Detail & Related papers (2020-02-06T15:19:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.