Related papers: Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure

Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure

URL: http://arxiv.org/abs/2009.05986v4
Date: Mon, 11 Oct 2021 12:23:40 GMT
Title: Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure
Authors: Aviv Rosenberg and Yishay Mansour
Abstract summary: We study regret in non-episodic factored Markov decision processes (FMDPs) All existing algorithms make the strong assumption that the factored structure of the FMDP is known to the learner in advance. We provide the first algorithm that learns the structure of the FMDP while minimizing the regret.
Score: 57.90236104782219
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study regret minimization in non-episodic factored Markov decision processes (FMDPs), where all existing algorithms make the strong assumption that the factored structure of the FMDP is known to the learner in advance. In this paper, we provide the first algorithm that learns the structure of the FMDP while minimizing the regret. Our algorithm is based on the optimism in face of uncertainty principle, combined with a simple statistical method for structure learning, and can be implemented efficiently given oracle-access to an FMDP planner. Moreover, we give a variant of our algorithm that remains efficient even when the oracle is limited to non-factored actions, which is the case with almost all existing approximate planners. Finally, we leverage our techniques to prove a novel lower bound for the known structure case, closing the gap to the regret bound of Chen et al. [2021].

Related papers

On Sequential Fault-Intolerant Process Planning [60.66853798340345]
We propose and study a planning problem we call Sequential Fault-Intolerant Process Planning (SFIPP) SFIPP captures a reward structure common in many sequential multi-stage decision problems where the planning is deemed successful only if all stages succeed. We design provably tight online algorithms for settings in which we need to pick between different actions with unknown success chances at each stage.
arXiv Detail & Related papers (2025-02-07T15:20:35Z)
Alternating minimization for square root principal component pursuit [2.449191760736501]
We develop efficient algorithms for solving the square root principal component pursuit (SRPCP) problem. Specifically, we propose a tuning-free alternating minimization (AltMin) algorithm, where each iteration involves subproblems enjoying closed-form optimal solutions. We introduce techniques based on the variational formulation of the nuclear norm and Burer-Monteiro decomposition to further accelerate the AltMin method.
arXiv Detail & Related papers (2024-12-31T14:43:50Z)
Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation. Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions. We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z)
Non-stationary Reinforcement Learning under General Function Approximation [60.430936031067006]
We first propose a new complexity metric called dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs. Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA. We show that SW-OPEA is provably efficient as long as the variation budget is not significantly large.
arXiv Detail & Related papers (2023-06-01T16:19:37Z)
Iterative Depth-First Search for Fully Observable Non-Deterministic Planning [25.2935633334145]
We develop a novel iterative depth-first search algorithm that solves FOND planning tasks and produces strong cyclic policies. Our algorithm is explicitly designed for FOND planning, addressing more directly the non-deterministic aspect of FOND planning.
arXiv Detail & Related papers (2022-04-08T23:10:30Z)
Regret Analysis in Deterministic Reinforcement Learning [78.31410227443102]
We study the problem of regret, which is central to the analysis and design of optimal learning algorithms. We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameter.
arXiv Detail & Related papers (2021-06-27T23:41:57Z)
Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model. The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes [53.72166325215299]
We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs) First one achieves minimax optimal regret guarantees for a rich class of factored structures. Second one enjoys better computational complexity with a slightly worse regret.
arXiv Detail & Related papers (2020-06-24T00:50:17Z)
Deep unfolding of the weighted MMSE beamforming algorithm [9.518010235273783]
We propose the novel application of deep unfolding to the WMMSE algorithm for a MISO downlink channel. Deep unfolding naturally incorporates expert knowledge, with the benefits of immediate and well-grounded architecture selection, fewer trainable parameters, and better explainability. By means of simulations, we show that, in most of the settings, the unfolded WMMSE outperforms or performs equally to the WMMSE for a fixed number of iterations.
arXiv Detail & Related papers (2020-06-15T14:51:20Z)
Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and Tighter Regret Bounds for the Non-Episodic Setting [24.90164851620799]
We study reinforcement learning in non-episodic factored Markov decision processes (FMDPs) We propose two near-optimal and oracle-efficient algorithms for FMDPs. Our oracle-efficient algorithms outperform previously proposed near-optimal algorithms on computer network administration simulations.
arXiv Detail & Related papers (2020-02-06T15:19:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.