Oracle-Efficient Regret Minimization in Factored MDPs with Unknown
  Structure
        - URL: http://arxiv.org/abs/2009.05986v4
- Date: Mon, 11 Oct 2021 12:23:40 GMT
- Title: Oracle-Efficient Regret Minimization in Factored MDPs with Unknown
  Structure
- Authors: Aviv Rosenberg and Yishay Mansour
- Abstract summary: We study regret in non-episodic factored Markov decision processes (FMDPs)
All existing algorithms make the strong assumption that the factored structure of the FMDP is known to the learner in advance.
We provide the first algorithm that learns the structure of the FMDP while minimizing the regret.
- Score: 57.90236104782219
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   We study regret minimization in non-episodic factored Markov decision
processes (FMDPs), where all existing algorithms make the strong assumption
that the factored structure of the FMDP is known to the learner in advance. In
this paper, we provide the first algorithm that learns the structure of the
FMDP while minimizing the regret. Our algorithm is based on the optimism in
face of uncertainty principle, combined with a simple statistical method for
structure learning, and can be implemented efficiently given oracle-access to
an FMDP planner. Moreover, we give a variant of our algorithm that remains
efficient even when the oracle is limited to non-factored actions, which is the
case with almost all existing approximate planners. Finally, we leverage our
techniques to prove a novel lower bound for the known structure case, closing
the gap to the regret bound of Chen et al. [2021].
 
      
        Related papers
        - On Sequential Fault-Intolerant Process Planning [60.66853798340345]
 We propose and study a planning problem we call Sequential Fault-Intolerant Process Planning (SFIPP)
SFIPP captures a reward structure common in many sequential multi-stage decision problems where the planning is deemed successful only if all stages succeed.
We design provably tight online algorithms for settings in which we need to pick between different actions with unknown success chances at each stage.
 arXiv  Detail & Related papers  (2025-02-07T15:20:35Z)
- Alternating minimization for square root principal component pursuit [2.449191760736501]
 We develop efficient algorithms for solving the square root principal component pursuit (SRPCP) problem.
Specifically, we propose a tuning-free alternating minimization (AltMin) algorithm, where each iteration involves subproblems enjoying closed-form optimal solutions.
We introduce techniques based on the variational formulation of the nuclear norm and Burer-Monteiro decomposition to further accelerate the AltMin method.
 arXiv  Detail & Related papers  (2024-12-31T14:43:50Z)
- Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
 Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
 arXiv  Detail & Related papers  (2023-07-08T15:41:48Z)
- Non-stationary Reinforcement Learning under General Function
  Approximation [60.430936031067006]
 We first propose a new complexity metric called dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs.
Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA.
We show that SW-OPEA is provably efficient as long as the variation budget is not significantly large.
 arXiv  Detail & Related papers  (2023-06-01T16:19:37Z)
- Iterative Depth-First Search for Fully Observable Non-Deterministic
  Planning [25.2935633334145]
 We develop a novel iterative depth-first search algorithm that solves FOND planning tasks and produces strong cyclic policies.
Our algorithm is explicitly designed for FOND planning, addressing more directly the non-deterministic aspect of FOND planning.
 arXiv  Detail & Related papers  (2022-04-08T23:10:30Z)
- Regret Analysis in Deterministic Reinforcement Learning [78.31410227443102]
 We study the problem of regret, which is central to the analysis and design of optimal learning algorithms.
We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameter.
 arXiv  Detail & Related papers  (2021-06-27T23:41:57Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision
  Processes [79.4957965474334]
 We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
 arXiv  Detail & Related papers  (2020-09-28T15:22:24Z)
- Towards Minimax Optimal Reinforcement Learning in Factored Markov
  Decision Processes [53.72166325215299]
 We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs)
First one achieves minimax optimal regret guarantees for a rich class of factored structures.
Second one enjoys better computational complexity with a slightly worse regret.
 arXiv  Detail & Related papers  (2020-06-24T00:50:17Z)
- Deep unfolding of the weighted MMSE beamforming algorithm [9.518010235273783]
 We propose the novel application of deep unfolding to the WMMSE algorithm for a MISO downlink channel.
Deep unfolding naturally incorporates expert knowledge, with the benefits of immediate and well-grounded architecture selection, fewer trainable parameters, and better explainability.
By means of simulations, we show that, in most of the settings, the unfolded WMMSE outperforms or performs equally to the WMMSE for a fixed number of iterations.
 arXiv  Detail & Related papers  (2020-06-15T14:51:20Z)
- Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and
  Tighter Regret Bounds for the Non-Episodic Setting [24.90164851620799]
 We study reinforcement learning in non-episodic factored Markov decision processes (FMDPs)
We propose two near-optimal and oracle-efficient algorithms for FMDPs.
Our oracle-efficient algorithms outperform previously proposed near-optimal algorithms on computer network administration simulations.
 arXiv  Detail & Related papers  (2020-02-06T15:19:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     
           This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.