Opportunistic Episodic Reinforcement Learning
- URL: http://arxiv.org/abs/2210.13504v1
- Date: Mon, 24 Oct 2022 18:02:33 GMT
- Title: Opportunistic Episodic Reinforcement Learning
- Authors: Xiaoxiao Wang, Nader Bouacida, Xueying Guo, Xin Liu
- Abstract summary: Opportunistic reinforcement learning is a new variant of reinforcement learning in which the regret of selecting a suboptimal action varies with an external environmental condition known as the variation factor.
Our intuition is to exploit more when the variation factor is high, and explore more when the variation factor is low.
Our algorithms balance the exploration-exploitation trade-off for reinforcement learning by introducing variation factor-dependent optimism to guide exploration.
- Score: 9.364712393700056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose and study opportunistic reinforcement learning - a
new variant of reinforcement learning problems where the regret of selecting a
suboptimal action varies under an external environmental condition known as the
variation factor. When the variation factor is low, so is the regret of
selecting a suboptimal action and vice versa. Our intuition is to exploit more
when the variation factor is high, and explore more when the variation factor
is low. We demonstrate the benefit of this novel framework for finite-horizon
episodic MDPs by designing and evaluating OppUCRL2 and OppPSRL algorithms. Our
algorithms dynamically balance the exploration-exploitation trade-off for
reinforcement learning by introducing variation factor-dependent optimism to
guide exploration. We establish an $\tilde{O}(HS \sqrt{AT})$ regret bound for
the OppUCRL2 algorithm and show through simulations that both OppUCRL2 and
OppPSRL outperform their original counterparts.
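To make the variation-factor-dependent optimism concrete, here is a minimal sketch (in Python) of how a UCRL2-style confidence bonus might be scaled by the observed variation factor; the linear scaling rule, the bound `f_max`, and the names `opp_bonus` / `variation_factor` are illustrative assumptions, not the exact construction used by OppUCRL2.

```python
import numpy as np

def opp_bonus(visit_counts, t, variation_factor, delta=0.05, f_max=1.0):
    """Hoeffding-style confidence width, shrunk when the variation factor is
    high (suboptimal actions are costly, so exploit) and enlarged when it is
    low (suboptimal actions are cheap, so explore).

    `variation_factor` lies in [0, f_max] and is observed before acting;
    the linear scaling below is an assumed, illustrative rule.
    """
    base = np.sqrt(np.log(2.0 * t / delta) / np.maximum(visit_counts, 1))
    scale = 1.0 - variation_factor / f_max   # low factor -> more optimism
    return scale * base

# Inside an episodic loop, the bonus would be added to estimated rewards
# before computing the optimistic policy, as in UCRL2 / UCB-VI style planning:
#   optimistic_r = r_hat + opp_bonus(N_sa, t, f_t)
```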
Related papers
- Dynamic deep-reinforcement-learning algorithm in Partially Observed
Markov Decision Processes [6.729108277517129]
This study shows the benefit of action sequence inclusion for solving Partially Observable Markov Decision Processes.
The developed algorithms showed enhanced robustness of controller performance against different types of external disturbances.
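As a rough illustration of what action-sequence inclusion can look like in practice, the sketch below appends a one-hot encoding of the last k actions to each observation via a wrapper; the class name, the encoding, and the classic Gym-style 4-tuple `step()` interface are assumptions for illustration, not the study's actual architecture.

```python
from collections import deque
import numpy as np

class ActionHistoryWrapper:
    """Appends a one-hot encoding of the last `k` actions to each observation
    so a memoryless policy can partially disambiguate the hidden state.
    Illustrative sketch only; assumes a discrete action space and a classic
    Gym-style reset()/step() returning (obs, reward, done, info)."""

    def __init__(self, env, k=4, n_actions=None):
        self.env = env
        self.k = k
        self.n_actions = n_actions if n_actions is not None else env.action_space.n
        self.history = deque([0] * k, maxlen=k)

    def _augment(self, obs):
        one_hot = np.zeros(self.k * self.n_actions)
        for i, a in enumerate(self.history):
            one_hot[i * self.n_actions + a] = 1.0
        return np.concatenate([np.asarray(obs, dtype=float).ravel(), one_hot])

    def reset(self):
        self.history = deque([0] * self.k, maxlen=self.k)
        return self._augment(self.env.reset())

    def step(self, action):
        self.history.append(action)  # remember the action just taken
        obs, reward, done, info = self.env.step(action)
        return self._augment(obs), reward, done, info
```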
arXiv Detail & Related papers (2023-07-29T08:52:35Z) - Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both tractable variational learning algorithm and effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z) - GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP,
and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z) - Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics based on minimizing the population loss that are more suitable for active learning than the metric used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z) - Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement-learning-based zeroth-order (ZO) algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
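For context, here is a minimal two-point zeroth-order gradient estimator with Gaussian perturbations; ZO-RL's idea would correspond to replacing the random `sample_perturbation` hook below with a learned sampling policy. The hook name and default values are assumptions, not the paper's implementation.

```python
import numpy as np

def zo_gradient(f, x, mu=0.01, n_samples=20, sample_perturbation=None):
    """Two-point zeroth-order gradient estimate of f at x.

    By default perturbations are drawn i.i.d. from a standard Gaussian;
    ZO-RL's idea is to learn the distribution that `sample_perturbation`
    draws from so the estimate has lower variance (hypothetical hook).
    """
    d = x.shape[0]
    if sample_perturbation is None:
        sample_perturbation = lambda: np.random.randn(d)
    grad = np.zeros(d)
    for _ in range(n_samples):
        u = sample_perturbation()
        grad += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return grad / n_samples

# Example: estimate the gradient of a quadratic at a random point.
f = lambda z: float(np.dot(z, z))
x = np.random.randn(5)
print(zo_gradient(f, x))  # approximately 2 * x
```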
arXiv Detail & Related papers (2021-04-09T14:50:59Z) - Reinforcement Learning for Non-Stationary Markov Decision Processes: The
Blessing of (More) Optimism [25.20231604057821]
We consider undiscounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity.
We first develop the Sliding Window Upper-Confidence bound for Reinforcement Learning with Confidence Widening (SWUCRL2-CW) algorithm.
We propose the Bandit-over-Reinforcement Learning (BORL) algorithm to adaptively tune the SWUCRL2-CW algorithm to achieve the same dynamic regret bound.
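For intuition about the sliding-window component, here is a small sketch that keeps empirical counts over only the most recent W transitions and adds a widening term to the confidence width; the window length, the Hoeffding-style width, and the `eta` parameter are placeholder assumptions rather than the paper's tuned quantities.

```python
from collections import deque, defaultdict
import numpy as np

class SlidingWindowCounts:
    """Empirical transition counts restricted to the last `window` steps,
    so stale data from a drifting MDP is forgotten. Illustrative sketch of
    the sliding-window idea behind SWUCRL2-CW, not the full algorithm."""

    def __init__(self, window=1000):
        self.window = window
        self.buffer = deque()              # (s, a, s_next) tuples, oldest first
        self.counts = defaultdict(int)     # (s, a) -> visits in the window
        self.trans = defaultdict(int)      # (s, a, s_next) -> visits in the window

    def add(self, s, a, s_next):
        self.buffer.append((s, a, s_next))
        self.counts[(s, a)] += 1
        self.trans[(s, a, s_next)] += 1
        if len(self.buffer) > self.window:        # drop the oldest sample
            old_s, old_a, old_next = self.buffer.popleft()
            self.counts[(old_s, old_a)] -= 1
            self.trans[(old_s, old_a, old_next)] -= 1

    def confidence_width(self, s, a, delta=0.05, eta=0.0):
        """Hoeffding-style width plus an extra widening term `eta`
        (the 'confidence widening'; the value here is a placeholder)."""
        n = max(self.counts[(s, a)], 1)
        return np.sqrt(2.0 * np.log(1.0 / delta) / n) + eta
```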
arXiv Detail & Related papers (2020-06-24T15:40:21Z) - Towards Minimax Optimal Reinforcement Learning in Factored Markov
Decision Processes [53.72166325215299]
We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs) and present two algorithms.
The first achieves minimax optimal regret guarantees for a rich class of factored structures.
The second enjoys better computational complexity at the cost of a slightly worse regret bound.
arXiv Detail & Related papers (2020-06-24T00:50:17Z) - Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant imitation learning from observation (ILO) algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)
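Since the reparameterization trick in the last entry may be unfamiliar, here is the standard Gaussian form such tricks build on, written with NumPy; the paper's actual contribution is a divergence-specific variant for adversarial imitation learning, which this generic snippet does not reproduce.

```python
import numpy as np

def reparameterized_sample(mu, log_std, rng=None):
    """Draw z = mu + sigma * eps with eps ~ N(0, I), so the sample is a
    deterministic, differentiable function of (mu, log_std) given the noise.
    Generic textbook form, not the paper's f-divergence-specific construction."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(log_std) * eps

# Usage: sample an action from a 3-dimensional Gaussian policy head.
action = reparameterized_sample(mu=np.zeros(3), log_std=np.full(3, -1.0))
```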