Improved Regret for Efficient Online Reinforcement Learning with Linear
Function Approximation
- URL: http://arxiv.org/abs/2301.13087v1
- Date: Mon, 30 Jan 2023 17:26:39 GMT
- Title: Improved Regret for Efficient Online Reinforcement Learning with Linear
Function Approximation
- Authors: Uri Sherman, Tomer Koren, Yishay Mansour
- Abstract summary: We study reinforcement learning with linear function approximation and adversarially changing cost functions.
We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback.
- Score: 69.0695698566235
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study reinforcement learning with linear function approximation and
adversarially changing cost functions, a setup that has mostly been considered
under simplifying assumptions such as full information feedback or exploratory
conditions. We present a computationally efficient policy optimization algorithm
for the challenging general setting of unknown dynamics and bandit feedback,
featuring a combination of mirror-descent and least squares policy evaluation
in an auxiliary MDP used to compute exploration bonuses. Our algorithm obtains
an $\widetilde O(K^{6/7})$ regret bound, improving significantly over previous
state-of-the-art of $\widetilde O (K^{14/15})$ in this setting. In addition, we
present a version of the same algorithm under the assumption that a simulator of the
environment is available to the learner (but otherwise no exploratory
assumptions are made), and prove it obtains state-of-the-art regret of
$\widetilde O (K^{2/3})$.
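As a rough illustration of the mirror-descent component mentioned above, the sketch below shows a generic entropy-regularized policy update driven by optimistic (bonus-adjusted) cost estimates. It is a minimal sketch under assumed tabular notation; the names Q_hat, bonus and eta are placeholders, and it is not the paper's actual algorithm or analysis.

```python
# Hypothetical sketch: one entropy-regularized mirror-descent policy update
# with an exploration bonus. All names and shapes are illustrative
# assumptions, not taken from the paper.
import numpy as np

def mirror_descent_step(policy, Q_hat, bonus, eta=0.1):
    """One multiplicative-weights update per state.

    policy : (S, A) array of action probabilities, one row per state
    Q_hat  : (S, A) estimated costs-to-go (lower is better)
    bonus  : (S, A) exploration bonuses, subtracted to induce optimism
    """
    adjusted = Q_hat - bonus  # optimistic cost estimate
    new_policy = policy * np.exp(-eta * adjusted)
    return new_policy / new_policy.sum(axis=1, keepdims=True)
```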
Related papers
- Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time
Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined by minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Anti-Concentrated Confidence Bonuses for Scalable Exploration [57.91943847134011]
Intrinsic rewards play a central role in handling the exploration-exploitation trade-off.
We introduce anti-concentrated confidence bounds for efficiently approximating the elliptical bonus.
We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic rewards on Atari benchmarks.
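For reference, the elliptical bonus mentioned above is commonly taken to be of the form $\beta \sqrt{\phi^\top \Sigma^{-1} \phi}$ for a regularized feature covariance $\Sigma$, which the anti-concentrated bounds aim to approximate cheaply. A minimal sketch of the exact (unapproximated) bonus, assuming generic linear features, follows; beta and lam are textbook placeholders, not values from that paper.

```python
# Illustrative computation of the exact elliptical exploration bonus that
# anti-concentrated confidence bounds are designed to approximate cheaply.
# beta and lam are generic placeholder choices.
import numpy as np

def elliptical_bonus(phi, past_features, beta=1.0, lam=1.0):
    """Return beta * sqrt(phi^T Sigma^{-1} phi) with Sigma = lam*I + sum_i phi_i phi_i^T."""
    d = phi.shape[0]
    Sigma = lam * np.eye(d) + past_features.T @ past_features
    return beta * float(np.sqrt(phi @ np.linalg.solve(Sigma, phi)))
```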
arXiv Detail & Related papers (2021-10-21T15:25:15Z)
- Randomized Exploration for Reinforcement Learning with General Value Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
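A minimal sketch of the perturbation idea described above, assuming a generic ridge-regression setup for the value parameters rather than the paper's exact construction (noise_std and lam are placeholder choices):

```python
# Schematic RLSVI-style step: fit value parameters by ridge regression on
# targets perturbed with i.i.d. Gaussian noise. Shapes and parameters are
# illustrative assumptions.
import numpy as np

def perturbed_ridge_regression(features, targets, noise_std=1.0, lam=1.0, rng=None):
    """Fit theta so that features @ theta approximates the noise-perturbed targets."""
    rng = np.random.default_rng() if rng is None else rng
    noisy_targets = targets + rng.normal(0.0, noise_std, size=targets.shape)
    d = features.shape[1]
    A = features.T @ features + lam * np.eye(d)
    b = features.T @ noisy_targets
    return np.linalg.solve(A, b)
```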
arXiv Detail & Related papers (2021-06-15T02:23:07Z)
- Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how, under a more standard notion of low inherent Bellman error, typically employed in least-squares value iteration-style algorithms, one can obtain strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
- Average Reward Adjusted Discounted Reinforcement Learning: Near-Blackwell-Optimal Policies for Real-World Applications [0.0]
Reinforcement learning aims at finding the best stationary policy for a given Markov Decision Process.
This paper provides deep theoretical insights into the widely applied standard discounted reinforcement learning framework.
We establish a novel near-Blackwell-optimal reinforcement learning algorithm.
arXiv Detail & Related papers (2020-04-02T08:05:18Z)
- Adaptive Approximate Policy Iteration [22.915651391812187]
We present a learning scheme which enjoys a $\widetilde O(T^{2/3})$ regret bound for undiscounted, continuing learning in uniformly ergodic MDPs.
This is an improvement over the best existing bound of $\widetilde O(T^{3/4})$ for the average-reward case with function approximation.
arXiv Detail & Related papers (2020-02-08T02:27:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.