Expert Q-learning: Deep Reinforcement Learning with Coarse State Values from Offline Expert Examples
- URL: http://arxiv.org/abs/2106.14642v5
- Date: Tue, 25 Jun 2024 07:08:34 GMT
- Title: Expert Q-learning: Deep Reinforcement Learning with Coarse State Values from Offline Expert Examples
- Authors: Li Meng, Anis Yazidi, Morten Goodwin, Paal Engelstad
- Abstract summary: Expert Q-learning is inspired by Dueling Q-learning and aims at incorporating semi-supervised learning into reinforcement learning.
An offline expert assesses the value of a state in a coarse manner using three discrete values.
Our results show that Expert Q-learning is indeed useful and more resistant to the overestimation bias.
- Score: 8.938418994111716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this article, we propose a novel algorithm for deep reinforcement learning named Expert Q-learning. Expert Q-learning is inspired by Dueling Q-learning and aims at incorporating semi-supervised learning into reinforcement learning by splitting Q-values into state values and action advantages. We require that an offline expert assesses the value of a state in a coarse manner using three discrete values. An expert network is designed in addition to the Q-network; it is updated after each regular offline minibatch update whenever the expert example buffer is not empty. Using the board game Othello, we compare our algorithm with the baseline Q-learning algorithm, which is a combination of Double Q-learning and Dueling Q-learning. Our results show that Expert Q-learning is indeed useful and more resistant to the overestimation bias. The baseline Q-learning algorithm exhibits unstable and suboptimal behavior in non-deterministic settings, whereas Expert Q-learning demonstrates more robust performance with higher scores, illustrating that our algorithm is indeed suitable for integrating state values from expert examples into Q-learning.
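To make the mechanism described in the abstract concrete, here is a minimal, hedged PyTorch-style sketch of the two components it names: a dueling Q-network that splits Q-values into a state value and action advantages, and a separate expert network trained on coarse state-value labels from an offline expert. The three-way label encoding, the loss functions, the Double-Q target, and the `train_step` interface are illustrative assumptions, not the paper's exact specification.

```python
# Hedged sketch of the mechanism described in the abstract: a dueling-style
# Q-network plus a separate expert network trained on coarse state-value labels
# from an offline expert. Label encoding, losses, and the Double-Q target are
# illustrative assumptions, not the paper's exact specification.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DuelingQNet(nn.Module):
    """Q(s, a) = V(s) + A(s, a) - mean_a A(s, a), as in Dueling Q-learning."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value_head = nn.Linear(hidden, 1)        # state value V(s)
        self.adv_head = nn.Linear(hidden, n_actions)  # action advantages A(s, a)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        h = self.trunk(s)
        v = self.value_head(h)                       # (B, 1)
        a = self.adv_head(h)                         # (B, n_actions)
        return v + a - a.mean(dim=1, keepdim=True)   # (B, n_actions)


class ExpertNet(nn.Module):
    """Predicts the expert's coarse state value, assumed here to be a 3-way
    classification over labels such as {-1, 0, +1} stored as indices {0, 1, 2}."""

    def __init__(self, state_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 3)
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)  # logits over the three coarse values


def train_step(q_net, target_net, expert_net, q_opt, e_opt,
               batch, expert_batch, gamma=0.99):
    """One regular offline minibatch update of the Q-network, followed by an
    expert-network update whenever the expert example buffer is non-empty."""
    s, a, r, s_next, done = batch  # tensors: float, long, float, float, float

    # Double-Q style target: action chosen by the online net, value from target net.
    with torch.no_grad():
        next_a = q_net(s_next).argmax(dim=1, keepdim=True)
        next_q = target_net(s_next).gather(1, next_a).squeeze(1)
        target = r + gamma * (1.0 - done) * next_q
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    q_loss = F.smooth_l1_loss(q, target)
    q_opt.zero_grad()
    q_loss.backward()
    q_opt.step()

    # Expert update: supervised on coarse labels from the expert example buffer.
    # The cross-entropy loss here is an assumption for illustration.
    if expert_batch is not None:
        expert_states, expert_labels = expert_batch
        e_loss = F.cross_entropy(expert_net(expert_states), expert_labels)
        e_opt.zero_grad()
        e_loss.backward()
        e_opt.step()
    return q_loss.item()
```

In this sketch the expert network is only trained on the coarse labels; how its predicted state values feed back into the Q-update is left to the paper itself.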
Related papers
- Lifting the Veil: Unlocking the Power of Depth in Q-learning [31.700583180829106]
Deep Q-learning has been widely used in operations research and management science.
This paper theoretically verifies the power of depth in deep Q-learning.
arXiv Detail & Related papers (2023-10-27T06:15:33Z)
- Suppressing Overestimation in Q-Learning through Adversarial Behaviors [4.36117236405564]
This paper proposes a new Q-learning algorithm with a dummy adversarial player, which is called dummy adversarial Q-learning (DAQ).
The proposed DAQ unifies several Q-learning variations to control overestimation biases, such as maxmin Q-learning and minmax Q-learning.
The finite-time convergence of DAQ is analyzed from an integrated perspective by adapting adversarial Q-learning.
arXiv Detail & Related papers (2023-10-10T03:46:32Z)
- VA-learning as a more efficient alternative to Q-learning [49.526579981437315]
We introduce VA-learning, which directly learns the advantage function and the value function using bootstrapping.
VA-learning learns off-policy and enjoys similar theoretical guarantees as Q-learning.
Thanks to directly learning the advantage and value functions, VA-learning improves sample efficiency over Q-learning.
arXiv Detail & Related papers (2023-05-29T15:44:47Z)
- Quantum Imitation Learning [74.15588381240795]
We propose quantum imitation learning (QIL) in the hope of utilizing quantum advantage to speed up IL.
We develop two QIL algorithms, quantum behavioural cloning (Q-BC) and quantum generative adversarial imitation learning (Q-GAIL).
Experiment results demonstrate that both Q-BC and Q-GAIL achieve performance comparable to their classical counterparts.
arXiv Detail & Related papers (2023-04-04T12:47:35Z)
- Bayesian Q-learning With Imperfect Expert Demonstrations [56.55609745121237]
We propose a novel algorithm to speed up Q-learning with the help of a limited amount of imperfect expert demonstrations.
We evaluate our approach on a sparse-reward chain environment and six more complicated Atari games with delayed rewards.
arXiv Detail & Related papers (2022-10-01T17:38:19Z)
- Simultaneous Double Q-learning with Conservative Advantage Learning for Actor-Critic Methods [133.85604983925282]
We propose Simultaneous Double Q-learning with Conservative Advantage Learning (SDQ-CAL).
Our algorithm realizes less biased value estimation and achieves state-of-the-art performance in a range of continuous control benchmark tasks.
arXiv Detail & Related papers (2022-05-08T09:17:16Z)
- Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between the practical success of Q-learning and pessimistic theoretical results.
We present novel methods Q-Rex and Q-RexDaRe.
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
arXiv Detail & Related papers (2021-10-16T01:47:41Z)
- Self-correcting Q-Learning [14.178899938667161]
We introduce a new way to address the overestimation bias in the form of a "self-correcting algorithm".
Applying this strategy to Q-learning results in Self-correcting Q-learning.
We show theoretically that this new algorithm enjoys the same convergence guarantees as Q-learning while being more accurate.
arXiv Detail & Related papers (2020-12-02T11:36:24Z)
- Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning by maintaining a set of parallel models and estimating the Q-value based on a randomly selected network.
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
- Q-Learning with Differential Entropy of Q-Tables [4.221871357181261]
We conjecture that the reduction in performance during prolonged training sessions of Q-learning is caused by a loss of information.
We introduce Differential Entropy of Q-tables (DE-QT) as an external information loss detector to the Q-learning algorithm.
arXiv Detail & Related papers (2020-06-26T04:37:10Z)
- Periodic Q-Learning [24.099046883918046]
We study the so-called periodic Q-learning algorithm (PQ-learning for short).
PQ-learning maintains two separate Q-value estimates: the online estimate and the target estimate.
In contrast to standard Q-learning, PQ-learning enjoys a simple finite-time analysis and achieves better sample complexity for finding an epsilon-optimal policy (see the sketch after this list).
arXiv Detail & Related papers (2020-02-23T00:33:13Z)
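As referenced in the Periodic Q-learning entry above, here is a minimal tabular sketch of its two-estimate idea: an online Q-table bootstraps against a frozen target table that is synchronized only periodically. The environment interface (`reset()` returning a state index, `step()` returning next state, reward, and a done flag), the update rule, and the synchronization schedule are illustrative assumptions, not the paper's exact algorithm or analysis.

```python
# Minimal tabular sketch of the two-estimate idea from the Periodic Q-learning
# entry: an online Q-table bootstraps against a frozen target table that is
# synchronized only every `period` steps. The `env` interface and all
# hyperparameters are illustrative assumptions, not the paper's algorithm.
import numpy as np


def periodic_q_learning(env, n_states, n_actions, episodes=500,
                        alpha=0.1, gamma=0.99, epsilon=0.1, period=100):
    q_online = np.zeros((n_states, n_actions))
    q_target = np.zeros((n_states, n_actions))
    step = 0
    for _ in range(episodes):
        s = env.reset()          # assumed to return an integer state index
        done = False
        while not done:
            # epsilon-greedy action from the online estimate
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(q_online[s]))
            s_next, r, done = env.step(a)  # assumed (state, reward, done) return
            # bootstrap from the frozen target estimate
            td_target = r if done else r + gamma * np.max(q_target[s_next])
            q_online[s, a] += alpha * (td_target - q_online[s, a])
            step += 1
            if step % period == 0:         # periodic synchronization
                q_target = q_online.copy()
            s = s_next
    return q_online
```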
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.