VA-learning as a more efficient alternative to Q-learning
- URL: http://arxiv.org/abs/2305.18161v2
- Date: Sat, 31 Aug 2024 12:28:05 GMT
- Title: VA-learning as a more efficient alternative to Q-learning
- Authors: Yunhao Tang, Rémi Munos, Mark Rowland, Michal Valko, et al.
- Abstract summary: We introduce VA-learning, which directly learns the advantage function and the value function using bootstrapping.
VA-learning learns off-policy and enjoys theoretical guarantees similar to those of Q-learning.
Thanks to the direct learning of the advantage and value functions, VA-learning improves sample efficiency over Q-learning.
- Score: 49.526579981437315
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In reinforcement learning, the advantage function is critical for policy improvement, but it is often extracted from a learned Q-function. A natural question is: why not learn the advantage function directly? In this work, we introduce VA-learning, which directly learns the advantage function and the value function using bootstrapping, without explicit reference to Q-functions. VA-learning learns off-policy and enjoys theoretical guarantees similar to those of Q-learning. Thanks to the direct learning of the advantage and value functions, VA-learning improves sample efficiency over Q-learning, both in tabular implementations and in deep RL agents on Atari-57 games. We also identify a close connection between VA-learning and the dueling architecture, which partially explains why a simple architectural change to DQN agents tends to improve performance.
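To make the decomposition concrete, below is a minimal tabular sketch of the general idea: keep separate V and A tables, act on the implied Q(s, a) = V(s) + A(s, a), and bootstrap a Q-learning-style target. This is an illustration, not the paper's exact update rule; the environment interface (reset/step returning next state, reward, done) and the choice to drive both tables with a shared TD error are assumptions made here for brevity.

```python
# Hypothetical tabular sketch of learning V and A jointly (illustrative only;
# the paper's VA-learning update and analysis may differ in detail).
import numpy as np

def va_learning_sketch(env, num_states, num_actions,
                       episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    V = np.zeros(num_states)                 # state-value estimates
    A = np.zeros((num_states, num_actions))  # advantage estimates

    for _ in range(episodes):
        s = env.reset()                      # assumed: returns an integer state
        done = False
        while not done:
            # epsilon-greedy on the implied Q(s, a) = V(s) + A(s, a)
            if rng.random() < epsilon:
                a = int(rng.integers(num_actions))
            else:
                a = int(np.argmax(V[s] + A[s]))
            s_next, r, done = env.step(a)    # assumed: (state, reward, done)

            # Q-learning-style bootstrapped target on the implied Q-function
            q_next = 0.0 if done else float(np.max(V[s_next] + A[s_next]))
            td_error = r + gamma * q_next - (V[s] + A[s, a])

            # shared TD error nudges both tables (an illustrative choice)
            V[s] += alpha * td_error
            A[s, a] += alpha * td_error
            s = s_next

    return V, A  # greedy policy: argmax_a V[s] + A[s, a]
```

The same V-plus-A parameterization is what the dueling architecture builds into a DQN head, which is the connection the abstract points to.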
Related papers
- Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation [37.36913210031282]
Preference-based reinforcement learning (PbRL) has shown impressive capabilities in training agents without reward engineering.
We propose SEER, an efficient PbRL method that integrates label smoothing and policy regularization techniques.
arXiv Detail & Related papers (2024-05-29T01:49:20Z) - Offline Reinforcement Learning with Differentiable Function
Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims at optimizing decision-making strategies with historical data, has been extensively applied in real-life applications.
We take a step forward by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
arXiv Detail & Related papers (2022-10-03T07:59:42Z) - Stabilizing Q-learning with Linear Architectures for Provably Efficient
Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z) - Simultaneous Double Q-learning with Conservative Advantage Learning for
Actor-Critic Methods [133.85604983925282]
We propose Simultaneous Double Q-learning with Conservative Advantage Learning (SDQ-CAL).
Our algorithm realizes less biased value estimation and achieves state-of-the-art performance in a range of continuous control benchmark tasks.
arXiv Detail & Related papers (2022-05-08T09:17:16Z) - Learning Value Functions from Undirected State-only Experience [17.76847333440422]
We show that Markov Q-learning in discrete Markov decision processes (MDPs) learns the same value function under any arbitrary refinement of the action space.
This theoretical result motivates the design of Latent Action Q-learning or LAQ, an offline RL method that can learn effective value functions from state-only experience.
We show that LAQ can recover value functions that have high correlation with value functions learned using ground truth actions.
arXiv Detail & Related papers (2022-04-26T17:24:36Z) - Online Target Q-learning with Reverse Experience Replay: Efficiently
finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between the practical success of Q-learning and pessimistic theoretical results.
We present two novel methods, Q-Rex and Q-RexDaRe.
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
arXiv Detail & Related papers (2021-10-16T01:47:41Z) - Offline Reinforcement Learning with Implicit Q-Learning [85.62618088890787]
Current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy.
We propose an offline RL method that never needs to evaluate actions outside of the dataset.
This method enables the learned policy to improve substantially over the best behavior in the data through generalization.
arXiv Detail & Related papers (2021-10-12T17:05:05Z) - Expert Q-learning: Deep Reinforcement Learning with Coarse State Values from Offline Expert Examples [8.938418994111716]
Expert Q-learning is inspired by Dueling Q-learning and aims at incorporating semi-supervised learning into reinforcement learning.
An offline expert assesses the value of a state in a coarse manner using three discrete values.
Our results show that Expert Q-learning is indeed useful and more resistant to the overestimation bias.
arXiv Detail & Related papers (2021-06-28T12:41:45Z) - Smooth Q-learning: Accelerate Convergence of Q-learning Using Similarity [2.088376060651494]
The proposed method takes the similarity between different states and actions into account.
During training, a new update mechanism synchronously updates the Q-values of similar state-action pairs.
arXiv Detail & Related papers (2021-06-02T13:05:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.