MinMaxMin $Q$-learning
- URL: http://arxiv.org/abs/2402.05951v3
- Date: Sun, 2 Jun 2024 19:40:34 GMT
- Title: MinMaxMin $Q$-learning
- Authors: Nitsan Soffair, Shie Mannor
- Abstract summary: MinMaxMin $Q$-learning is a novel optimistic Actor-Critic algorithm that addresses the problem of overestimation bias.
We implement MinMaxMin on top of TD3 and TD7, subjecting it to rigorous testing against state-of-the-art continuous-space algorithms.
The results show a consistent performance improvement of MinMaxMin over DDPG, TD3, and TD7 across all tested tasks.
- Score: 48.61228614796803
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: MinMaxMin $Q$-learning is a novel optimistic Actor-Critic algorithm that addresses the problem of overestimation bias ($Q$-estimates exceeding the true $Q$-values) inherent in conservative RL algorithms. Its core formula relies on the disagreement among $Q$-networks, in the form of the min-batch MaxMin $Q$-networks distance, which is added to the $Q$-target and used as the prioritized-experience-replay sampling rule. We implement MinMaxMin on top of TD3 and TD7, subjecting it to rigorous testing against state-of-the-art continuous-space algorithms (DDPG, TD3, and TD7) across popular MuJoCo and Bullet environments. The results show a consistent performance improvement of MinMaxMin over DDPG, TD3, and TD7 across all tested tasks.
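Below is a minimal sketch of how a disagreement-based optimism bonus of the kind described above could be wired into a TD3-style target and reused as the prioritized-replay priority. It illustrates the idea, not the authors' implementation: the ensemble size, the exact form of the MaxMin distance, and the bonus weight `beta` are assumptions.

```python
import torch

def minmaxmin_target_and_priority(q_ensemble, rewards, next_states, next_actions,
                                   gamma=0.99, beta=0.5):
    """Sketch: optimistic Q-target with an ensemble-disagreement bonus.

    q_ensemble: list of critics, each mapping (states, actions) -> (batch,) Q-values.
    The per-transition bonus (max_i Q_i - min_i Q_i) is one plausible reading of
    the paper's MaxMin Q-networks distance; beta is an assumed weight.
    """
    with torch.no_grad():
        # Ensemble estimates at the next state-action pair: (n_critics, batch)
        next_q = torch.stack([q(next_states, next_actions) for q in q_ensemble])

        # Conservative backbone (as in TD3): the minimum over critics.
        min_q = next_q.min(dim=0).values

        # Disagreement per transition: gap between the most optimistic
        # and the most pessimistic critic.
        disagreement = next_q.max(dim=0).values - min_q

        # Optimistic target: pessimistic estimate plus a disagreement bonus.
        target = rewards + gamma * (min_q + beta * disagreement)

        # The same disagreement doubles as the prioritized-replay priority,
        # so high-disagreement transitions are sampled more often.
        priority = disagreement

    return target, priority
```

In a full agent these priorities would be written back into a prioritized replay buffer after each critic update; the rest of the training loop stays as in TD3/TD7.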
Related papers
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z)
- SQT -- std $Q$-target [47.3621151424817]
Std $Q$-target is a conservative, actor-critic, ensemble, $Q$-learning-based algorithm.
We implement SQT on top of TD3/TD7 code and test it against the state-of-the-art (SOTA) actor-critic algorithms.
Our results demonstrate SQT's $Q$-target formula superiority over TD3's $Q$-target formula as a conservative solution to overestimation bias in RL.
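As a rough illustration of a conservative, std-penalized ensemble target (the general idea behind an "std $Q$-target"; the exact formula and the penalty weight are assumptions, not taken from the abstract above):

```python
import torch

def std_q_target(q_ensemble, rewards, next_states, next_actions,
                 gamma=0.99, penalty=0.5):
    """Sketch: conservative ensemble Q-target penalized by critic disagreement.

    Subtracting the ensemble standard deviation pushes the target below the
    mean estimate, the opposite direction of an optimistic disagreement bonus.
    penalty is an assumed coefficient.
    """
    with torch.no_grad():
        next_q = torch.stack([q(next_states, next_actions) for q in q_ensemble])
        mean_q = next_q.mean(dim=0)
        std_q = next_q.std(dim=0)
        return rewards + gamma * (mean_q - penalty * std_q)
```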
arXiv Detail & Related papers (2024-02-03T21:36:22Z)
- PRECISION: Decentralized Constrained Min-Max Learning with Low Communication and Sample Complexities [25.153506493249854]
We present an adaptive multi-agent learning technique for min-max optimization problems.
We also propose an algorithm called PRECISION that enjoys a reduction in the number of iterations.
arXiv Detail & Related papers (2023-03-05T00:26:10Z)
- Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR [58.40575099910538]
We study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$.
We show the minimax CVaR regret rate is $\Omega(\sqrt{\tau^{-1}AK})$, where $A$ is the number of actions and $K$ is the number of episodes.
We show that our algorithm achieves the optimal regret of $\widetilde{O}(\tau^{-1}\sqrt{SAK})$ under a continuity assumption and in general attains a near-optimal regret.
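For reference, a minimal sketch of the empirical CVaR of a batch of episode returns at tolerance $\tau$, the quantity whose regret is bounded above; the lower-tail, risk-averse convention used here is the usual one but is an assumption, not taken from the abstract:

```python
import numpy as np

def empirical_cvar(returns, tau=0.1):
    """Sketch: empirical CVaR at tolerance tau, i.e. the mean of the worst
    tau-fraction of episode returns (risk-averse, lower-tail convention)."""
    returns = np.sort(np.asarray(returns, dtype=float))  # ascending: worst first
    k = max(1, int(np.ceil(tau * len(returns))))
    return returns[:k].mean()

# Example: CVaR_0.1 over 1000 simulated returns averages the worst 100 episodes.
rng = np.random.default_rng(0)
print(empirical_cvar(rng.normal(loc=1.0, size=1000), tau=0.1))
```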
arXiv Detail & Related papers (2023-02-07T02:22:31Z)
- Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations [79.66404989555566]
We consider the more realistic setting of agnostic RL with rich observation spaces and a fixed class of policies $\Pi$ that may not contain any near-optimal policy.
We provide an algorithm for this setting whose error is bounded in terms of the rank $d$ of the underlying MDP.
arXiv Detail & Related papers (2021-06-22T03:20:40Z)
- Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes [91.38793800392108]
We study reinforcement learning with linear function approximation where the underlying transition probability kernel of the Markov decision process (MDP) is a linear mixture model.
We propose a new, computationally efficient algorithm with linear function approximation named $\text{UCRL-VTR}^{+}$ for the aforementioned linear mixture MDPs.
To the best of our knowledge, these are the first computationally efficient, nearly minimax optimal algorithms for RL with linear function approximation.
arXiv Detail & Related papers (2020-12-15T18:56:46Z)
- The Complexity of Constrained Min-Max Optimization [29.57458485068705]
We show that an approximate local min-max point of large enough approximation is guaranteed to exist.
More importantly, we show that finding an approximate fixed point of the Gradient Descent/Ascent update is computationally hard, namely PPAD-complete.
Our result is the first to show an exponential separation between these two fundamental optimization problems.
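To make the Gradient Descent/Ascent update mentioned above concrete, here is a minimal projected-GDA sketch on a toy constrained bilinear problem $\min_x \max_y \, xy$ over $[-1,1]^2$; the objective, step size, and projection are illustrative choices, not taken from the paper:

```python
import numpy as np

def projected_gda(steps=200, eta=0.1):
    """Sketch: simultaneous projected Gradient Descent/Ascent on f(x, y) = x * y
    over the box [-1, 1]^2: the min-player descends in x, the max-player ascends
    in y, and both iterates are projected back onto the constraint set."""
    x, y = 0.5, 0.5
    for _ in range(steps):
        grad_x, grad_y = y, x                             # df/dx = y, df/dy = x
        x = float(np.clip(x - eta * grad_x, -1.0, 1.0))   # descent step (min-player)
        y = float(np.clip(y + eta * grad_y, -1.0, 1.0))   # ascent step (max-player)
    return x, y

# Plain simultaneous GDA spirals outward on this objective and ends up clipped
# at the box boundary instead of converging to the equilibrium (0, 0).
print(projected_gda())
```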
arXiv Detail & Related papers (2020-09-21T05:54:12Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.