Boosting Soft Q-Learning by Bounding
- URL: http://arxiv.org/abs/2406.18033v1
- Date: Wed, 26 Jun 2024 03:02:22 GMT
- Title: Boosting Soft Q-Learning by Bounding
- Authors: Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, Rahul V. Kulkarni
- Abstract summary: We show how any value function estimate can also be used to derive double-sided bounds on the optimal value function.
The derived bounds lead to new approaches for boosting training performance.
- Score: 4.8748194765816955
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An agent's ability to leverage past experience is critical for efficiently solving new tasks. Prior work has focused on using value function estimates to obtain zero-shot approximations for solutions to a new task. In soft Q-learning, we show how any value function estimate can also be used to derive double-sided bounds on the optimal value function. The derived bounds lead to new approaches for boosting training performance which we validate experimentally. Notably, we find that the proposed framework suggests an alternative method for updating the Q-function, leading to boosted performance.
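The abstract is terse about how the bounds enter training, so the following is a minimal tabular sketch of one natural reading: derive double-sided bounds on the optimal soft value function from an arbitrary estimate's Bellman residual, then clip the soft Q-iteration targets to those bounds. The bound form below follows the standard contraction argument and is my reconstruction, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_s, n_a, gamma, beta = 5, 3, 0.9, 5.0   # states, actions, discount, inverse temperature

# A small random MDP.
R = rng.uniform(0, 1, (n_s, n_a))                   # rewards
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))    # P[s, a] = distribution over next states

def soft_backup(Q):
    """One application of the soft Bellman operator B[Q]."""
    V = np.log(np.exp(beta * Q).sum(axis=1)) / beta  # V(s) = (1/beta) log sum_a e^{beta Q(s,a)}
    return R + gamma * P @ V

# Any estimate yields double-sided bounds on Q* via its Bellman residual,
# because B is a gamma-contraction that shifts constants by a factor gamma.
Q_est = rng.uniform(0, 1, (n_s, n_a))               # an arbitrary value-function estimate
delta = soft_backup(Q_est) - Q_est                  # soft Bellman residual of the estimate
lower = soft_backup(Q_est) + gamma / (1 - gamma) * delta.min()
upper = soft_backup(Q_est) + gamma / (1 - gamma) * delta.max()

# Soft Q-iteration whose targets are clipped to the derived bounds; Q* lies
# inside [lower, upper], so the clipped iteration still converges to it.
Q = np.zeros((n_s, n_a))
for _ in range(200):
    Q = np.clip(soft_backup(Q), lower, upper)

print("max soft Bellman residual after training:", np.abs(soft_backup(Q) - Q).max())
```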
Related papers
- Improving Forward Compatibility in Class Incremental Learning by Increasing Representation Rank and Feature Richness [3.0620294646308754]
We introduce an effective-Rank based Feature Richness enhancement (RFR) method, designed to improve forward compatibility.
Our results demonstrate the effectiveness of our approach in enhancing novel-task performance while mitigating catastrophic forgetting.
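The summary above leans on "effective rank" as the measure of representation richness. A common definition (due to Roy and Vetterli) is the exponential of the entropy of the normalized singular values; the sketch below computes that quantity and is illustrative only, not the RFR training objective.

```python
import numpy as np

def effective_rank(features: np.ndarray) -> float:
    """Effective rank: exp of the Shannon entropy of the normalized singular values."""
    s = np.linalg.svd(features, compute_uv=False)
    p = s / s.sum()                  # normalize singular values into a distribution
    p = p[p > 0]                     # guard against log(0)
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
low_rank = rng.normal(size=(256, 4)) @ rng.normal(size=(4, 64))  # rank-4 features
full_rank = rng.normal(size=(256, 64))
print(effective_rank(low_rank), effective_rank(full_rank))       # close to 4 vs. much larger
```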
arXiv Detail & Related papers (2024-03-22T11:14:30Z) - Iterated $Q$-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning [19.4531905603925]
i-QN is a principled approach that enables multiple consecutive Bellman updates by learning a tailored sequence of action-value functions.
We show that i-QN is theoretically grounded and that it can be seamlessly used in value-based and actor-critic methods.
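One minimal reading of "multiple consecutive Bellman updates through a sequence of action-value functions" is a chain of K estimates in which each is regressed toward the Bellman backup of its predecessor, so one sweep advances several backups at once. The tabular sketch below follows that reading; the circular chaining scheme is an assumption, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
n_s, n_a, gamma, K = 5, 3, 0.9, 4          # K consecutive Bellman updates per sweep
R = rng.uniform(0, 1, (n_s, n_a))
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))

def backup(Q):
    return R + gamma * P @ Q.max(axis=1)   # hard-max Bellman operator

# A chain of K value functions; each head is pulled toward the backup of its
# predecessor (head 0 closes the loop by targeting the backup of the last head).
Qs = [np.zeros((n_s, n_a)) for _ in range(K)]
lr = 0.5
for _ in range(300):
    targets = [backup(Qs[k - 1]) for k in range(K)]
    Qs = [Q + lr * (t - Q) for Q, t in zip(Qs, targets)]

print("Bellman residual of the last head:", np.abs(backup(Qs[-1]) - Qs[-1]).max())
```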
arXiv Detail & Related papers (2024-03-04T15:07:33Z) - Augmenting Unsupervised Reinforcement Learning with Self-Reference [63.68018737038331]
Humans possess the ability to draw on past experiences explicitly when learning new tasks.
We propose the Self-Reference (SR) approach, an add-on module explicitly designed to leverage historical information.
Our approach achieves state-of-the-art results in terms of Interquartile Mean (IQM) performance and Optimality Gap reduction on the Unsupervised Reinforcement Learning Benchmark.
arXiv Detail & Related papers (2023-11-16T09:07:34Z) - Learning to Rank for Active Learning via Multi-Task Bilevel Optimization [29.207101107965563]
We propose a novel approach for active learning, which aims to select batches of unlabeled instances through a learned surrogate model for data acquisition.
A key challenge in this approach is developing an acquisition function that generalizes well, as the history of data, which forms part of the utility function's input, grows over time.
arXiv Detail & Related papers (2023-10-25T22:50:09Z) - Leveraging Prior Knowledge in Reinforcement Learning via Double-Sided Bounds on the Value Function [4.48890356952206]
We show how an arbitrary approximation for the value function can be used to derive double-sided bounds on the optimal value function of interest.
We extend the framework with error analysis for continuous state and action spaces.
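For a (soft) Bellman operator $\mathcal{B}$ with discount $\gamma$, the double-sided bound referred to here can be written in terms of the residual $\Delta = \mathcal{B}[\tilde{Q}] - \tilde{Q}$ of an arbitrary estimate $\tilde{Q}$. The inequality below is a reconstruction from the standard contraction argument, not quoted from the paper:

```latex
\[
\mathcal{B}[\tilde{Q}](s,a) + \frac{\gamma}{1-\gamma}\inf_{s',a'}\Delta(s',a')
\;\le\; Q^{*}(s,a) \;\le\;
\mathcal{B}[\tilde{Q}](s,a) + \frac{\gamma}{1-\gamma}\sup_{s',a'}\Delta(s',a')
\]
```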
arXiv Detail & Related papers (2023-02-19T21:47:24Z) - Accelerating Policy Gradient by Estimating Value Function from Prior Computation in Deep Reinforcement Learning [16.999444076456268]
We investigate the use of prior computation to estimate the value function to improve sample efficiency in on-policy policy gradient methods.
In particular, we learn a new value function for the target task while combining it with a value estimate from the prior.
The resulting value function is used as a baseline in the policy gradient method.
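One way to realize "combining a new value function with a prior estimate" is to freeze the prior and learn only a residual correction, using the sum as the policy-gradient baseline. The residual reading and all names below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_s = 6

v_prior = rng.uniform(0, 10, n_s)     # frozen value estimate from prior computation
v_resid = np.zeros(n_s)               # learned correction for the target task

def baseline(s):
    return v_prior[s] + v_resid[s]    # combined value function used as the baseline

# A batch of (state, return) pairs from the target task, which is shifted
# relative to the prior task.
states = rng.integers(0, n_s, 64)
returns = v_prior[states] + rng.normal(2.0, 1.0, 64)

lr = 0.1
for s, g in zip(states, returns):
    v_resid[s] += lr * (g - baseline(s))   # regress the combined value toward returns

advantages = returns - baseline(states)    # advantage = return - baseline
print("mean |advantage|:", np.abs(advantages).mean())
```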
arXiv Detail & Related papers (2023-02-02T20:23:22Z) - Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims to optimize decision-making strategies using historical data, has been extensively applied in real-life applications.
We take a step by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
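Pessimistic fitted Q-learning is typically instantiated by subtracting an uncertainty penalty from the Bellman target before each regression, so the learned values lower-bound the truth where data is scarce. The inverse-count bonus in the tabular sketch below is an illustrative stand-in for the paper's function-approximation analysis.

```python
import numpy as np

rng = np.random.default_rng(3)
n_s, n_a, gamma, beta_pess = 5, 3, 0.9, 1.0
R = rng.uniform(0, 1, (n_s, n_a))
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))
counts = rng.integers(1, 50, (n_s, n_a))    # offline data coverage per (s, a)

penalty = beta_pess / np.sqrt(counts)       # larger penalty where data is scarce

Q = np.zeros((n_s, n_a))
for _ in range(200):
    V = Q.max(axis=1)
    Q = np.maximum(R - penalty + gamma * P @ V, 0.0)   # pessimistic backup, floored at 0
print(Q.round(2))
```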
arXiv Detail & Related papers (2022-10-03T07:59:42Z) - Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined through the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z) - Simultaneous Double Q-learning with Conservative Advantage Learning for Actor-Critic Methods [133.85604983925282]
We propose Simultaneous Double Q-learning with Conservative Advantage Learning (SDQ-CAL).
Our algorithm realizes less biased value estimation and achieves state-of-the-art performance in a range of continuous control benchmark tasks.
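The summary does not spell out the conservative-advantage term, so the sketch below shows only the double-estimator half: two critics updated simultaneously, each bootstrapping from the other's greedy action, with the elementwise minimum used for acting. Details beyond that are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n_s, n_a, gamma, lr = 5, 3, 0.9, 0.1
R = rng.uniform(0, 1, (n_s, n_a))
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))

QA = np.zeros((n_s, n_a))
QB = np.zeros((n_s, n_a))

for _ in range(20000):
    s, a = rng.integers(n_s), rng.integers(n_a)
    s2 = rng.choice(n_s, p=P[s, a])
    # Simultaneous update: each critic evaluates the other's greedy action,
    # decorrelating action selection from evaluation to reduce bias.
    tA = R[s, a] + gamma * QB[s2, QA[s2].argmax()]
    tB = R[s, a] + gamma * QA[s2, QB[s2].argmax()]
    QA[s, a] += lr * (tA - QA[s, a])
    QB[s, a] += lr * (tB - QB[s, a])

policy = np.minimum(QA, QB).argmax(axis=1)   # act on the conservative estimate
print(policy)
```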
arXiv Detail & Related papers (2022-05-08T09:17:16Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
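The paper's convex policy-evaluation loss comes from a regularized linear-programming dual and is not reproduced here. Purely to illustrate the contrast with the squared Bellman error, the sketch below compares it against a generic smooth convex surrogate (log-cosh), which is not the paper's objective.

```python
import numpy as np

def squared_bellman_loss(td):
    """The widely used squared TD/Bellman residual."""
    return 0.5 * td ** 2

def convex_surrogate(td, eta=1.0):
    """Illustrative smooth convex alternative (log-cosh); NOT the paper's LP-derived loss."""
    return np.log(np.cosh(eta * td)) / eta

td = np.linspace(-5, 5, 11)          # a range of Bellman residuals
print(squared_bellman_loss(td))
print(convex_surrogate(td))          # grows linearly in the tails, not quadratically
```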
arXiv Detail & Related papers (2020-10-21T17:14:31Z) - Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning by maintaining a set of parallel models and estimating the Q-value based on a randomly selected network.
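A minimal tabular sketch of the described ensemble trick: maintain several parallel Q-estimates and compute each update's bootstrap target from a randomly selected member, decoupling selection from evaluation. Sizes and schedule below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n_s, n_a, n_models, gamma, lr = 5, 3, 4, 0.9, 0.1
R = rng.uniform(0, 1, (n_s, n_a))
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))

Qs = np.zeros((n_models, n_s, n_a))            # a set of parallel Q-estimates

for _ in range(20000):
    s, a = rng.integers(n_s), rng.integers(n_a)
    s2 = rng.choice(n_s, p=P[s, a])
    k = rng.integers(n_models)                 # model being updated
    j = rng.integers(n_models)                 # randomly selected model supplies the target
    target = R[s, a] + gamma * Qs[j, s2].max()
    Qs[k, s, a] += lr * (target - Qs[k, s, a])

print(Qs.mean(axis=0).argmax(axis=1))          # act greedily on the ensemble mean
```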
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.