Related papers: Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

URL: http://arxiv.org/abs/2403.03950v1
Date: Wed, 6 Mar 2024 18:55:47 GMT
Title: Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Authors: Jesse Farebrother, Jordi Orbay, Quan Vuong, Adrien Ali Ta\"iga, Yevgen Chebotar, Ted Xiao, Alex Irpan, Sergey Levine, Pablo Samuel Castro, Aleksandra Faust, Aviral Kumar, Rishabh Agarwal
Abstract summary: We show that training value functions with categorical cross-entropy improves performance and scalability in a variety of domains. These include: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers.
Score: 109.44370201929246
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast to supervised learning: by leveraging a cross-entropy classification loss, supervised methods have scaled reliably to massive networks. Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions. We demonstrate that value functions trained with categorical cross-entropy significantly improves performance and scalability in a variety of domains. These include: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers, achieving state-of-the-art results on these domains. Through careful analysis, we show that the benefits of categorical cross-entropy primarily stem from its ability to mitigate issues inherent to value-based RL, such as noisy targets and non-stationarity. Overall, we argue that a simple shift to training value functions with categorical cross-entropy can yield substantial improvements in the scalability of deep RL at little-to-no cost.

Related papers

RL as Regressor: A Reinforcement Learning Approach for Function Approximation [0.0]
We propose framing regression as a Reinforcement Learning (RL) problem.<n>We demonstrate this by treating a model's prediction as an action and defining a custom reward signal based on the prediction error.<n>We show that the RL framework not only successfully solves the regression problem but also offers enhanced flexibility in defining objectives and guiding the learning process.
arXiv Detail & Related papers (2025-07-31T21:39:24Z)
Online Training and Pruning of Deep Reinforcement Learning Networks [0.0]
Scaling deep neural networks (NN) of reinforcement learning (RL) algorithms has been shown to enhance performance when feature extraction networks are used.<n>We propose an approach to integrate simultaneous training and pruning within advanced RL methods.
arXiv Detail & Related papers (2025-07-16T07:17:41Z)
Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning [57.3885832382455]
We show that introducing static network sparsity alone can unlock further scaling potential beyond dense counterparts with state-of-the-art architectures.<n>Our analysis reveals that, in contrast to naively scaling up dense DRL networks, such sparse networks achieve both higher parameter efficiency for network expressivity.
arXiv Detail & Related papers (2025-06-20T17:54:24Z)
ReaCritic: Large Reasoning Transformer-based DRL Critic-model Scaling For Heterogeneous Networks [4.931691794637798]
Heterogeneous Networks (HetNets) pose critical challenges for intelligent management due to the diverse user requirements and time-varying wireless conditions.<n>We propose ReaCritic, a large reasoning transformer-based criticmodel scaling scheme that brings reasoning ability into Deep Reinforcement Learning.<n>It is compatible with a broad range of value-based and actor-critic DRL algorithms and enhances generalization in dynamic wireless environments.
arXiv Detail & Related papers (2025-05-16T08:42:08Z)
N-Gram Induction Heads for In-Context RL: Improving Stability and Reducing Data Needs [42.446740732573296]
In-context learning allows models like transformers to adapt to new tasks without updating their weights. Existing in-context RL methods, such as Algorithm Distillation (AD), demand large, carefully curated datasets. In this work we integrated the n-gram induction heads into transformers for in-context RL.
arXiv Detail & Related papers (2024-11-04T10:31:03Z)
Symmetric Reinforcement Learning Loss for Robust Learning on Diverse Tasks and Model Scales [13.818149654692863]
Reinforcement learning (RL) training is inherently unstable due to factors such as moving targets and high gradient variance. In this work, we improve the stability of RL training by adapting the reverse cross entropy (RCE) from supervised learning for noisy data to define a symmetric RL loss.
arXiv Detail & Related papers (2024-05-27T19:28:33Z)
RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$ [12.111848705677142]
We propose RL$3$, a hybrid approach that incorporates action-values, learned per task via traditional RL, in the inputs to meta-RL. We show that RL$3$ earns greater cumulative reward in the long term compared to RL$2$ while drastically reducing meta-training time and generalizes better to out-of-distribution tasks.
arXiv Detail & Related papers (2023-06-28T04:16:16Z)
Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches. This is the first time that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
Pointerformer: Deep Reinforced Multi-Pointer Transformer for the Traveling Salesman Problem [67.32731657297377]
Traveling Salesman Problem (TSP) is a classic routing optimization problem originally arising in the domain of transportation and logistics. Recently, Deep Reinforcement Learning has been increasingly employed to solve TSP due to its high inference efficiency. We propose a novel end-to-end DRL approach, referred to as Pointerformer, based on multi-pointer Transformer.
arXiv Detail & Related papers (2023-04-19T03:48:32Z)
Train Hard, Fight Easy: Robust Meta Reinforcement Learning [78.16589993684698]
A major challenge of reinforcement learning (RL) in real-world applications is the variation between environments, tasks or clients. Standard MRL methods optimize the average return over tasks, but often suffer from poor results in tasks of high risk or difficulty. In this work, we define a robust MRL objective with a controlled level. The data inefficiency is addressed via the novel Robust Meta RL algorithm (RoML)
arXiv Detail & Related papers (2023-01-26T14:54:39Z)
On Transforming Reinforcement Learning by Transformer: The Development Trajectory [97.79247023389445]
Transformer, originally devised for natural language processing, has also attested significant success in computer vision. We group existing developments in two categories: architecture enhancement and trajectory optimization. We examine the main applications of TRL in robotic manipulation, text-based games, navigation and autonomous driving.
arXiv Detail & Related papers (2022-12-29T03:15:59Z)
Hypernetworks in Meta-Reinforcement Learning [47.25270748922176]
Multi-task reinforcement learning (RL) and meta-RL aim to improve sample efficiency by generalizing over a distribution of related tasks. State of the art methods often fail to outperform a degenerate solution that simply learns each task separately. Hypernetworks are a promising path forward since they replicate the separate policies of the degenerate solution and are applicable to meta-RL.
arXiv Detail & Related papers (2022-10-20T15:34:52Z)
Transformers are Meta-Reinforcement Learners [0.060917028769172814]
We present TrMRL, a meta-RL agent that mimics the memory reinstatement mechanism using the transformer architecture. We show that the self-attention computes a consensus representation that minimizes the Bayes Risk at each layer. Results show that TrMRL presents comparable or superior performance, sample efficiency, and out-of-distribution generalization.
arXiv Detail & Related papers (2022-06-14T06:21:13Z)
Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation [25.493902939111265]
We investigate causes of instability when using data augmentation in off-policy Reinforcement Learning algorithms. We propose a simple yet effective technique for stabilizing this class of algorithms under augmentation. Our method greatly improves stability and sample efficiency of ConvNets under augmentation, and achieves generalization results competitive with state-of-the-art methods for image-based RL.
arXiv Detail & Related papers (2021-07-01T17:58:05Z)
Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning [97.28695683236981]
More gradient updates decrease the expressivity of the current value network. We demonstrate this phenomenon on Atari and Gym benchmarks, in both offline and online RL settings.
arXiv Detail & Related papers (2020-10-27T17:55:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.