Simplifying Deep Temporal Difference Learning
- URL: http://arxiv.org/abs/2407.04811v2
- Date: Wed, 23 Oct 2024 12:27:12 GMT
- Title: Simplifying Deep Temporal Difference Learning
- Authors: Matteo Gallici, Mattie Fellows, Benjamin Ellis, Bartomeu Pou, Ivan Masmitja, Jakob Nicolaus Foerster, Mario Martin
- Abstract summary: We investigate whether it is possible to accelerate and simplify TD training while maintaining its stability.
Our key theoretical result demonstrates for the first time that regularisation techniques such as LayerNorm can yield provably convergent TD algorithms.
Motivated by these findings, we propose PQN, our simplified deep online Q-Learning algorithm.
- Score: 3.458933902627673
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Q-learning played a foundational role in the field of reinforcement learning (RL). However, TD algorithms that use off-policy data, such as Q-learning, or nonlinear function approximation, such as deep neural networks, require several additional tricks to stabilise training, primarily a replay buffer and target networks. Unfortunately, the delayed updating of frozen network parameters in the target network harms sample efficiency and, similarly, the replay buffer introduces memory and implementation overheads. In this paper, we investigate whether it is possible to accelerate and simplify TD training while maintaining its stability. Our key theoretical result demonstrates for the first time that regularisation techniques such as LayerNorm can yield provably convergent TD algorithms without the need for a target network, even with off-policy data. Empirically, we find that online, parallelised sampling enabled by vectorised environments stabilises training without the need for a replay buffer. Motivated by these findings, we propose PQN, our simplified deep online Q-Learning algorithm. Surprisingly, this simple algorithm is competitive with more complex methods such as Rainbow in Atari, R2D2 in Hanabi, QMix in Smax, and PPO-RNN in Craftax, and can be up to 50x faster than traditional DQN without sacrificing sample efficiency. In an era where PPO has become the go-to RL algorithm, PQN reestablishes Q-learning as a viable alternative.
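Read operationally, the core update the abstract describes is ordinary Q-learning with the two classic stabilisers removed and LayerNorm inserted. Below is a minimal PyTorch sketch of that idea; it is a paraphrase of the abstract, not the authors' released implementation, and all names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        # LayerNorm after each hidden layer is the regularisation the paper
        # identifies as the key to convergence without a target network.
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)

def td_step(qnet, opt, obs, act, rew, next_obs, done, gamma=0.99):
    """One online Q-learning step on a batch gathered from parallel envs."""
    q = qnet(obs).gather(1, act.unsqueeze(1)).squeeze(1)  # act: int64 indices
    with torch.no_grad():
        # Bootstrap from the *current* network: no frozen target copy, and
        # the batch comes straight from vectorised envs: no replay buffer.
        target = rew + gamma * (1 - done) * qnet(next_obs).max(dim=1).values
    loss = nn.functional.mse_loss(q, target)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```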
Related papers
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on backpropagation (BP) for optimization.
Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
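As a concrete reading of the block-local training described above, here is a hedged PyTorch sketch: each block pairs with its own classifier head and trains on its own cross-entropy loss, with detached inputs so no gradient crosses block boundaries. Sizes and names are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

# One classifier head per block; each (block, head) pair has its own optimizer.
blocks = nn.ModuleList([
    nn.Sequential(nn.Linear(784, 256), nn.ReLU()),
    nn.Sequential(nn.Linear(256, 256), nn.ReLU()),
])
heads = nn.ModuleList([nn.Linear(256, 10) for _ in blocks])
opts = [torch.optim.SGD(list(b.parameters()) + list(h.parameters()), lr=1e-2)
        for b, h in zip(blocks, heads)]

def train_batch(x, y):
    feats = x
    for block, head, opt in zip(blocks, heads, opts):
        feats = block(feats.detach())  # detach: no gradient crosses block boundaries
        loss = nn.functional.cross_entropy(head(feats), y)
        opt.zero_grad(); loss.backward(); opt.step()
    # at inference, the per-block label distributions can be ensembled
```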
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
- Quantization-aware Interval Bound Propagation for Training Certifiably Robust Quantized Neural Networks [58.195261590442406]
We study the problem of training and certifying adversarially robust quantized neural networks (QNNs).
Recent work has shown that floating-point neural networks that have been verified to be robust can become vulnerable to adversarial attacks after quantization.
We present quantization-aware interval bound propagation (QA-IBP), a novel method for training robust QNNs.
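For context, the certification primitive that QA-IBP builds on, plain interval bound propagation, can be sketched in a few lines of numpy: elementwise input bounds are pushed through affine layers and monotone activations to obtain guaranteed output bounds. The quantization-aware part of the method is not reproduced here.

```python
import numpy as np

def ibp_affine(lo, hi, W, b):
    """Elementwise bounds of W @ x + b for any x with lo <= x <= hi."""
    center, radius = (lo + hi) / 2, (hi - lo) / 2
    c = W @ center + b
    r = np.abs(W) @ radius
    return c - r, c + r

def ibp_relu(lo, hi):
    # ReLU is monotone, so bounds pass straight through
    return np.maximum(lo, 0), np.maximum(hi, 0)

# Example: certified output range of a tiny 2-layer net around an input x
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)
x, eps = rng.normal(size=4), 0.1
lo, hi = ibp_affine(x - eps, x + eps, W1, b1)
lo, hi = ibp_relu(lo, hi)
lo, hi = ibp_affine(lo, hi, W2, b2)
print(lo, hi)  # holds for every perturbation within the eps-ball
```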
arXiv Detail & Related papers (2022-11-29T13:32:38Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
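A schematic numpy sketch of the forward-in-time idea, as read from this summary: instead of storing the whole spike history for BPTT, keep a decaying presynaptic trace and update weights from an instantaneous error at each step. The LIF dynamics and the error signal below are simplified placeholders, not the paper's exact OTTT rule.

```python
import numpy as np

def run_online(W, inputs, targets, lam=0.9, v_th=1.0, lr=1e-2):
    """inputs, targets: arrays of shape [T, n_in] and [T, n_out]."""
    v = np.zeros(W.shape[0])      # membrane potentials
    trace = np.zeros(W.shape[1])  # decaying presynaptic trace: the only
                                  # "history" kept, in place of BPTT's full record
    for s_t, y_t in zip(inputs, targets):
        trace = lam * trace + s_t
        v = lam * v + W @ s_t                # leaky integration of input spikes
        spikes = (v >= v_th).astype(float)
        v = v * (1.0 - spikes)               # reset neurons that fired
        err_t = spikes - y_t                 # placeholder instantaneous error
        W = W - lr * np.outer(err_t, trace)  # local, forward-in-time update
    return W
```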
arXiv Detail & Related papers (2022-10-09T07:47:56Z)
- M$^2$DQN: A Robust Method for Accelerating Deep Q-learning Network [6.689964384669018]
We propose a framework which uses the Max-Mean loss in Deep Q-Network (M$^2$DQN).
Instead of sampling one batch of experiences in the training step, we sample several batches from the experience replay and update the parameters such that the maximum TD error of these batches is minimized.
We verify the effectiveness of this framework with one of the most widely used techniques, Double DQN (DDQN) in several gym games.
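A hedged PyTorch sketch of that sampling scheme: draw several batches, compute each batch's mean TD loss, and step on the largest one. The `buffer.sample` API and the vanilla (non-double) targets are assumptions made for brevity; the paper's experiments pair the idea with Double DQN.

```python
import torch

def max_mean_step(qnet, target_net, opt, buffer, n_batches=4,
                  batch_size=32, gamma=0.99):
    per_batch = []
    for _ in range(n_batches):
        obs, act, rew, next_obs, done = buffer.sample(batch_size)  # assumed API
        q = qnet(obs).gather(1, act.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = rew + gamma * (1 - done) * target_net(next_obs).max(1).values
        per_batch.append(torch.nn.functional.mse_loss(q, target))
    loss = torch.stack(per_batch).max()  # Max-Mean: worst per-batch mean TD error
    opt.zero_grad(); loss.backward(); opt.step()
```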
arXiv Detail & Related papers (2022-09-16T09:20:35Z)
- Deep Q-network using reservoir computing with multi-layered readout [0.0]
Recurrent neural network (RNN) based reinforcement learning (RL) is used for learning context-dependent tasks.
An approach that introduces reservoir computing with replay memory has been proposed, which trains an agent without backpropagation through time (BPTT).
This paper shows that the performance of this method improves by using a multi-layered neural network for the readout layer.
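A small PyTorch sketch of the architecture as this summary describes it: a fixed random reservoir that is never trained (so no BPTT is needed), feeding a trainable multi-layered readout. The sizes and the spectral-radius scaling are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

class ReservoirQNet(nn.Module):
    def __init__(self, obs_dim, n_actions, res_dim=256, rho=0.9):
        super().__init__()
        rng = np.random.default_rng(0)
        W = rng.normal(size=(res_dim, res_dim))
        W *= rho / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius < 1
        # fixed random weights, registered as buffers: never trained
        self.register_buffer("W", torch.tensor(W, dtype=torch.float32))
        self.register_buffer("W_in", torch.tensor(
            rng.normal(scale=0.1, size=(res_dim, obs_dim)), dtype=torch.float32))
        # the multi-layered readout is the only trained component
        self.readout = nn.Sequential(
            nn.Linear(res_dim, 128), nn.ReLU(), nn.Linear(128, n_actions))

    def forward(self, obs, h):
        h = torch.tanh(self.W_in @ obs + self.W @ h)  # reservoir state update
        return self.readout(h), h.detach()            # detach: no BPTT through time
```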
arXiv Detail & Related papers (2022-03-03T00:32:55Z)
- DNS: Determinantal Point Process Based Neural Network Sampler for Ensemble Reinforcement Learning [2.918938321104601]
We propose DNS: a Determinantal Point Process based Neural Network Sampler.
DNS uses a k-DPP to sample a subset of neural networks for backpropagation at every training step.
Our experiments show that DNS augmented REDQ outperforms baseline REDQ in terms of average cumulative reward.
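To make the selection step concrete, here is a numpy sketch using a greedy determinant-maximisation heuristic, a common cheap stand-in for the exact k-DPP sampling the paper uses; the similarity kernel over ensemble members is an invented example.

```python
import numpy as np

def greedy_kdpp(L, k):
    """Greedily pick k indices approximately maximising det(L[S, S])."""
    selected = []
    for _ in range(k):
        best_i, best_det = None, -np.inf
        for i in range(L.shape[0]):
            if i in selected:
                continue
            S = selected + [i]
            d = np.linalg.det(L[np.ix_(S, S)])  # larger det = more diverse subset
            if d > best_det:
                best_i, best_det = i, d
        selected.append(best_i)
    return selected

# invented example: an RBF similarity kernel over 10 ensemble members
feats = np.random.default_rng(0).normal(size=(10, 16))
L = np.exp(-0.1 * np.square(feats[:, None] - feats[None, :]).sum(-1))
print(greedy_kdpp(L, k=4))  # diverse subset chosen for this step's backprop
```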
arXiv Detail & Related papers (2022-01-31T17:08:39Z)
- Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between the practical success of Q-learning and pessimistic theoretical results.
We present novel methods Q-Rex and Q-RexDaRe.
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
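The reverse-experience-replay component is easy to illustrate; below is a tabular numpy sketch (the paper's analysis is for linear MDPs, and Q-Rex adds more machinery than this): replaying a trajectory backwards lets reward information propagate through all of its states in a single sweep.

```python
import numpy as np

def rer_update(Q, trajectory, alpha=0.1, gamma=0.99):
    """trajectory: list of (s, a, r, s_next) tuples in the order they occurred."""
    for s, a, r, s_next in reversed(trajectory):  # the "reverse" in RER
        td_target = r + gamma * Q[s_next].max()
        Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# a single backward sweep propagates the final reward to every earlier state
Q = np.zeros((4, 2))
Q = rer_update(Q, [(0, 1, 0.0, 1), (1, 0, 0.0, 2), (2, 1, 1.0, 3)])
print(Q)
```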
arXiv Detail & Related papers (2021-10-16T01:47:41Z)
- Deep Networks with Fast Retraining [0.0]
This paper proposes a novel MP inverse-based fast retraining strategy for deep convolutional neural network (DCNN) learning.
During training, a random learning strategy that controls the number of convolutional layers trained in the backward pass is first utilized.
Then, an MP inverse-based batch-by-batch learning strategy, which enables the network to be implemented without access to industrial-scale computational resources, is developed.
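A hedged numpy sketch of the Moore-Penrose (MP) inverse idea: refit a linear output layer in closed form by accumulating the normal equations batch by batch, so the full dataset never has to be held in memory at once. The ridge term and function names are assumptions; the paper's exact scheme may differ.

```python
import numpy as np

def fit_readout(batches, n_features, n_outputs, ridge=1e-3):
    """Accumulate normal equations per batch, then solve W = (H^T H)^-1 H^T Y."""
    A = ridge * np.eye(n_features)         # running H^T H, ridge-regularised
    B = np.zeros((n_features, n_outputs))  # running H^T Y
    for H, Y in batches:                   # H: batch features, Y: batch targets
        A += H.T @ H
        B += H.T @ Y
    return np.linalg.solve(A, B)           # closed-form least-squares weights
```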
arXiv Detail & Related papers (2020-08-13T15:17:38Z)
- An FPGA-Based On-Device Reinforcement Learning Approach using Online Sequential Learning [2.99321624683618]
We propose a lightweight on-device reinforcement learning approach for low-cost FPGA devices.
It exploits a recently proposed neural-network based on-device learning approach that does not rely on the backpropagation method.
The proposed reinforcement learning approach is designed for the PYNQ-Z1 board as a low-cost FPGA platform.
arXiv Detail & Related papers (2020-05-10T12:37:26Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
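As one concrete member of this family of backprop alternatives, here is a direct-feedback-alignment-style numpy sketch: fixed random feedback matrices carry the output error to every layer, so each layer's update is local and layers need not wait on a sequential backward pass. This is a stand-in illustration, not the paper's recursive local representation alignment rule.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [784, 256, 256, 10]
Ws = [rng.normal(scale=0.05, size=(m, n)) for n, m in zip(sizes, sizes[1:])]
# fixed random feedback matrices, one per hidden layer (never trained)
Bs = [rng.normal(scale=0.05, size=(m, sizes[-1])) for m in sizes[1:-1]]

def local_step(x, y_onehot, lr=1e-2):
    acts = [x]
    for W in Ws:                         # forward pass, keeping activations
        acts.append(np.tanh(W @ acts[-1]))
    err = acts[-1] - y_onehot            # global error at the output
    for i in range(len(Ws)):
        # each layer receives the error through its own fixed feedback matrix,
        # so no gradient chains through the stack and updates can run in parallel
        e = err if i == len(Ws) - 1 else Bs[i] @ err
        e = e * (1 - acts[i + 1] ** 2)   # derivative of the local tanh only
        Ws[i] = Ws[i] - lr * np.outer(e, acts[i])
```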
arXiv Detail & Related papers (2020-02-10T16:20:02Z)