RLx2: Training a Sparse Deep Reinforcement Learning Model from Scratch
- URL: http://arxiv.org/abs/2205.15043v1
- Date: Mon, 30 May 2022 12:18:43 GMT
- Title: RLx2: Training a Sparse Deep Reinforcement Learning Model from Scratch
- Authors: Yiqin Tan, Pihe Hu, Ling Pan, Longbo Huang
- Abstract summary: Training deep reinforcement learning (DRL) models usually requires high computation costs.
Therefore, compressing DRL models possesses immense potential for training acceleration and model deployment.
We propose a novel sparse DRL training framework, the "Rigged Reinforcement Learning Lottery" (RLx2).
- Score: 23.104546205134103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training deep reinforcement learning (DRL) models usually requires high
computation costs. Therefore, compressing DRL models possesses immense
potential for training acceleration and model deployment. However, existing
methods that generate small models mainly adopt the knowledge distillation
based approach by iteratively training a dense network, such that the training
process still demands massive computing resources. Indeed, sparse training from
scratch in DRL has not been well explored and is particularly challenging due
to non-stationarity in bootstrap training. In this work, we propose a novel
sparse DRL training framework, "the \textbf{R}igged \textbf{R}einforcement
\textbf{L}earning \textbf{L}ottery" (RLx2), which is capable of training a DRL
agent \emph{using an ultra-sparse network throughout} for off-policy
reinforcement learning. The systematic RLx2 framework contains three key
components: gradient-based topology evolution, multi-step Temporal Difference
(TD) targets, and dynamic-capacity replay buffer. RLx2 enables efficient
topology exploration and robust Q-value estimation simultaneously. We
demonstrate state-of-the-art sparse training performance in several continuous
control tasks using RLx2, showing $7.5\times$-$20\times$ model compression with
less than $3\%$ performance degradation, and up to $20\times$ and $50\times$
FLOPs reduction for training and inference, respectively.
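To make the three components more concrete, here is a minimal illustrative sketch in PyTorch; it is not the authors' implementation. It shows a RigL-style gradient-based mask update (drop the smallest-magnitude active weights, regrow the highest-gradient inactive ones) and an n-step TD target of the kind used by off-policy critics. The function names, drop fraction, and tensor shapes are assumptions made for illustration, and the dynamic-capacity replay buffer is omitted.

```python
# Minimal, illustrative sketch (not the RLx2 authors' code) of two ingredients
# named in the abstract: gradient-based topology evolution for a sparse layer,
# and a multi-step (n-step) TD target for off-policy Q-learning.
import torch

def update_mask(weight, grad, mask, drop_fraction=0.1):
    """Drop the smallest-magnitude active weights and regrow the same number
    of currently inactive weights with the largest gradient magnitude."""
    n_update = int(drop_fraction * mask.sum().item())
    if n_update == 0:
        return mask
    # Drop: among active weights, deactivate those with the smallest |w|.
    active_mag = torch.where(mask.bool(), weight.abs(),
                             torch.full_like(weight, float("inf")))
    drop_idx = torch.topk(active_mag.flatten(), n_update, largest=False).indices
    # Grow: among inactive weights, activate those with the largest |grad|.
    inactive_grad = torch.where(mask.bool(), torch.zeros_like(grad), grad.abs())
    grow_idx = torch.topk(inactive_grad.flatten(), n_update, largest=True).indices
    new_mask = mask.clone().flatten()
    new_mask[drop_idx] = 0.0
    new_mask[grow_idx] = 1.0
    return new_mask.view_as(mask)

def n_step_td_target(rewards, bootstrap_q, dones, gamma=0.99):
    """Multi-step TD target: discounted sum of n rewards plus a discounted
    bootstrap from the target critic at step n.
    rewards: (batch, n); bootstrap_q, dones: (batch,)."""
    n = rewards.shape[1]
    discounts = gamma ** torch.arange(n, dtype=rewards.dtype)
    returns = (rewards * discounts).sum(dim=1)
    return returns + (gamma ** n) * (1.0 - dones) * bootstrap_q
```

In a sparse training loop of this kind, the mask update is typically applied only every few thousand gradient steps, so the overall sparsity level stays fixed while the topology slowly evolves.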
Related papers
- Adding Conditional Control to Diffusion Models with Reinforcement Learning [68.06591097066811]
Diffusion models are powerful generative models that allow for precise control over the characteristics of the generated samples.
While these diffusion models trained on large datasets have achieved success, there is often a need to introduce additional controls in downstream fine-tuning processes.
This work presents a novel method based on reinforcement learning (RL) to add such controls using an offline dataset.
arXiv Detail & Related papers (2024-06-17T22:00:26Z) - Compressing Deep Reinforcement Learning Networks with a Dynamic Structured Pruning Method for Autonomous Driving [63.155562267383864]
Deep reinforcement learning (DRL) has shown remarkable success in complex autonomous driving scenarios.
However, DRL models inevitably incur high memory consumption and computation, which hinders their deployment on resource-limited autonomous driving devices.
We introduce a novel dynamic structured pruning approach that gradually removes a DRL model's unimportant neurons during the training stage (a generic sketch of this idea follows the related-papers list).
arXiv Detail & Related papers (2024-02-07T09:00:30Z) - Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint [104.53687944498155]
Reinforcement learning (RL) has been widely used in training large language models (LLMs).
We propose a new RL method named RLMEC that incorporates a generative model as the reward model.
Based on the generative reward model, we design a token-level RL objective for training and an imitation-based regularization for stabilizing the RL process.
arXiv Detail & Related papers (2024-01-11T17:58:41Z) - Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning [50.9692060692705]
This paper introduces Language Models for Motion Control (LaMo), a general framework based on Decision Transformers for offline RL.
Our framework highlights four crucial components, among them (1) initializing Decision Transformers with sequentially pre-trained LMs and (2) employing the LoRA fine-tuning method.
In particular, our method demonstrates superior performance in scenarios with limited data samples.
arXiv Detail & Related papers (2023-10-31T16:24:17Z) - SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores [13.948640763797776]
We present a novel abstraction on the dataflows of RL training, which unifies diverse RL training applications into a general framework.
We develop a scalable, efficient, and distributed RL system called ReaLly Scalable RL (SRL), which enables massively parallelized training.
SRL is the first in the academic community to perform RL experiments at a large scale with over 15k CPU cores.
arXiv Detail & Related papers (2023-06-29T05:16:25Z) - RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$ [12.111848705677142]
We propose RL$^3$, a hybrid approach that incorporates action-values, learned per task via traditional RL, into the inputs to meta-RL.
We show that RL$^3$ earns greater cumulative reward in the long term than RL$^2$ while drastically reducing meta-training time, and that it generalizes better to out-of-distribution tasks.
arXiv Detail & Related papers (2023-06-28T04:16:16Z) - Bootstrapped Transformer for Offline Reinforcement Learning [31.43012728924881]
Offline reinforcement learning (RL) aims at learning policies from previously collected static trajectory data without interacting with the real environment.
Recent works provide a novel perspective by viewing offline RL as a generic sequence generation problem.
We propose a novel algorithm named Bootstrapped Transformer, which incorporates the idea of bootstrapping and leverages the learned model to self-generate more offline data.
arXiv Detail & Related papers (2022-06-17T05:57:47Z) - Online Convolutional Re-parameterization [51.97831675242173]
We present Online Convolutional Re-parameterization (OREPA), a two-stage pipeline that aims to reduce the huge training overhead by squeezing the complex training-time block into a single convolution (a generic re-parameterization sketch follows the related-papers list).
Compared with the state-of-the-art re-param models, OREPA is able to save the training-time memory cost by about 70% and accelerate the training speed by around 2x.
We also conduct experiments on object detection and semantic segmentation and show consistent improvements on the downstream tasks.
arXiv Detail & Related papers (2022-04-02T09:50:19Z) - GST: Group-Sparse Training for Accelerating Deep Reinforcement Learning [0.3674863913115432]
We propose a novel weight compression method for DRL training acceleration, named group-sparse training (GST).
GST achieves a 25%p $\sim$ 41.5%p higher average compression ratio than the iterative pruning method without reward drop in the MuJoCo HalfCheetah-v2 and Humanoid-v2 environments with TD3 training.
arXiv Detail & Related papers (2021-01-24T05:52:31Z) - Learning to Prune Deep Neural Networks via Reinforcement Learning [64.85939668308966]
PuRL is a deep reinforcement learning based algorithm for pruning neural networks.
It achieves sparsity and accuracy comparable to current state-of-the-art methods.
arXiv Detail & Related papers (2020-07-09T13:06:07Z)
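As a generic illustration of the dynamic structured pruning idea summarized above (removing a DRL model's unimportant neurons during training), the sketch below scores each hidden neuron of a small MLP by the L1 norm of its incoming weights and zeroes out the lowest-scoring fraction. It is not the cited paper's algorithm; the scoring rule, layer sizes, and pruning ratio are assumptions.

```python
# Generic structured-pruning sketch (not the cited paper's algorithm):
# score each hidden neuron by the L1 norm of its incoming weights and
# zero out the lowest-scoring fraction of neurons.
import torch
import torch.nn as nn

def prune_hidden_neurons(fc1: nn.Linear, fc2: nn.Linear, prune_ratio: float):
    """Zero the rows of fc1 (and matching columns of fc2) belonging to the
    prune_ratio fraction of hidden neurons with the smallest L1 norm."""
    scores = fc1.weight.abs().sum(dim=1)       # one score per hidden neuron
    n_prune = int(prune_ratio * scores.numel())
    if n_prune == 0:
        return
    prune_idx = torch.topk(scores, n_prune, largest=False).indices
    with torch.no_grad():
        fc1.weight[prune_idx, :] = 0.0         # incoming weights
        fc1.bias[prune_idx] = 0.0
        fc2.weight[:, prune_idx] = 0.0         # outgoing weights

# Usage: a schedule would gradually raise prune_ratio during training.
fc1, fc2 = nn.Linear(64, 256), nn.Linear(256, 4)
prune_hidden_neurons(fc1, fc2, prune_ratio=0.25)
```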
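The Online Convolutional Re-parameterization entry above squeezes a complex training-time block into a single convolution. The sketch below shows the simplest instance of that idea under standard conv/BN definitions: folding a BatchNorm2d into its preceding Conv2d so that inference uses one convolution. It is a generic illustration, not OREPA's online two-stage pipeline.

```python
# Generic structural re-parameterization sketch (not OREPA itself): fold a
# BatchNorm2d into the preceding Conv2d so inference runs a single convolution.
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Return a single Conv2d equivalent (in eval mode) to conv followed by bn."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    # Per-output-channel scale applied by the batch norm.
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    with torch.no_grad():
        fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
        conv_bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
        fused.bias.copy_((conv_bias - bn.running_mean) * scale + bn.bias)
    return fused

# Sanity check: the fused conv matches conv -> bn in eval mode.
conv, bn = nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8)
conv.eval(); bn.eval()
x = torch.randn(1, 3, 16, 16)
assert torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-5)
```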
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.