Low-Precision Reinforcement Learning
- URL: http://arxiv.org/abs/2102.13565v1
- Date: Fri, 26 Feb 2021 16:16:28 GMT
- Title: Low-Precision Reinforcement Learning
- Authors: Johan Bjorck, Xiangyu Chen, Christopher De Sa, Carla P. Gomes, Kilian
Q. Weinberger
- Abstract summary: Low-precision training has become a popular approach to reduce computation time, memory footprint, and energy consumption in supervised learning.
In this paper we consider continuous control with the state-of-the-art SAC agent and demonstrate that a na"ive adaptation of low-precision methods from supervised learning fails.
- Score: 63.930246183244705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-precision training has become a popular approach to reduce computation
time, memory footprint, and energy consumption in supervised learning. In
contrast, this promising approach has not enjoyed similarly widespread adoption
within the reinforcement learning (RL) community, in part because RL agents can
be notoriously hard to train -- even in full precision. In this paper we
consider continuous control with the state-of-the-art SAC agent and demonstrate
that a na\"ive adaptation of low-precision methods from supervised learning
fails. We propose a set of six modifications, all straightforward to implement,
that leaves the underlying agent unchanged but improves its numerical stability
dramatically. The resulting modified SAC agent has lower memory and compute
requirements while matching full-precision rewards, thus demonstrating the
feasibility of low-precision RL.
Related papers
- Instance-dependent Early Stopping [57.912273923450726]
We propose an Instance-dependent Early Stopping (IES) method that adapts the early stopping mechanism from the entire training set to the instance level.
IES considers an instance as mastered if the second-order differences of its loss value remain within a small range around zero.
IES can reduce backpropagation instances by 10%-50% while maintaining or even slightly improving the test accuracy and transfer learning performance of a model.
arXiv Detail & Related papers (2025-02-11T13:34:09Z) - Fine Tuning without Catastrophic Forgetting via Selective Low Rank Adaptation [13.084333776247743]
Fine-tuning can reduce robustness to distribution shifts, impacting out-of-distribution (OOD) performance.
We propose a parameter-efficient fine-tuning (PEFT) method, using an indicator function to selectively activate Low-Rank Adaptation (LoRA) blocks.
We demonstrate that effective fine-tuning can be achieved with as few as 5% of active blocks, substantially improving efficiency.
arXiv Detail & Related papers (2025-01-26T03:22:22Z) - The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws [51.608402959163925]
We present the first systematic exploration of optimal sparse pre-training configurations for large language models.
We find that initiating pruning at 25% of total training compute and concluding at 75% achieves near-optimal final evaluation loss.
We propose a new scaling law that modifies the Chinchilla scaling law to use the average parameter count over pre-training.
arXiv Detail & Related papers (2025-01-21T20:23:22Z) - Beyond adaptive gradient: Fast-Controlled Minibatch Algorithm for large-scale optimization [1.6749379740049926]
We introduce F-CMA, a Fast-Controlled Mini-batch Algorithm with a random reshuffling method featuring a sufficient decrease condition and a line-search procedure to ensure loss reduction per epoch.
Tests show significant improvements, including a decrease in the overall training time by 68%, an increase in per-epoch efficiency by up to 20%, and in model accuracy by up to 5%.
arXiv Detail & Related papers (2024-11-24T11:46:47Z) - Dynamic Learning Rate for Deep Reinforcement Learning: A Bandit Approach [0.9549646359252346]
In deep Reinforcement Learning (RL) models trained using gradient-based techniques, the choice of gradient and its learning rate are crucial to achieving good performance.
We propose dynamic Learning Rate for deep Reinforcement Learning (LRRL), a meta-learning approach that selects the learning rate based on the agent's performance during training.
arXiv Detail & Related papers (2024-10-16T14:15:28Z) - Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple
Logits Retargeting Approach [102.0769560460338]
We develop a simple logits approach (LORT) without the requirement of prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z) - Sparse Low-rank Adaptation of Pre-trained Language Models [79.74094517030035]
We introduce sparse low-rank adaptation (SoRA) that enables dynamic adjustments to the intrinsic rank during the adaptation process.
Our approach strengthens the representation power of LoRA by initializing it with a higher rank, while efficiently taming a temporarily increased number of parameters.
Our experimental results demonstrate that SoRA can outperform other baselines even with 70% retained parameters and 70% training time.
arXiv Detail & Related papers (2023-11-20T11:56:25Z) - Unbiased and Efficient Self-Supervised Incremental Contrastive Learning [31.763904668737304]
We propose a self-supervised Incremental Contrastive Learning (ICL) framework consisting of a novel Incremental InfoNCE (NCE-II) loss function.
ICL achieves up to 16.7x training speedup and 16.8x faster convergence with competitive results.
arXiv Detail & Related papers (2023-01-28T06:11:31Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.