Stable and low-precision training for large-scale vision-language models
- URL: http://arxiv.org/abs/2304.13013v2
- Date: Tue, 17 Oct 2023 00:11:15 GMT
- Title: Stable and low-precision training for large-scale vision-language models
- Authors: Mitchell Wortsman, Tim Dettmers, Luke Zettlemoyer, Ari Morcos, Ali
Farhadi, Ludwig Schmidt
- Abstract summary: We introduce new methods for accelerating and stabilizing training for large language-vision models.
For acceleration, we introduce SwitchBack, a linear layer for int8 quantized training which provides a speed-up of 13-25%.
For stability, we analyze loss spikes and find they consistently occur 1-8 iterations after the squared gradients become under-estimated by their AdamW second moment estimator.
- Score: 108.62077651227607
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce new methods for 1) accelerating and 2) stabilizing training for
large language-vision models. 1) For acceleration, we introduce SwitchBack, a
linear layer for int8 quantized training which provides a speed-up of 13-25%
while matching the performance of bfloat16 training within 0.1 percentage
points for the 1B parameter CLIP ViT-Huge -- the largest int8 training to date.
Our main focus is int8 as GPU support for float8 is rare, though we also
analyze float8 training through simulation. While SwitchBack proves effective
for float8, we show that standard techniques are also successful if the network
is trained and initialized so that large feature magnitudes are discouraged,
which we accomplish via layer-scale initialized with zeros. 2) For stability,
we analyze loss spikes and find they consistently occur 1-8 iterations after
the squared gradients become under-estimated by their AdamW second moment
estimator. As a result, we recommend an AdamW-Adafactor hybrid which avoids
loss spikes when training a CLIP ViT-Huge model and outperforms gradient
clipping at the scales we test.
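
The abstract describes SwitchBack only at a high level. As a rough illustration of the idea behind an int8 quantized linear layer that "switches back" to higher precision for the weight gradient, here is a minimal PyTorch sketch. The quantization granularity (per-row absmax for activations and gradients, per-tensor for weights) and the simulation of the int8 GEMM in floating point are assumptions for readability, not the paper's kernel implementation:

```python
import torch

def quantize_rowwise(x):
    # Symmetric per-row int8 quantization via absmax scaling.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    return torch.round(x / scale).clamp(-127, 127), scale

def quantize_tensorwise(x):
    # Symmetric per-tensor int8 quantization via absmax scaling.
    scale = x.abs().amax().clamp(min=1e-8) / 127.0
    return torch.round(x / scale).clamp(-127, 127), scale

class SwitchBackLinearFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, w):
        ctx.save_for_backward(x, w)
        xq, xs = quantize_rowwise(x)
        wq, ws = quantize_tensorwise(w)
        # Real speed-ups require an int8 GEMM kernel; here the int8
        # matmul is simulated in fp32 to keep the sketch self-contained.
        return (xq @ wq.t()) * (xs * ws)

    @staticmethod
    def backward(ctx, grad_out):
        x, w = ctx.saved_tensors
        gq, gs = quantize_rowwise(grad_out)
        wq, ws = quantize_tensorwise(w)
        grad_x = (gq @ wq) * (gs * ws)            # int8 path: input gradient
        grad_w = grad_out.transpose(-2, -1) @ x   # higher-precision path: weight gradient
        return grad_x, grad_w

# usage: y = SwitchBackLinearFn.apply(x, weight)
```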
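For the float8 observation, the abstract attributes stability to discouraging large feature magnitudes via layer-scale initialized with zeros. A minimal sketch of that initialization follows; the module name and its placement on the residual branch are assumptions, not taken from the paper:

```python
import torch
import torch.nn as nn

class LayerScale(nn.Module):
    # Learnable per-channel scale applied to a residual branch.
    # Initializing gamma at zero makes each block start as the identity,
    # so feature magnitudes grow slowly during early training.
    def __init__(self, dim: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gamma * x

# e.g. inside a transformer block: x = x + layer_scale(attn(norm(x)))
```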
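For stability, the recommended optimizer combines AdamW with Adafactor-style update clipping: the step shrinks whenever the squared gradients outrun the second-moment estimate, which is exactly the condition the abstract links to loss spikes. The sketch below is one plausible reading of that hybrid, not the paper's exact update rule:

```python
import torch

@torch.no_grad()
def stable_adamw_step(p, g, state, lr=1e-3, betas=(0.9, 0.999),
                      eps=1e-8, weight_decay=0.01):
    # One update of an illustrative AdamW/Adafactor hybrid:
    # plain AdamW plus Adafactor-style per-tensor update clipping.
    b1, b2 = betas
    state["t"] += 1
    t, m, v = state["t"], state["m"], state["v"]
    m.mul_(b1).add_(g, alpha=1 - b1)           # first-moment EMA
    v.mul_(b2).addcmul_(g, g, value=1 - b2)    # second-moment EMA
    mhat = m / (1 - b1 ** t)                   # bias correction
    vhat = v / (1 - b2 ** t)
    # If g^2 is large relative to vhat (the second moment under-estimates
    # the gradient), rms exceeds 1 and the learning rate is scaled down.
    rms = torch.sqrt(torch.mean(g * g / vhat.clamp(min=eps * eps)))
    step_lr = lr / max(1.0, rms.item())
    p.mul_(1 - step_lr * weight_decay)         # decoupled weight decay
    p.add_(mhat / (vhat.sqrt() + eps), alpha=-step_lr)

# usage: state = {"m": torch.zeros_like(p), "v": torch.zeros_like(p), "t": 0}
```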