Related papers: Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification

Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification

URL: http://arxiv.org/abs/2412.01195v1
Date: Mon, 02 Dec 2024 06:57:46 GMT
Title: Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification
Authors: Bei Liu, Yanmin Qian,
Abstract summary: We explore a memory-efficient training strategy for deep speaker embedding learning in resource-constrained scenarios.<n>For activations, we design two types of reversible neural networks which eliminate the need to store intermediate activations.<n>For states, we introduce a dynamic quantization approach that replaces the original 32-bit floating-point values with a dynamic tree-based 8-bit data type.
Score: 50.596077598766975
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent speaker verification (SV) systems have shown a trend toward adopting deeper speaker embedding extractors. Although deeper and larger neural networks can significantly improve performance, their substantial memory requirements hinder training on consumer GPUs. In this paper, we explore a memory-efficient training strategy for deep speaker embedding learning in resource-constrained scenarios. Firstly, we conduct a systematic analysis of GPU memory allocation during SV system training. Empirical observations show that activations and optimizer states are the main sources of memory consumption. For activations, we design two types of reversible neural networks which eliminate the need to store intermediate activations during back-propagation, thereby significantly reducing memory usage without performance loss. For optimizer states, we introduce a dynamic quantization approach that replaces the original 32-bit floating-point values with a dynamic tree-based 8-bit data type. Experimental results on VoxCeleb demonstrate that the reversible variants of ResNets and DF-ResNets can perform training without the need to cache activations in GPU memory. In addition, the 8-bit versions of SGD and Adam save 75% of memory costs while maintaining performance compared to their 32-bit counterparts. Finally, a detailed comparison of memory usage and performance indicates that our proposed models achieve up to 16.2x memory savings, with nearly identical parameters and performance compared to the vanilla systems. In contrast to the previous need for multiple high-end GPUs such as the A100, we can effectively train deep speaker embedding extractors with just one or two consumer-level 2080Ti GPUs.

Related papers

Memory Layers at Scale [67.00854080570979]
This work takes memory layers beyond proof-of-concept, proving their utility at contemporary scale. On downstream tasks, language models augmented with our improved memory layer outperform dense models with more than twice the budget, as well as mixture-of-expert models when matched for both compute and parameters. We provide a fully parallelizable memory layer implementation, demonstrating scaling laws with up to 128B memory parameters, pretrained to 1 trillion tokens, comparing to base models with up to 8B parameters.
arXiv Detail & Related papers (2024-12-12T23:56:57Z)
APOLLO: SGD-like Memory, AdamW-level Performance [61.53444035835778]
Large language models (LLMs) are notoriously memory-intensive during training. Various memory-efficient Scals have been proposed to reduce memory usage. They face critical challenges: (i) costly SVD operations; (ii) significant performance trade-offs compared to AdamW; and (iii) still substantial memory overhead to maintain competitive performance.
arXiv Detail & Related papers (2024-12-06T18:55:34Z)
Less Memory Means smaller GPUs: Backpropagation with Compressed Activations [1.7065506903618906]
The ever-growing scale of deep neural networks (DNNs) has lead to an equally rapid growth in computational resource requirements. Many recent architectures, most prominently Large Language Models, have to be trained using supercomputers with thousands of accelerators. With this approach we are able to reduce the peak memory consumption by 29% at the cost of a longer training schedule.
arXiv Detail & Related papers (2024-09-18T11:57:05Z)
Dynamic neural network with memristive CIM and CAM for 2D and 3D vision [57.6208980140268]
We propose a semantic memory-based dynamic neural network (DNN) using memristor. The network associates incoming data with the past experience stored as semantic vectors. We validate our co-designs, using a 40nm memristor macro, on ResNet and PointNet++ for classifying images and 3D points from the MNIST and ModelNet datasets.
arXiv Detail & Related papers (2024-07-12T04:55:57Z)
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection [133.45193150403537]
Training Large Language Models (LLMs) presents significant memory challenges due to the growing size of weights and GPU states. In this work, we propose Gradient Low-Rank Projection (GaLore) as a memory-efficient training strategy. Our 8-bit GaLore further reduces memory by up to 82.5% and total training memory by 63.3%, compared to a BF16 baseline.
arXiv Detail & Related papers (2024-03-06T07:29:57Z)
FULL-W2V: Fully Exploiting Data Reuse for W2V on GPU-Accelerated Systems [5.572152653851948]
FULL-W2V exploits the opportunities for data reuse in the W2V algorithm to reduce access to low memory levels and improve temporal locality. Our prototype implementation achieves 2.97X speedup when ported from Nvidia Pascal P100 to Volta V100 cards, and outperforms the state-of-the-art by 5.72X on V100 cards with the same embedding quality.
arXiv Detail & Related papers (2023-12-12T21:22:07Z)
Compressing the Backward Pass of Large-Scale Neural Architectures by Structured Activation Pruning [0.0]
Sparsity in Deep Neural Networks (DNNs) has gained attention as a solution. This work focuses on ephemeral sparsity, aiming to reduce memory consumption during training. We report the effectiveness of activation pruning by evaluating training speed, accuracy, and memory usage of large-scale neural architectures.
arXiv Detail & Related papers (2023-11-28T15:31:31Z)
GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models [32.23853007467266]
GEAR is designed to perform scalable reinforcement learning (RL) with large sequence models (such as transformers) It is equipped with GPU kernels capable of collecting trajectories using zero-copy access to host memory, along with remote-directed-memory access over InfiniBand. Gear can achieve performance levels up to 6x greater than Reverb when training state-of-the-art large RL models.
arXiv Detail & Related papers (2023-10-08T15:39:43Z)
On-Device Training Under 256KB Memory [62.95579393237751]
We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory. Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB and 1MB Flash.
arXiv Detail & Related papers (2022-06-30T17:59:08Z)
MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs. We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory. We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
Robust High-dimensional Memory-augmented Neural Networks [13.82206983716435]
Memory-augmented neural networks enhance neural networks with an explicit memory to overcome these issues. Access to this explicit memory occurs via soft read and write operations involving every individual memory entry. We propose a robust architecture that employs a computational memory unit as the explicit memory performing analog in-memory computation on high-dimensional (HD) vectors.
arXiv Detail & Related papers (2020-10-05T12:01:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.