Training Neural Networks from Scratch with Parallel Low-Rank Adapters
- URL: http://arxiv.org/abs/2402.16828v1
- Date: Mon, 26 Feb 2024 18:55:13 GMT
- Title: Training Neural Networks from Scratch with Parallel Low-Rank Adapters
- Authors: Minyoung Huh, Brian Cheung, Jeremy Bernstein, Phillip Isola, Pulkit
Agrawal
- Abstract summary: We introduce LoRA-the-Explorer (LTE), a novel bi-level optimization algorithm designed to enable parallel training of multiple low-rank heads across computing nodes.
Our approach includes extensive experimentation on vision transformers using various vision datasets, demonstrating that LTE is competitive with standard pre-training.
- Score: 50.171622511923474
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The scalability of deep learning models is fundamentally limited by computing
resources, memory, and communication. Although methods like low-rank adaptation
(LoRA) have reduced the cost of model finetuning, their application in model
pre-training remains largely unexplored. This paper explores extending LoRA to
model pre-training, identifying the inherent constraints and limitations of
standard LoRA in this context. We introduce LoRA-the-Explorer (LTE), a novel
bi-level optimization algorithm designed to enable parallel training of
multiple low-rank heads across computing nodes, thereby reducing the need for
frequent synchronization. Our approach includes extensive experimentation on
vision transformers using various vision datasets, demonstrating that LTE is
competitive with standard pre-training.
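As an illustration of the parallel low-rank-head idea described in the abstract, here is a minimal PyTorch-style sketch. The class names (LowRankHead, ParallelLoRALinear), the rank, the number of heads, and the simple averaging merge rule are assumptions made for exposition, not the authors' LTE implementation.

```python
# Hedged sketch: a frozen base weight plus several independently trained
# low-rank heads, merged only occasionally (infrequent synchronization).
import torch
import torch.nn as nn

class LowRankHead(nn.Module):
    """One low-rank adapter: delta_W = B @ A with rank r."""
    def __init__(self, d_in, d_out, r=8):
        super().__init__()
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))  # zero-init so delta starts at 0

    def delta(self):
        return self.B @ self.A

class ParallelLoRALinear(nn.Module):
    """Frozen base weight plus several independently trainable low-rank heads."""
    def __init__(self, d_in, d_out, num_heads=4, r=8):
        super().__init__()
        self.register_buffer("W", torch.randn(d_out, d_in) * 0.02)  # frozen base weight
        self.heads = nn.ModuleList([LowRankHead(d_in, d_out, r) for _ in range(num_heads)])

    def forward(self, x, head_idx):
        # Each worker trains only its own head; the base weight stays fixed.
        W_eff = self.W + self.heads[head_idx].delta()
        return x @ W_eff.T

    @torch.no_grad()
    def merge_heads(self):
        # Infrequent sync: fold the averaged low-rank updates into the base
        # weight, then reset the heads so their deltas start from zero again.
        self.W += torch.stack([h.delta() for h in self.heads]).mean(dim=0)
        for h in self.heads:
            h.B.zero_()

# Toy usage: each "node" takes local steps on its own head, then heads are merged.
layer = ParallelLoRALinear(d_in=32, d_out=16, num_heads=2, r=4)
opts = [torch.optim.SGD(h.parameters(), lr=1e-2) for h in layer.heads]
x, y = torch.randn(8, 32), torch.randn(8, 16)
for step in range(10):
    for i, opt in enumerate(opts):  # in practice these run in parallel on separate nodes
        loss = ((layer(x, head_idx=i) - y) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    if (step + 1) % 5 == 0:
        layer.merge_heads()  # infrequent synchronization across heads
```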
Related papers
- Structure-Preserving Network Compression Via Low-Rank Induced Training Through Linear Layers Composition [11.399520888150468]
Deep Neural Networks (DNNs) have achieved remarkable success in addressing many previously unsolvable tasks.
The storage and computational requirements associated with DNNs pose a challenge for deploying these trained models on resource-limited devices.
We present a theoretically-justified novel approach, termed Low-Rank Induced Training (LoRITa).
LoRITa promotes low-rankness through the composition of linear layers and compresses by using singular value truncation.
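A rough sketch of that recipe, assuming a PyTorch setting: one logical linear layer is trained as a composition of two linear factors (which biases the effective weight toward low rank), and the trained weight is then compressed by truncating small singular values. The names and the energy-based rank selection below are illustrative, not taken from the paper.

```python
# Hedged sketch of low-rank-inducing training via linear-layer composition,
# followed by SVD truncation for compression.
import torch
import torch.nn as nn

class ComposedLinear(nn.Module):
    """y = x @ (W2 @ W1)^T : one logical layer trained as two stacked linear factors."""
    def __init__(self, d_in, d_out, d_mid=None):
        super().__init__()
        d_mid = d_mid or min(d_in, d_out)
        self.W1 = nn.Parameter(torch.randn(d_mid, d_in) / d_in ** 0.5)
        self.W2 = nn.Parameter(torch.randn(d_out, d_mid) / d_mid ** 0.5)

    def effective_weight(self):
        return self.W2 @ self.W1

    def forward(self, x):
        return x @ self.effective_weight().T

@torch.no_grad()
def compress(layer, energy=0.99):
    """After training, truncate singular values of the effective weight."""
    U, S, Vh = torch.linalg.svd(layer.effective_weight(), full_matrices=False)
    k = int((S.cumsum(0) / S.sum() < energy).sum()) + 1  # keep enough spectral energy
    return U[:, :k] * S[:k], Vh[:k]                       # two small low-rank factors

layer = ComposedLinear(64, 32)   # would normally be trained before compression
US, Vh = compress(layer)
print(US.shape, Vh.shape)        # small factors replacing one 32x64 weight matrix
```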
arXiv Detail & Related papers (2024-05-06T00:58:23Z)
- Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates [71.81037644563217]
Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning.
As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers.
We propose straggler-aware layer-wise federated learning (SALF) that leverages the optimization procedure of NNs via backpropagation to update the global model in a layer-wise fashion.
arXiv Detail & Related papers (2024-03-27T09:14:36Z)
- OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning [29.798780069556074]
Federated learning (FL) has emerged as a promising approach to collaboratively train machine learning models across multiple edge devices.
We propose OnDev-LCT: Lightweight Convolutional Transformers for On-Device vision tasks with limited training data and resources.
arXiv Detail & Related papers (2024-01-22T02:17:36Z)
- Run LoRA Run: Faster and Lighter LoRA Implementations [50.347242693025336]
LoRA is a technique that reduces the number of trainable parameters in a neural network by introducing low-rank adapters to linear layers.
This paper presents the RunLoRA framework for efficient implementations of LoRA.
Experiments show up to 28% speedup on language modeling networks.
arXiv Detail & Related papers (2023-12-06T10:54:34Z)
- A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Observations [0.34410212782758043]
Reinforcement Learning (RL) provides a powerful framework for decision-making in complex environments.
This paper presents a novel Spiking Neural Network (SNN) architecture for solving RL problems with real-valued observations.
arXiv Detail & Related papers (2023-07-06T12:33:34Z)
- Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and a gated sub-network from scratch in an SSL setting.
The co-evolution of the dense and gated encoders during pre-training offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z)
- RLFlow: Optimising Neural Network Subgraph Transformation with World Models [0.0]
We propose a model-based agent which learns to optimise the architecture of neural networks by performing a sequence of subgraph transformations to reduce model runtime.
We show that our approach matches state-of-the-art performance on common convolutional networks and outperforms it by up to 5% on transformer-style architectures.
arXiv Detail & Related papers (2022-05-03T11:52:54Z)
- Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata via the Polyak-Łojasiewicz condition, smoothness, and standard assumptions.
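For reference, the standard Polyak-Łojasiewicz condition (quoted here as the textbook definition, not as this paper's specific statement of its assumptions) reads:

```latex
% Polyak-Łojasiewicz (PL) condition for a differentiable loss f with minimum
% value f^*: the squared gradient norm lower-bounds the suboptimality gap.
\[
  \tfrac{1}{2}\,\lVert \nabla f(x) \rVert^{2} \;\ge\; \mu \bigl( f(x) - f^{*} \bigr)
  \quad \text{for some } \mu > 0 \text{ and all } x .
\]
```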
arXiv Detail & Related papers (2021-11-02T20:24:01Z)
- Regularizing Generative Adversarial Networks under Limited Data [88.57330330305535]
This work proposes a regularization approach for training robust GAN models on limited data.
We show a connection between the regularized loss and an f-divergence called LeCam-divergence, which we find is more robust under limited training data.
arXiv Detail & Related papers (2021-04-07T17:59:06Z)
- Real-time Federated Evolutionary Neural Architecture Search [14.099753950531456]
Federated learning is a distributed machine learning approach to privacy preservation.
We propose an evolutionary approach to real-time federated neural architecture search that not only optimizes model performance but also reduces the local payload.
This way, we effectively reduce computational and communication costs required for evolutionary optimization and avoid big performance fluctuations of the local models.
arXiv Detail & Related papers (2020-03-04T17:03:28Z)