Skip2-LoRA: A Lightweight On-device DNN Fine-tuning Method for Low-cost Edge Devices
- URL: http://arxiv.org/abs/2410.21073v1
- Date: Mon, 28 Oct 2024 14:35:12 GMT
- Title: Skip2-LoRA: A Lightweight On-device DNN Fine-tuning Method for Low-cost Edge Devices
- Authors: Hiroki Matsutani, Masaaki Kondo, Kazuki Sunaga, Radu Marculescu
- Abstract summary: This paper proposes Skip2-LoRA as a lightweight fine-tuning method for deep neural networks.
In our approach, trainable LoRA adapters are inserted between the last layer and every other layer to enhance the network's expressive power.
Our results show that Skip2-LoRA reduces the fine-tuning time by 90.0% on average compared to the counterpart that has the same number of trainable parameters.
- Score: 7.219286228148705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes Skip2-LoRA as a lightweight fine-tuning method for deep neural networks to address the gap between pre-trained and deployed models. In our approach, trainable LoRA (low-rank adaptation) adapters are inserted between the last layer and every other layer to enhance the network's expressive power while keeping the backward computation cost low. This architecture is well-suited to caching the intermediate results of the forward pass, allowing the forward computation of previously seen samples to be skipped as training epochs progress. We implemented the combination of the proposed architecture and cache, denoted as Skip2-LoRA, and tested it on a $15 single board computer. Our results show that Skip2-LoRA reduces the fine-tuning time by 90.0% on average compared to a counterpart with the same number of trainable parameters while preserving accuracy, taking only a few seconds on the microcontroller board.
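A minimal PyTorch sketch of the architecture the abstract describes: a frozen backbone with trainable LoRA adapters connecting every hidden layer's output to the last layer's input, plus a cache of intermediate activations so the frozen forward pass can be skipped for seen samples. The MLP backbone, layer sizes, rank, and cache keying by sample id are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class Skip2LoRANet(nn.Module):
    """Sketch of the Skip2-LoRA idea: a frozen MLP backbone whose hidden
    activations feed trainable low-rank (LoRA) adapters into the last
    layer's input. Because the backbone is frozen, each sample's hidden
    activations never change across epochs and can be cached, skipping
    the frozen forward pass on repeat visits."""

    def __init__(self, dims=(64, 128, 128, 128), n_classes=10, rank=4):
        super().__init__()
        self.backbone = nn.ModuleList(
            [nn.Linear(dims[i], dims[i + 1]) for i in range(len(dims) - 1)])
        for p in self.backbone.parameters():
            p.requires_grad = False                 # backbone stays frozen
        # One LoRA pair (A: down-project, B: up-project) per hidden layer,
        # all writing into the last layer's input space (dims[-1]).
        self.lora_A = nn.ModuleList(
            [nn.Linear(d, rank, bias=False) for d in dims[1:]])
        self.lora_B = nn.ModuleList(
            [nn.Linear(rank, dims[-1], bias=False) for _ in dims[1:]])
        self.head = nn.Linear(dims[-1], n_classes)  # trainable last layer
        self.cache = {}                             # sample id -> activations

    def _frozen_forward(self, x):
        acts = []
        for layer in self.backbone:
            x = torch.relu(layer(x))
            acts.append(x)
        return acts

    def forward(self, x, sample_ids=None):
        if sample_ids is not None and all(i in self.cache for i in sample_ids):
            # Rebuild the batch of hidden activations from the cache.
            acts = [torch.stack([self.cache[i][k] for i in sample_ids])
                    for k in range(len(self.backbone))]
        else:
            acts = self._frozen_forward(x)
            if sample_ids is not None:
                for b, i in enumerate(sample_ids):
                    self.cache[i] = [a[b].detach() for a in acts]
        # Sum the low-rank skip paths into the last layer's input.
        z = acts[-1]
        for a, A, B in zip(acts, self.lora_A, self.lora_B):
            z = z + B(A(a))
        return self.head(z)
```

After the first epoch every training sample is cached, so later epochs only run the cheap adapter and head computations; this is the source of the reported 90.0% reduction in fine-tuning time.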
Related papers
- PaCA: Partial Connection Adaptation for Efficient Fine-Tuning [11.379377511067732]
We propose PaCA, which fine-tunes randomly selected partial connections within the pretrained weights instead of introducing adapter layers in the model.
Compared to LoRA, PaCA reduces training time by 22% and total memory usage by 16%, while maintaining comparable accuracy across various fine-tuning scenarios.
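A minimal sketch of the idea, assuming element-wise random selection with a fixed mask and a gradient hook; the actual selection granularity in PaCA may differ.

```python
import torch
import torch.nn as nn

def apply_partial_connection_finetuning(layer: nn.Linear, fraction: float = 0.1):
    """Freeze all but a random subset of a pretrained layer's weights.
    A fixed boolean mask selects the trainable 'partial connections';
    a gradient hook zeroes every other entry's gradient, so the optimizer
    only updates the selected connections. Element-wise selection is an
    assumption for illustration."""
    mask = torch.rand_like(layer.weight) < fraction
    layer.weight.register_hook(lambda grad: grad * mask)
    return mask

# Usage: mask = apply_partial_connection_finetuning(model.fc, fraction=0.05)
```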
arXiv Detail & Related papers (2025-02-28T13:30:10Z)
- Run LoRA Run: Faster and Lighter LoRA Implementations [50.347242693025336]
LoRA is a technique that reduces the number of trainable parameters in a neural network by introducing low-rank adapters to linear layers.
This paper presents the RunLoRA framework for efficient implementations of LoRA.
Experiments show up to 28% speedup on language modeling networks.
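Since several entries here build on LoRA, a minimal sketch of a plain LoRA linear layer may help; this illustrates the baseline LoRA computation, not RunLoRA's optimized implementations.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer augmented with a trainable low-rank update:
    y = W0 x + (alpha / r) * B(A(x)), where A: d_in -> r and B: r -> d_out.
    Only A and B train, so trainable parameters drop from d_in*d_out to
    r*(d_in + d_out). Initialization follows the common convention
    (A random, B zero, so the adapter starts as a no-op)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False               # W0 stays frozen
        self.A = nn.Linear(base.in_features, r, bias=False)
        self.B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.B.weight)             # adapter starts at zero
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.B(self.A(x))
```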
arXiv Detail & Related papers (2023-12-06T10:54:34Z)
- Learning to Compose SuperWeights for Neural Parameter Allocation Search [61.078949532440724]
We show that our approach can generate parameters for many networks using the same set of weights.
This enables us to support tasks like efficient ensembling and anytime prediction.
arXiv Detail & Related papers (2023-12-03T04:20:02Z)
- Prompt Tuning for Parameter-efficient Medical Image Segmentation [79.09285179181225]
We propose and investigate several contributions to achieve a parameter-efficient but effective adaptation for semantic segmentation on two medical imaging datasets.
We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online generated prototypes.
We demonstrate that the resulting neural network model is able to attenuate the gap between fully fine-tuned and parameter-efficiently adapted models.
arXiv Detail & Related papers (2022-11-16T21:55:05Z)
- Matching Pursuit Based Scheduling for Over-the-Air Federated Learning [67.59503935237676]
This paper develops a class of low-complexity device scheduling algorithms for over-the-air federated learning via the method of matching pursuit.
Compared to state-of-the-art schemes, the proposed scheme has drastically lower computational complexity.
The efficiency of the proposed scheme is confirmed via experiments on the CIFAR dataset.
arXiv Detail & Related papers (2022-06-14T08:14:14Z)
- CATRO: Channel Pruning via Class-Aware Trace Ratio Optimization [61.71504948770445]
We propose a novel channel pruning method via Class-Aware Trace Ratio Optimization (CATRO) to reduce the computational burden and accelerate the model inference.
We show that CATRO achieves higher accuracy with similar cost or lower cost with similar accuracy than other state-of-the-art channel pruning algorithms.
Because of its class-aware property, CATRO is suitable for adaptively pruning efficient networks for various classification subtasks, facilitating practical deployment and use of deep networks in real-world applications.
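A hedged illustration of the class-aware idea: score each channel by how well its activations separate classes, here via a simple between-/within-class variance ratio as a stand-in for CATRO's trace ratio objective. The function and the pooled-feature input are assumptions for illustration.

```python
import torch

def class_aware_channel_scores(feats: torch.Tensor, labels: torch.Tensor):
    """Hypothetical per-channel importance in the spirit of a class-aware
    trace ratio: for each channel, the ratio of between-class variance to
    within-class variance of its (spatially pooled) activations. Channels
    that separate classes well score high; low scorers are pruning
    candidates. feats: [N, C] pooled activations, labels: [N] class ids."""
    overall = feats.mean(dim=0)                    # [C] global channel means
    between = torch.zeros_like(overall)
    within = torch.zeros_like(overall)
    for c in labels.unique():
        cls = feats[labels == c]                   # [n_c, C]
        mu = cls.mean(dim=0)
        between += cls.shape[0] * (mu - overall) ** 2
        within += ((cls - mu) ** 2).sum(dim=0)
    return between / (within + 1e-8)               # [C], higher = keep

# Keep the top-k channels: idx = class_aware_channel_scores(f, y).topk(k).indices
```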
arXiv Detail & Related papers (2021-10-21T06:26:31Z)
- Spike time displacement based error backpropagation in convolutional spiking neural networks [0.6193838300896449]
In this paper, we extend the STiDi-BP algorithm to employ it in deeper and convolutional architectures.
We consider a convolutional SNN with two sets of weights: real-valued weights that are updated in the backward pass, and their signs (binary weights) that are employed in the feedforward process.
Evaluation on two popular image classification benchmarks, MNIST and Fashion-MNIST, confirms that the algorithm is applicable to deep SNNs.
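The dual-weight scheme can be mimicked with a straight-through sign function; this generic PyTorch sketch is an assumption about one way to realize it, not the paper's STiDi-BP update rule.

```python
import torch

class SignForward(torch.autograd.Function):
    """Use the sign of the real-valued weights in the forward pass while
    letting gradients flow back to the real weights (straight-through
    estimator). This mirrors the 'real weights updated in the backward
    pass, binary sign weights used in the feedforward' scheme generically."""

    @staticmethod
    def forward(ctx, w):
        return torch.sign(w)          # binary weights drive the forward pass

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out               # gradient passes through to real weights

# Usage inside a layer: y = x @ SignForward.apply(self.weight).t()
```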
arXiv Detail & Related papers (2021-08-31T05:18:59Z)
- LayerPipe: Accelerating Deep Neural Network Training by Intra-Layer and Inter-Layer Gradient Pipelining and Multiprocessor Scheduling [6.549125450209931]
Training model parameters by backpropagation inherently creates feedback loops.
The proposed system, referred to as LayerPipe, reduces the number of clock cycles required for training.
arXiv Detail & Related papers (2021-08-14T23:51:00Z)
- BRDS: An FPGA-based LSTM Accelerator with Row-Balanced Dual-Ratio Sparsification [3.3711251611130337]
A hardware-friendly pruning algorithm for reducing energy consumption and improving the speed of Long Short-Term Memory (LSTM) neural network accelerators is presented.
Results show that the proposed accelerator provides up to 272% higher effective GOPS/W, while reducing the perplexity error by up to 1.4% on the PTB dataset.
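A minimal sketch of row-balanced magnitude pruning, the core idea of keeping an equal number of nonzeros per row so hardware processing elements stay load-balanced; the keep_ratio value and single-ratio simplification (the paper uses dual ratios) are assumptions.

```python
import torch

def row_balanced_prune(weight: torch.Tensor, keep_ratio: float = 0.25):
    """Keep the same number of largest-magnitude weights in every row,
    zeroing the rest, so each hardware lane processes an equal number of
    nonzeros. Returns the pruned weight and the boolean keep-mask."""
    k = max(1, int(weight.shape[1] * keep_ratio))
    idx = weight.abs().topk(k, dim=1).indices       # top-k per row
    mask = torch.zeros_like(weight, dtype=torch.bool)
    mask.scatter_(1, idx, True)
    return weight * mask, mask
```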
arXiv Detail & Related papers (2021-01-07T18:23:48Z)
- Multi-scale Interaction for Real-time LiDAR Data Segmentation on an Embedded Platform [62.91011959772665]
Real-time semantic segmentation of LiDAR data is crucial for autonomously driving vehicles.
Current approaches that operate directly on the point cloud use complex spatial aggregation operations.
We propose a projection-based method, called Multi-scale Interaction Network (MINet), which is very efficient and accurate.
arXiv Detail & Related papers (2020-08-20T19:06:11Z)