Efficient Resource-Constrained Training of Vision Transformers via Subspace Optimization
- URL: http://arxiv.org/abs/2510.09160v2
- Date: Mon, 27 Oct 2025 08:24:49 GMT
- Title: Efficient Resource-Constrained Training of Vision Transformers via Subspace Optimization
- Authors: Le-Trung Nguyen, Enzo Tartaglione, Van-Tam Nguyen
- Abstract summary: Weight-Activation Subspace Iteration (WASI) is a method that mitigates the memory bottleneck of backpropagation. On a Raspberry Pi 5, WASI achieves roughly $1.5\times$ faster training and inference than vanilla training.
- Score: 18.541460686751744
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As AI increasingly shapes daily life, energy consumption and data privacy have become pressing concerns. On-device learning trains models directly on edge devices, cutting energy consumption and safeguarding data privacy. However, the expanding scale of modern neural networks creates a major obstacle for on-device training. Although prior work has concentrated on compact convolutional architectures, we instead apply subspace-based training to transformer models. Motivated by the idea that a model's essential information lies in a fixed subspace, we introduce Weight-Activation Subspace Iteration (WASI), a method that mitigates the memory bottleneck of backpropagation and boosts inference efficiency in transformer models by restricting training to this subspace. Our results demonstrate that WASI maintains accuracy comparable to vanilla training while reducing memory usage by up to $62\times$ and computational cost (FLOPs) by up to $2\times$. On a Raspberry Pi 5, WASI achieves roughly $1.5\times$ faster training and inference than vanilla training.
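The abstract describes restricting training to a fixed subspace to relieve the activation-memory bottleneck of backpropagation, but gives no algorithmic detail. The sketch below is a hedged illustration of that general idea for a single linear layer, not the authors' actual WASI procedure: the trainable update is confined to a fixed rank-r subspace spanned by frozen bases U and V (here random orthonormal matrices; the names, the rank-r parameterization, and the basis construction are assumptions for illustration), and only the r-dimensional projected activation is cached for the backward pass.

```python
# Minimal sketch (PyTorch) of subspace-restricted training for one linear layer.
# Illustration of the general idea only, NOT the exact WASI algorithm: the
# trainable update lives in a fixed rank-r subspace spanned by frozen bases
# U and V, and only the r-dimensional projection z = x V is saved for backward,
# instead of the full d_in-dimensional activation x.
import torch


class SubspaceLinearFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, S, U, V):
        # x: (batch, d_in); S: (r, r) trainable; U: (d_out, r), V: (d_in, r) frozen.
        z = x @ V                            # project the activation into the subspace
        y = z @ S @ U.t()                    # effective weight is U S V^T
        ctx.save_for_backward(z, S, U, V)    # the full activation x is NOT stored
        return y

    @staticmethod
    def backward(ctx, grad_y):
        z, S, U, V = ctx.saved_tensors
        g = grad_y @ U                       # (batch, r): gradient pulled into the subspace
        grad_S = z.t() @ g                   # weight update confined to the r x r subspace
        grad_x = g @ S.t() @ V.t()           # propagate to earlier layers
        return grad_x, grad_S, None, None


class SubspaceLinear(torch.nn.Module):
    def __init__(self, d_in, d_out, rank):
        super().__init__()
        # Frozen random orthonormal bases; a real method would derive them from the
        # pretrained weights (e.g. by SVD / subspace iteration), which is not done here.
        qU, _ = torch.linalg.qr(torch.randn(d_out, rank))
        qV, _ = torch.linalg.qr(torch.randn(d_in, rank))
        self.register_buffer("U", qU)
        self.register_buffer("V", qV)
        self.S = torch.nn.Parameter(torch.zeros(rank, rank))

    def forward(self, x):
        return SubspaceLinearFn.apply(x, self.S, self.U, self.V)


layer = SubspaceLinear(d_in=768, d_out=768, rank=16)
x = torch.randn(32, 768, requires_grad=True)
layer(x).sum().backward()                    # only 16-dim activations were cached here
print(layer.S.grad.shape)                    # torch.Size([16, 16])
```

Per layer, the cached tensor shrinks from batch x d_in to batch x r values, which is the kind of saving the reported memory reduction refers to; the paper's actual construction of the subspace and its treatment of attention blocks are not reproduced in this sketch.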
Related papers
- Exploring the Benefit of Activation Sparsity in Pre-training [117.25661020250658]
We study how activation properties change during pre-training.
We propose Switchable Sparse-Dense Learning (SSD).
SSD achieves comparable performance with identical model size and reduces pre-training costs.
arXiv Detail & Related papers (2024-10-04T13:53:33Z) - Block Selective Reprogramming for On-device Training of Vision Transformers [12.118303034660531]
We present block selective reprogramming (BSR) in which we fine-tune only a fraction of total blocks of a pre-trained model.
Compared to the existing alternatives, our approach simultaneously reduces training memory by up to 1.4x and compute cost by up to 2x.
arXiv Detail & Related papers (2024-03-25T08:41:01Z) - Fast Machine Unlearning Without Retraining Through Selective Synaptic Dampening [51.34904967046097]
Selective Synaptic Dampening (SSD) is fast, performant, and does not require long-term storage of the training data.
We present a novel two-step, post hoc, retrain-free approach to machine unlearning which is fast, performant, and does not require long-term storage of the training data.
arXiv Detail & Related papers (2023-08-15T11:30:45Z) - TinyTrain: Resource-Aware Task-Adaptive Sparse Training of DNNs at the Data-Scarce Edge [27.533985670823945]
TinyTrain is an on-device training approach that drastically reduces training time by selectively updating parts of the model.
TinyTrain outperforms vanilla fine-tuning of the entire network by 3.6-5.0% in accuracy.
It achieves 9.5x faster and 3.5x more energy-efficient training over status-quo approaches.
arXiv Detail & Related papers (2023-07-19T13:49:12Z) - CAMEL: Co-Designing AI Models and Embedded DRAMs for Efficient On-Device Learning [8.339901980070616]
Training AI on resource-limited devices poses significant challenges due to the demanding computing workload and the substantial memory consumption and data access required by deep neural networks (DNNs).
We propose utilizing embedded dynamic random-access memory (eDRAM) as the primary storage medium for transient training data.
We present a highly efficient on-device training engine named CAMEL, which leverages eDRAM as the primary on-chip memory.
arXiv Detail & Related papers (2023-05-04T20:57:01Z) - POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging [35.397804171588476]
Fine-tuning models on edge devices would enable privacy-preserving personalization over sensitive data.
We present POET, an algorithm to enable training large neural networks on memory-scarce battery-operated edge devices.
arXiv Detail & Related papers (2022-07-15T18:36:29Z) - On-Device Training Under 256KB Memory [62.95579393237751]
We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory.
Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB SRAM and 1MB Flash.
arXiv Detail & Related papers (2022-06-30T17:59:08Z) - Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey [69.3939291118954]
State-of-the-art deep learning models have parameter counts that reach into the billions. Training, storing, and transferring such models are energy- and time-consuming, and thus costly.
Model compression lowers storage and transfer costs, and can further make training more efficient by decreasing the number of computations in the forward and/or backward pass.
This work is a survey of methods that reduce the number of trained weights in deep learning models throughout training.
arXiv Detail & Related papers (2022-05-17T05:37:08Z) - Online Convolutional Re-parameterization [51.97831675242173]
We present online convolutional re-parameterization (OREPA), a two-stage pipeline aiming to reduce the huge training overhead by squeezing the complex training-time block into a single convolution.
Compared with the state-of-the-art re-param models, OREPA is able to save the training-time memory cost by about 70% and accelerate the training speed by around 2x.
We also conduct experiments on object detection and semantic segmentation and show consistent improvements on the downstream tasks.
arXiv Detail & Related papers (2022-04-02T09:50:19Z) - Enabling Binary Neural Network Training on the Edge [7.32770338248516]
Existing binary neural network training methods require concurrent storage of high-precision activations for all layers.
We introduce a low-cost binary neural network training strategy exhibiting sizable memory footprint reductions.
We also demonstrate from-scratch ImageNet training of binarized ResNet-18, achieving a $3.78\times$ memory reduction.
arXiv Detail & Related papers (2021-02-08T15:06:41Z) - SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and Training [82.35376405568975]
Deep neural networks (DNNs) come with heavy parameterization, leading to reliance on external dynamic random-access memory (DRAM) for storage.
We present SmartDeal (SD), an algorithm framework to trade higher-cost memory storage/access for lower-cost computation.
We show that SD leads to 10.56x and 4.48x reduction in the storage and training energy, with negligible accuracy loss compared to state-of-the-art training baselines.
arXiv Detail & Related papers (2021-01-04T18:54:07Z) - Low-rank Gradient Approximation For Memory-Efficient On-device Training of Deep Neural Network [9.753369031264532]
Training machine learning models on mobile devices has the potential of improving both privacy and accuracy of the models.
One of the major obstacles to achieving this goal is the memory limitation of mobile devices.
We propose approximating the gradient matrices of deep neural networks using a low-rank parameterization as an avenue to save training memory.
arXiv Detail & Related papers (2020-01-24T05:12:18Z)
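As a hedged illustration of the low-rank gradient idea in the last entry above (not that paper's actual algorithm), the sketch below truncates a weight matrix's gradient to a rank-r factorization before the update, so any per-matrix state kept across steps (momentum, accumulated gradients) needs only the two small factors rather than a full m x n buffer; the rank and the SVD-based truncation are illustrative assumptions.

```python
# Illustrative sketch only: compress a weight gradient to rank r with a truncated
# SVD so that persistent per-matrix state can be stored as two small factors.
# The rank and the SVD-based truncation are assumptions, not the cited method.
import torch


def low_rank_grad(grad: torch.Tensor, rank: int):
    """Return factors (P, Q) with P @ Q ~= grad, P: (m, r), Q: (r, n)."""
    U, S, Vh = torch.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank] * S[:rank]           # fold singular values into the left factor
    Q = Vh[:rank, :]
    return P, Q


W = torch.randn(768, 768, requires_grad=True)
loss = (W @ torch.randn(768, 32)).pow(2).mean()
loss.backward()

P, Q = low_rank_grad(W.grad, rank=8)     # keep 2 * 768 * 8 values instead of 768 * 768
with torch.no_grad():
    W -= 1e-2 * (P @ Q)                  # SGD step using the rank-8 approximation
```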