TinyTL: Reduce Activations, Not Trainable Parameters for Efficient
On-Device Learning
- URL: http://arxiv.org/abs/2007.11622v5
- Date: Sun, 6 Jun 2021 01:23:16 GMT
- Title: TinyTL: Reduce Activations, Not Trainable Parameters for Efficient On-Device Learning
- Authors: Han Cai, Chuang Gan, Ligeng Zhu, Song Han
- Abstract summary: On-device learning enables edge devices to continually adapt AI models to new data.
Existing work solves this problem by reducing the number of trainable parameters.
We present Tiny-Transfer-Learning (TinyTL) for memory-efficient on-device learning.
- Score: 78.80707950262214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: On-device learning enables edge devices to continually adapt AI
models to new data, which requires a small memory footprint to fit the tight
memory constraints of edge devices. Existing work addresses this problem by
reducing the number of trainable parameters. However, this does not directly
translate into memory savings, since the major bottleneck is the activations,
not the parameters. In this work, we present Tiny-Transfer-Learning (TinyTL)
for memory-efficient on-device learning. TinyTL freezes the weights and learns
only the bias modules, so the intermediate activations do not need to be
stored. To maintain the adaptation capacity, we introduce a new
memory-efficient bias module, the lite residual module, which refines the
feature extractor by learning small residual feature maps while adding only
3.8% memory overhead. Extensive experiments show that TinyTL significantly
reduces memory usage (up to 6.5x) with little accuracy loss compared to
fine-tuning the full network. Compared to fine-tuning only the last layer,
TinyTL provides significant accuracy improvements (up to 34.1%) with little
memory overhead. Furthermore, combined with feature extractor adaptation,
TinyTL provides 7.3-12.9x memory savings without sacrificing accuracy compared
to fine-tuning the full Inception-V3.
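To make the frozen-weight, bias-only idea concrete, below is a minimal PyTorch-style sketch. It is illustrative only, not the authors' released implementation: the names `LiteResidualBlock` and `freeze_weights_train_biases`, the exact branch layout, and the usage with a torchvision backbone are assumptions made for the sake of the example.

```python
import torch
import torch.nn as nn


class LiteResidualBlock(nn.Module):
    """Wraps a weight-frozen block with a small trainable residual branch.

    A rough stand-in for the paper's lite residual module (assumed layout):
    downsample -> cheap group conv -> upsample, added to the frozen output.
    """

    def __init__(self, frozen_block: nn.Module, channels: int,
                 reduction: int = 2, groups: int = 2):
        super().__init__()
        self.frozen_block = frozen_block        # its weights are frozen elsewhere
        # channels is assumed divisible by `groups`
        self.lite_branch = nn.Sequential(       # trainable, runs at low resolution
            nn.AvgPool2d(reduction),
            nn.Conv2d(channels, channels, kernel_size=3,
                      padding=1, groups=groups, bias=True),
            nn.Upsample(scale_factor=reduction, mode="nearest"),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # assumes the frozen block preserves spatial size and channel count
        return self.frozen_block(x) + self.lite_branch(x)


def freeze_weights_train_biases(backbone: nn.Module):
    """Freeze every weight; keep only bias parameters trainable.

    The gradient of a bias does not depend on the layer's input activation,
    which is why bias-only updates can skip storing most activations
    (whether the framework actually frees them is an implementation detail).
    """
    trainable = []
    for name, param in backbone.named_parameters():
        param.requires_grad = name.endswith("bias")
        if param.requires_grad:
            trainable.append(param)
    return trainable


# Hypothetical usage with a torchvision backbone:
# backbone = torchvision.models.mobilenet_v2(weights="DEFAULT").features
# bias_params = freeze_weights_train_biases(backbone)
# backbone[3] = LiteResidualBlock(backbone[3], channels=24)  # wrap one frozen stage
# trainable = bias_params + list(backbone[3].lite_branch.parameters())
# optimizer = torch.optim.SGD(trainable, lr=0.05, momentum=0.9)
```

In this sketch the optimizer only ever sees bias tensors and the lite branch's parameters, which is the property that keeps the training-time activation footprint small.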
Related papers
- Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation [29.139579820699495]
This work strives to reduce the memory overhead of fine-tuning from the perspectives of the activation function and layer normalization.
We apply our Approx-BP theory to backpropagation training and derive memory-efficient alternatives to the GELU and SiLU activation functions.
In addition, we introduce a Memory-Sharing Backpropagation strategy, which enables the activation memory to be shared by two adjacent layers.
arXiv Detail & Related papers (2024-06-24T03:09:15Z)
- DTL: Disentangled Transfer Learning for Visual Recognition [21.549234013998255]
We introduce Disentangled Transfer Learning (DTL), which disentangles the trainable parameters from the backbone using a lightweight Compact Side Network (CSN).
The proposed method not only reduces GPU memory usage and the number of trainable parameters by a large amount, but also outperforms existing PETL methods in accuracy by a significant margin.
arXiv Detail & Related papers (2023-12-13T02:51:26Z)
- AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo achieves results on par with AdamW while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators, called WTA-CRS, for approximating matrix multiplication with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
- MobileTL: On-device Transfer Learning with Inverted Residual Blocks [14.305834934988185]
We present MobileTL, a transfer learning method for models built with Inverted Residual Blocks (IRBs).
MobileTL trains the shifts for internal normalization layers to avoid storing activation maps for the backward pass.
Our method reduces memory usage by 46% and 53% for MobileNetV2 and V3 IRBs, respectively.
arXiv Detail & Related papers (2022-12-05T23:07:55Z)
- On-Device Training Under 256KB Memory [62.95579393237751]
We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory.
Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB SRAM and 1MB Flash.
arXiv Detail & Related papers (2022-06-30T17:59:08Z)
- LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning [82.93130407930762]
It is costly to update the entire parameter set of large pre-trained models.
PETL techniques allow updating a small subset of parameters inside a pre-trained backbone network for a new task.
We propose Ladder Side-Tuning (LST), a new PETL technique that reduces training memory requirements by more substantial amounts.
arXiv Detail & Related papers (2022-06-13T23:51:56Z)
- A TinyML Platform for On-Device Continual Learning with Quantized Latent Replays [66.62377866022221]
Latent Replay-based Continual Learning (CL) techniques enable online, serverless adaptation in principle.
We introduce a HW/SW platform for end-to-end CL based on a 10-core FP32-enabled parallel ultra-low-power processor.
Our results show that by combining these techniques, continual learning can be achieved in practice using less than 64MB of memory.
arXiv Detail & Related papers (2021-10-20T11:01:23Z)
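A recurring point across TinyTL and the related papers above is that training memory is dominated by activations rather than by trainable parameters. The sketch below is a rough way to see that gap on a concrete model; it assumes PyTorch and torchvision are installed, the helper names (`parameter_bytes`, `approx_activation_bytes`) are made up for illustration, and the hook-based count is only a proxy for what an autograd engine actually retains.

```python
import torch
import torch.nn as nn
import torchvision.models as models  # assumed available alongside PyTorch


def parameter_bytes(model: nn.Module) -> int:
    """Total bytes held by the model's parameters."""
    return sum(p.numel() * p.element_size() for p in model.parameters())


def approx_activation_bytes(model: nn.Module, x: torch.Tensor) -> int:
    """Sum the output sizes of all leaf modules for one forward pass.

    This is only a proxy for what backpropagation would keep alive, but it
    shows the order of magnitude of the activation footprint.
    """
    total = 0
    handles = []

    def hook(_module, _inputs, output):
        nonlocal total
        if isinstance(output, torch.Tensor):
            total += output.numel() * output.element_size()

    for m in model.modules():
        if len(list(m.children())) == 0:  # register on leaf modules only
            handles.append(m.register_forward_hook(hook))
    with torch.no_grad():
        model(x)
    for h in handles:
        h.remove()
    return total


if __name__ == "__main__":
    net = models.mobilenet_v2()
    batch = torch.randn(8, 3, 224, 224)   # batch of 8 RGB images
    print(f"parameters : {parameter_bytes(net) / 2**20:6.1f} MiB")
    print(f"activations: {approx_activation_bytes(net, batch) / 2**20:6.1f} MiB")
```

Running something like this on a typical mobile backbone shows activations outweighing parameter storage by a large factor at realistic batch sizes, which is the gap the methods listed above target from different angles.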
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.