$DA^3$: Deep Additive Attention Adaption for Memory-Efficient On-Device
Multi-Domain Learning
- URL: http://arxiv.org/abs/2012.01362v2
- Date: Mon, 29 Mar 2021 20:13:15 GMT
- Title: $DA^3$: Deep Additive Attention Adaption for Memory-Efficient On-Device
Multi-Domain Learning
- Authors: Li Yang, Adnan Siraj Rakin and Deliang Fan
- Abstract summary: Large memory used for activation storage is the bottleneck that largely limits the training time and cost on edge devices.
We propose Deep Additive Attention Adaption, a novel memory-efficient on-device multi-domain learning method.
We validate $DA^3$ on multiple datasets against state-of-the-art methods, showing improvements in both accuracy and training time.
- Score: 30.53018068935323
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Nowadays, one practical limitation of deep neural networks (DNNs) is their high
degree of specialization to a single task or domain (e.g., one visual domain).
This motivates researchers to develop algorithms that can adapt a DNN model to
multiple domains sequentially while still performing well on past domains, a
setting known as multi-domain learning. Almost all conventional methods focus
only on improving accuracy with minimal parameter updates, while ignoring the
high computing and memory cost during training, which makes it impossible to
deploy multi-domain learning on increasingly widespread resource-limited edge
devices such as mobile phones, IoT nodes, and embedded systems. In our study of
multi-domain training, we observe that the large memory used for activation
storage is the bottleneck that largely limits training time and cost on edge
devices. To reduce training memory usage while preserving domain adaptation
accuracy, in this work we propose Deep Additive Attention Adaption ($DA^3$), a
novel memory-efficient on-device multi-domain learning method aiming to achieve
domain adaptation on memory-limited edge devices. To reduce memory consumption
during on-device training, $DA^3$ freezes the weights of the pre-trained
backbone model (i.e., activation features need not be stored for backward
propagation). Furthermore, to improve adaptation accuracy, we increase model
capacity by learning a novel additive attention adaptor module, which is also
designed to avoid buffering activations in memory, further improving memory
efficiency. We validate $DA^3$ on multiple datasets against state-of-the-art
methods, showing improvements in both accuracy and training time.
Related papers
- Random resistive memory-based deep extreme point learning machine for
unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design: a random resistive memory-based deep extreme point learning machine (DEPLM).
Our co-design system achieves huge energy efficiency improvements and training cost reduction when compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
- CAMEL: Co-Designing AI Models and Embedded DRAMs for Efficient On-Device Learning [8.339901980070616]
Training AI on resource-limited devices poses significant challenges due to the demanding computing workload and the substantial memory consumption and data access required by deep neural networks (DNNs).
We propose utilizing embedded dynamic random-access memory (eDRAM) as the primary storage medium for transient training data.
We present a highly efficient on-device training engine named CAMEL, which leverages eDRAM as the primary on-chip memory.
arXiv Detail & Related papers (2023-05-04T20:57:01Z)
- Efficient On-device Training via Gradient Filtering [14.484604762427717]
We propose a new gradient filtering approach which enables on-device CNN model training.
Our approach creates a special structure with fewer unique elements in the gradient map.
Our approach opens up a new direction of research with huge potential for on-device training.
arXiv Detail & Related papers (2023-01-01T02:33:03Z)
- Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training [23.536294640280087]
We propose nested forward automatic differentiation (Forward-AD) for element-wise activation functions to enable memory-efficient training.
Our evaluation shows that nested Forward-AD reduces the memory footprint by up to 1.97x compared to the baseline model.
arXiv Detail & Related papers (2022-09-22T04:48:48Z)
- POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging [35.397804171588476]
Fine-tuning models on edge devices would enable privacy-preserving personalization over sensitive data.
We present POET, an algorithm to enable training large neural networks on memory-scarce, battery-operated edge devices; a minimal rematerialization (checkpointing) sketch appears after this list.
arXiv Detail & Related papers (2022-07-15T18:36:29Z)
- On-Device Training Under 256KB Memory [62.95579393237751]
We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory.
Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB of SRAM and 1MB of Flash.
arXiv Detail & Related papers (2022-06-30T17:59:08Z)
- Forget Less, Count Better: A Domain-Incremental Self-Distillation Learning Benchmark for Lifelong Crowd Counting [51.44987756859706]
Off-the-shelf methods have drawbacks when handling multiple domains.
Lifelong Crowd Counting aims to alleviate catastrophic forgetting and improve generalization ability.
arXiv Detail & Related papers (2022-05-06T15:37:56Z)
- Memory-Guided Semantic Learning Network for Temporal Sentence Grounding [55.31041933103645]
We propose a memory-augmented network that learns and memorizes rarely appearing content in temporal sentence grounding (TSG) tasks.
MGSL-Net consists of three main parts: a cross-modal interaction module, a memory augmentation module, and a heterogeneous attention module.
arXiv Detail & Related papers (2022-01-03T02:32:06Z)
- Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings [89.63764845984076]
We present Stored Embeddings for Efficient Reinforcement Learning (SEER).
SEER is a simple modification of existing off-policy deep reinforcement learning methods.
We show that SEER does not degrade the performance of RL agents while significantly saving computation and memory.
arXiv Detail & Related papers (2021-03-04T08:14:10Z)
- Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z)
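The POET entry above refers to rematerialization, which in its simplest form is gradient checkpointing: activations of a wrapped sub-network are discarded after the forward pass and recomputed during the backward pass. The sketch below shows this with torch.utils.checkpoint on a small hypothetical two-stage CNN; POET's actual contribution, a scheduler that jointly decides what to rematerialize and what to page out to secondary storage, is not reproduced here.

```python
# Rematerialization in its simplest form: wrap a sub-network in
# torch.utils.checkpoint so its activations are recomputed during backward
# instead of being stored during forward. The two-stage CNN is a
# hypothetical example model, not the POET system.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                    nn.Linear(64, 10))

    def forward(self, x):
        # stage1's intermediate activations are dropped after the forward
        # pass and recomputed when gradients are needed (compute for memory).
        x = checkpoint(self.stage1, x, use_reentrant=False)
        return self.stage2(x)


model = CheckpointedNet()
out = model(torch.randn(8, 3, 32, 32, requires_grad=True))
out.sum().backward()
```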