$DA^3$: Deep Additive Attention Adaption for Memory-Efficient On-Device
Multi-Domain Learning
- URL: http://arxiv.org/abs/2012.01362v2
- Date: Mon, 29 Mar 2021 20:13:15 GMT
- Title: $DA^3$: Deep Additive Attention Adaption for Memory-Efficient On-Device
Multi-Domain Learning
- Authors: Li Yang, Adnan Siraj Rakin and Deliang Fan
- Abstract summary: Large memory used for activation storage is the bottleneck that largely limits the training time and cost on edge devices.
We propose Deep Additive Attention Adaption, a novel memory-efficient on-device multi-domain learning method.
We validate $DA^3$ on multiple datasets against state-of-the-art methods, showing improvements in both accuracy and training time.
- Score: 30.53018068935323
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Nowadays, one practical limitation of deep neural networks (DNNs) is their high
degree of specialization to a single task or domain (e.g., one visual domain).
This motivates researchers to develop algorithms that can adapt a DNN model to
multiple domains sequentially while still performing well on past domains, a
setting known as multi-domain learning. Almost all conventional methods focus
only on improving accuracy with minimal parameter updates, while ignoring the
high computing and memory cost during training, which makes it impossible to
deploy multi-domain learning on increasingly widespread resource-limited edge
devices such as mobile phones, IoT nodes, and embedded systems. In our study of
multi-domain training, we observe that the large memory used for activation
storage is the bottleneck that largely limits training time and cost on edge
devices. To reduce training memory usage while preserving domain adaptation
accuracy, in this work we propose Deep Additive Attention Adaption ($DA^3$), a
novel memory-efficient on-device multi-domain learning method aiming to achieve
domain adaptation on memory-limited edge devices. To reduce memory consumption
during on-device training, $DA^3$ freezes the weights of the pre-trained
backbone model (i.e., activation features need not be stored for backward
propagation). Furthermore, to improve adaptation accuracy, we increase model
capacity by learning a novel additive attention adaptor module, which is also
designed to avoid buffering activations in memory, further improving memory
efficiency. We validate $DA^3$ on multiple datasets against state-of-the-art
methods, showing improvements in both accuracy and training time.
Related papers
- Random resistive memory-based deep extreme point learning machine for
unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design: a random resistive memory-based deep extreme point learning machine (DEPLM).
Our co-design system achieves huge energy efficiency improvements and training cost reduction when compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
- CAMEL: Co-Designing AI Models and Embedded DRAMs for Efficient On-Device Learning [8.339901980070616]
Training AI on resource-limited devices poses significant challenges due to the demanding computing workload and the substantial memory consumption and data access required by deep neural networks (DNNs).
We propose utilizing embedded dynamic random-access memory (eDRAM) as the primary storage medium for transient training data.
We present a highly efficient on-device training engine named CAMEL, which leverages eDRAM as the primary on-chip memory.
arXiv Detail & Related papers (2023-05-04T20:57:01Z)
- Efficient On-device Training via Gradient Filtering [14.484604762427717]
We propose a new gradient filtering approach which enables on-device CNN model training.
Our approach creates a special structure with fewer unique elements in the gradient map.
Our approach opens up a new direction of research with huge potential for on-device training.
arXiv Detail & Related papers (2023-01-01T02:33:03Z)
- Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training [23.536294640280087]
We propose nested forward automatic differentiation (Forward-AD) for element-wise activation functions to enable memory-efficient training.
Our evaluation shows that nested Forward-AD reduces the memory footprint by up to 1.97x compared to the baseline model.
arXiv Detail & Related papers (2022-09-22T04:48:48Z)
- POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging [35.397804171588476]
Fine-tuning models on edge devices would enable privacy-preserving personalization over sensitive data.
We present POET, an algorithm to enable training large neural networks on memory-scarce, battery-operated edge devices; a minimal rematerialization (checkpointing) sketch appears after this list.
arXiv Detail & Related papers (2022-07-15T18:36:29Z)
- On-Device Training Under 256KB Memory [62.95579393237751]
We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory.
Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB of SRAM and 1MB of Flash.
arXiv Detail & Related papers (2022-06-30T17:59:08Z)
- Forget Less, Count Better: A Domain-Incremental Self-Distillation Learning Benchmark for Lifelong Crowd Counting [51.44987756859706]
Off-the-shelf methods have drawbacks when handling multiple domains.
Lifelong Crowd Counting aims to alleviate catastrophic forgetting and improve generalization ability.
arXiv Detail & Related papers (2022-05-06T15:37:56Z)
- Memory-Guided Semantic Learning Network for Temporal Sentence Grounding [55.31041933103645]
We propose a memory-augmented network that learns and memorizes rarely appearing content in temporal sentence grounding (TSG) tasks.
MGSL-Net consists of three main parts: a cross-modal interaction module, a memory augmentation module, and a heterogeneous attention module.
arXiv Detail & Related papers (2022-01-03T02:32:06Z)
- Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings [89.63764845984076]
We present Stored Embeddings for Efficient Reinforcement Learning (SEER).
SEER is a simple modification of existing off-policy deep reinforcement learning methods.
We show that SEER does not degrade the performance of RL agents while significantly saving computation and memory.
arXiv Detail & Related papers (2021-03-04T08:14:10Z)
- Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z)
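The POET entry above refers to rematerialization, which in its simplest form is gradient checkpointing: activations of a wrapped sub-network are discarded after the forward pass and recomputed during the backward pass. The sketch below shows this with torch.utils.checkpoint on a small hypothetical two-stage CNN; POET's actual contribution, a scheduler that jointly decides what to rematerialize and what to page out to secondary storage, is not reproduced here.

```python
# Rematerialization in its simplest form: wrap a sub-network in
# torch.utils.checkpoint so its activations are recomputed during backward
# instead of being stored during forward. The two-stage CNN is a
# hypothetical example model, not the POET system.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                    nn.Linear(64, 10))

    def forward(self, x):
        # stage1's intermediate activations are dropped after the forward
        # pass and recomputed when gradients are needed (compute for memory).
        x = checkpoint(self.stage1, x, use_reentrant=False)
        return self.stage2(x)


model = CheckpointedNet()
out = model(torch.randn(8, 3, 32, 32, requires_grad=True))
out.sum().backward()
```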