Energy-efficient Task Adaptation for NLP Edge Inference Leveraging
Heterogeneous Memory Architectures
- URL: http://arxiv.org/abs/2303.16100v2
- Date: Wed, 12 Apr 2023 20:05:43 GMT
- Title: Energy-efficient Task Adaptation for NLP Edge Inference Leveraging
Heterogeneous Memory Architectures
- Authors: Zirui Fu, Aleksandre Avaliani, Marco Donato
- Abstract summary: adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
- Score: 68.91874045918112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Executing machine learning inference tasks on resource-constrained edge
devices requires careful hardware-software co-design optimizations. Recent
examples have shown how transformer-based deep neural network models such as
ALBERT can be used to enable the execution of natural language processing (NLP)
inference on mobile systems-on-chip housing custom hardware accelerators.
However, while these existing solutions are effective in alleviating the
latency, energy, and area costs of running single NLP tasks, achieving
multi-task inference requires running computations over multiple variants of
the model parameters, which are tailored to each of the targeted tasks. This
approach leads to either prohibitive on-chip memory requirements or paying the
cost of off-chip memory access. This paper proposes adapter-ALBERT, an
efficient model optimization for maximal data reuse across different tasks. The
proposed model's performance and robustness to data compression methods are
evaluated across several language tasks from the GLUE benchmark. Additionally,
we demonstrate the advantage of mapping the model to a heterogeneous on-chip
memory architecture by performing simulations on a validated NLP edge
accelerator to extrapolate performance, power, and area improvements over the
execution of a traditional ALBERT model on the same hardware platform.
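The abstract gives no implementation details, but the cross-task data-reuse idea can be illustrated with a standard bottleneck-adapter sketch: the large ALBERT backbone is frozen and stored once, and each task contributes only a small adapter and classifier head, so switching tasks swaps a few small tensors instead of reloading a full set of model parameters. Everything below (class names, bottleneck size, pooling choice, the stand-in backbone) is an assumption for illustration, not the authors' adapter-ALBERT implementation.

```python
import torch
import torch.nn as nn


class AdapterBlock(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, plus a residual connection."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


class MultiTaskAdapterModel(nn.Module):
    """Frozen shared encoder plus per-task adapters and classification heads."""

    def __init__(self, backbone: nn.Module, hidden_dim: int, task_num_labels: dict):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # the shared weights are reused verbatim across all tasks
        self.adapters = nn.ModuleDict({t: AdapterBlock(hidden_dim) for t in task_num_labels})
        self.heads = nn.ModuleDict({t: nn.Linear(hidden_dim, n) for t, n in task_num_labels.items()})

    def forward(self, hidden_states: torch.Tensor, task: str) -> torch.Tensor:
        h = self.backbone(hidden_states)    # shared computation, identical for every task
        h = self.adapters[task](h)          # small task-specific correction
        return self.heads[task](h.mean(dim=1))  # pool over the sequence and classify


if __name__ == "__main__":
    hidden = 768
    # Stand-in for the frozen ALBERT encoder (a single shared feed-forward layer here).
    backbone = nn.Sequential(nn.Linear(hidden, hidden), nn.GELU())
    tasks = {"sst2": 2, "mnli": 3, "qqp": 2}  # a few GLUE tasks, as in the paper's evaluation
    model = MultiTaskAdapterModel(backbone, hidden, tasks)

    x = torch.randn(1, 16, hidden)  # (batch, sequence length, hidden size) toy features
    for t in tasks:
        print(t, model(x, t).shape)

    shared = sum(p.numel() for p in backbone.parameters())
    per_task = sum(p.numel() for p in model.adapters["sst2"].parameters()) + \
               sum(p.numel() for p in model.heads["sst2"].parameters())
    print(f"shared parameters: {shared}, extra parameters per task: {per_task}")
```

On a heterogeneous on-chip memory system, the large frozen tensors could be pinned in the dense region while the small per-task tensors live in a fast buffer; the printed parameter counts give a rough sense of that split. This is only a reading of the abstract, not a claim about the paper's actual memory mapping.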
Related papers
- AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation [48.82264764771652]
We introduce AsCAN -- a hybrid architecture, combining both convolutional and transformer blocks.
AsCAN supports a variety of tasks: recognition, segmentation, and class-conditional image generation.
We then scale the same architecture to solve a large-scale text-to-image task and show state-of-the-art performance.
arXiv Detail & Related papers (2024-11-07T18:43:17Z)
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the resource constraints of IoVT devices by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- POMONAG: Pareto-Optimal Many-Objective Neural Architecture Generator [4.09225917049674]
Transferable NAS has emerged, generalizing the search process from dataset-dependent to task-dependent.
This paper introduces POMONAG, extending DiffusionNAG via a many-objective diffusion process.
Results were validated on two search spaces -- NASBench201 and MobileNetV3 -- and evaluated across 15 image classification datasets.
arXiv Detail & Related papers (2024-09-30T16:05:29Z)
- Resource Management for Low-latency Cooperative Fine-tuning of Foundation Models at the Network Edge [35.40849522296486]
Large-scale foundation models (FoMos) can exhibit human-like intelligence.
FoMos need to be adapted to specialized downstream tasks through fine-tuning techniques.
We advocate multi-device cooperation within the device-edge cooperative fine-tuning paradigm.
arXiv Detail & Related papers (2024-07-13T12:47:14Z)
- Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach [58.57026686186709]
We introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR).
CFSR inherits the advantages of both convolution-based and transformer-based approaches.
Experiments demonstrate that CFSR strikes an optimal balance between computational cost and performance.
arXiv Detail & Related papers (2024-01-11T03:08:00Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs); a minimal sketch of this layout appears after the list.
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs [66.30706841821123]
Large language models (LLMs) power many state-of-the-art systems in natural language processing.
LLMs are extremely computationally expensive, even at inference time.
We propose a new metric for comparing inference efficiency across models.
arXiv Detail & Related papers (2023-05-03T21:51:42Z)
- Attention-Based Model and Deep Reinforcement Learning for Distribution of Event Processing Tasks [0.0]
Event processing is a cornerstone of the dynamic and responsive Internet of Things (IoT).
This article investigates the use of deep learning to fairly distribute the tasks.
An attention-based neural network model is proposed to generate efficient load balancing solutions.
arXiv Detail & Related papers (2021-12-07T17:16:35Z)
- ExPAN(N)D: Exploring Posits for Efficient Artificial Neural Network Design in FPGA-based Systems [4.2612881037640085]
This paper analyzes and compares the efficacy of the Posit number representation scheme and the efficiency of fixed-point arithmetic implementations for ANNs.
We propose a novel Posit to fixed-point converter for enabling high-performance and energy-efficient hardware implementations for ANNs.
arXiv Detail & Related papers (2020-10-24T11:02:25Z)
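Several of the related papers above also share computation across tasks; the MEMTL entry, for instance, describes a shared backbone feeding multiple prediction heads whose outputs are combined at inference time. A minimal sketch of that layout follows, using the same illustrative style as the adapter example above; the class names, head count, and averaging rule are assumptions, not the MEMTL paper's code.

```python
import torch
import torch.nn as nn


class MultiHeadEnsemble(nn.Module):
    """Shared backbone with several prediction heads; outputs are ensembled by averaging."""

    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int, num_heads: int = 3):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden_dim, out_dim) for _ in range(num_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.backbone(x)  # computed once, shared by all heads
        return torch.stack([head(h) for head in self.heads]).mean(dim=0)  # ensemble by averaging


# Toy usage: an offloading decision over 4 candidate targets from a 10-dimensional system state.
model = MultiHeadEnsemble(in_dim=10, hidden_dim=32, out_dim=4)
decision = model(torch.randn(1, 10)).argmax(dim=-1)
print(decision)
```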
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.