Paper overview: Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture
with Task-level Sparsity via Mixture-of-Experts
- arxiv url: http://arxiv.org/abs/2305.18691v2
- Date: Wed, 13 Sep 2023 16:52:55 GMT
- Status: Processing complete
- Last updated in system: 2023-09-14 18:02:37.389436
- Title: Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture
with Task-level Sparsity via Mixture-of-Experts
- Title (reference translation): Edge-MoE: Memory-efficient multi-task Vision Transformer architecture with task-level sparsity via Mixture-of-Experts
- Authors: Rishov Sarkar, Hanxue Liang, Zhiwen Fan, Zhangyang Wang, Cong Hao
- Abstract summary: M$^3$ViT is the latest multi-task ViT model, which introduces mixture-of-experts (MoE).
- Reference score (independently computed attention score): 60.1586169973792
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computer vision researchers are embracing two promising paradigms: Vision
Transformers (ViTs) and Multi-task Learning (MTL), which both show great
performance but are computation-intensive, given the quadratic complexity of
self-attention in ViT and the need to activate an entire large MTL model for
one task. M$^3$ViT is the latest multi-task ViT model that introduces
mixture-of-experts (MoE), where only a small portion of subnetworks ("experts")
are sparsely and dynamically activated based on the current task. M$^3$ViT
achieves better accuracy and over 80% computation reduction but leaves
challenges for efficient deployment on FPGA.
Our work, dubbed Edge-MoE, solves the challenges to introduce the first
end-to-end FPGA accelerator for multi-task ViT with a collection of
architectural innovations, including (1) a novel reordering mechanism for
self-attention, which requires only constant bandwidth regardless of the target
parallelism; (2) a fast single-pass softmax approximation; (3) an accurate and
low-cost GELU approximation; (4) a unified and flexible computing unit that is
shared by almost all computational layers to maximally reduce resource usage;
and (5) uniquely for M$^3$ViT, a novel patch reordering method to eliminate
memory access overhead. Edge-MoE achieves 2.24x and 4.90x better energy
efficiency compared with GPU and CPU, respectively. A real-time video
demonstration is available online, along with our open-source code written
using High-Level Synthesis.
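- Abstract (reference translation): Vision Transformers (ViTs) and Multi-task Learning (MTL) both show great performance but are computation-intensive, given the quadratic complexity of self-attention in ViT and the need to activate an entire large MTL model for one task.
M$^3$ViT is the latest multi-task ViT model, which introduces mixture-of-experts (MoE).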
Our work, dubbed Edge-MoE, solves the challenges to introduce the first end-to-end FPGA accelerator for multi-task ViT with a collection of architectural innovations, including (1) a novel reordering mechanism for self-attention, which requires only constant bandwidth regardless of the target parallelism; (2) a fast single-pass softmax approximation; (3) an accurate and low-cost GELU approximation; (4) a unified and flexible computing unit that is shared by almost all computational layers to maximally reduce resource usage; and (5) uniquely for M$^3$ViT, a novel patch reordering method to eliminate memory access overhead.
A real-time video demonstration is available online, along with open-source code written using High-Level Synthesis.
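The abstract describes MoE as activating only a small portion of the expert subnetworks, chosen according to the current task. The sketch below illustrates that idea in plain Python. It is a minimal illustration under stated assumptions, not the M$^3$ViT or Edge-MoE implementation: the per-task linear gate, the top-k selection rule, the single-matrix experts, and all names (`TaskGatedMoE`, `make_expert`, `route`, `forward`) are hypothetical and do not come from the paper.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(dim, seed):
    """Hypothetical expert: one dim x dim linear map with fixed random weights."""
    rng = random.Random(seed)
    w = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    def expert(token):
        return [sum(wij * t for wij, t in zip(row, token)) for row in w]

    return expert


class TaskGatedMoE:
    """Assumed design: each task owns a gate that scores every expert per token."""

    def __init__(self, dim, num_experts, num_tasks, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        self.experts = [make_expert(dim, seed + 1 + i) for i in range(num_experts)]
        # gates[task] is a num_experts x dim matrix of gating weights.
        self.gates = [
            [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]
            for _ in range(num_tasks)
        ]

    def route(self, token, task_id):
        """Return the indices of the top-k experts and their mixing weights."""
        logits = [sum(g * t for g, t in zip(row, token)) for row in self.gates[task_id]]
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.top_k]
        weights = softmax([logits[i] for i in top])
        return top, weights

    def forward(self, token, task_id):
        """Evaluate only the selected experts; all others are skipped entirely."""
        top, weights = self.route(token, task_id)
        out = [0.0] * len(token)
        for idx, w in zip(top, weights):
            y = self.experts[idx](token)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out, top


moe = TaskGatedMoE(dim=4, num_experts=8, num_tasks=2, top_k=2)
out, used = moe.forward([0.5, -1.0, 0.25, 2.0], task_id=0)
print(len(used), "of", len(moe.experts), "experts ran")  # prints "2 of 8 experts ran"
```

Because the gate is indexed by `task_id`, the set of experts that run depends on the task, and the remaining experts contribute no computation for that token. This is the kind of sparsity the abstract credits for the computation reduction.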
- CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications [73.80247057590519]
We introduce CAS-ViT: Convolutional Additive Self-attention Vision Transformers, which balances efficiency and performance in mobile applications.
Paper, reference translation (metadata) (2024-08-07T11:33:46Z)
- CHOSEN: Compilation to Hardware Optimization Stack for Efficient Vision Transformer Inference [4.523939613157408]
Paper, reference translation (metadata) (2024-07-17T16:56:06Z)
- Deformable Mixer Transformer with Gating for Multi-Task Learning of
Dense Prediction [126.34551436845133]
CNNs and Transformers each have their own advantages, and both are widely used for dense prediction in multi-task learning (MTL).
Paper, reference translation (metadata) (2023-08-10T17:37:49Z)
- Fast GraspNeXt: A Fast Self-Attention Neural Network Architecture for
Multi-task Learning in Computer Vision Tasks for Robotic Grasping on the Edge [80.88063189896718]
Fast GraspNeXt is a fast self-attention neural network architecture tailored for embedded multi-task learning in computer vision tasks for robotic grasping.
Paper, reference translation (metadata) (2023-04-21T18:07:14Z)
- AdaMTL: Adaptive Input-dependent Inference for Efficient Multi-Task
Learning [1.4963011898406864]
When deployed on Vuzix M4000 smart glasses, AdaMTL reduces inference latency and energy consumption by 21.8% and 37.5%, respectively.
Paper, reference translation (metadata) (2023-04-17T20:17:44Z)
- M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task
Learning with Model-Accelerator Co-design [95.41238363769892]
Paper, reference translation (metadata) (2022-10-26T15:40:24Z)
- Pruning Self-attentions into Convolutional Layers in Single Path [89.55361659622305]
We propose SPViT (Single-Path Vision Transformer pruning) to compress pre-trained ViTs efficiently and automatically.
Paper, reference translation (metadata) (2021-11-23T11:35:54Z)