Fugu-MT 論文翻訳(概要): Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts

論文の概要: Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts

arxiv url: http://arxiv.org/abs/2305.18691v2
Date: Wed, 13 Sep 2023 16:52:55 GMT
ステータス: 翻訳完了
システム内更新日: 2023-09-14 18:02:37.389436
Title: Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts
Title（参考訳）: Edge-MoE:Mixture-of-Expertsによるタスクレベルの分散性を備えたメモリ効率の良いマルチタスクビジョントランスフォーマアーキテクチャ
Authors: Rishov Sarkar, Hanxue Liang, Zhiwen Fan, Zhangyang Wang, Cong Hao
Abstract要約: M$3$ViTは、Mix-of-experts (MoE)を導入した最新のマルチタスクViTモデルである。 MoEは精度の向上と80%以上の削減計算を実現しているが、FPGAに効率的なデプロイを行う上での課題は残されている。 Edge-MoEと呼ばれる私たちの研究は、アーキテクチャの革新の集合を伴って、マルチタスクのViTのための最初のエンドツーエンドFPGAアクセラレータを導入するという課題を解決します。
参考スコア（独自算出の注目度）: 60.1586169973792
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Computer vision researchers are embracing two promising paradigms: Vision Transformers (ViTs) and Multi-task Learning (MTL), which both show great performance but are computation-intensive, given the quadratic complexity of self-attention in ViT and the need to activate an entire large MTL model for one task. M$^3$ViT is the latest multi-task ViT model that introduces mixture-of-experts (MoE), where only a small portion of subnetworks ("experts") are sparsely and dynamically activated based on the current task. M$^3$ViT achieves better accuracy and over 80% computation reduction but leaves challenges for efficient deployment on FPGA. Our work, dubbed Edge-MoE, solves the challenges to introduce the first end-to-end FPGA accelerator for multi-task ViT with a collection of architectural innovations, including (1) a novel reordering mechanism for self-attention, which requires only constant bandwidth regardless of the target parallelism; (2) a fast single-pass softmax approximation; (3) an accurate and low-cost GELU approximation; (4) a unified and flexible computing unit that is shared by almost all computational layers to maximally reduce resource usage; and (5) uniquely for M$^3$ViT, a novel patch reordering method to eliminate memory access overhead. Edge-MoE achieves 2.24x and 4.90x better energy efficiency comparing with GPU and CPU, respectively. A real-time video demonstration is available online, along with our open-source code written using High-Level Synthesis.
Abstract（参考訳）: ビジョントランスフォーマー(ViT)とマルチタスク学習(MTL)はどちらも優れた性能を示すが、ViTにおける自己注意の二次的な複雑さと、ひとつのタスクで大規模なMTLモデルを活性化する必要があることを考えると、計算集約性が高い。 M$^3$ViT は最新のマルチタスク ViT モデルで、ME(Mix-of-Experts)を導入している。 M$3$ViTは精度の向上と80%以上の計算削減を実現しているが、FPGA上での効率的なデプロイには課題を残している。 Our work, dubbed Edge-MoE, solves the challenges to introduce the first end-to-end FPGA accelerator for multi-task ViT with a collection of architectural innovations, including (1) a novel reordering mechanism for self-attention, which requires only constant bandwidth regardless of the target parallelism; (2) a fast single-pass softmax approximation; (3) an accurate and low-cost GELU approximation; (4) a unified and flexible computing unit that is shared by almost all computational layers to maximally reduce resource usage; and (5) uniquely for M$^3$ViT, a novel patch reordering method to eliminate memory access overhead. edge-moeはgpuとcpuと比較して2.24倍と4.90倍のエネルギー効率を実現している。 High-Level Synthesisを使って書かれたオープンソースコードとともに、リアルタイムのビデオデモがオンラインで公開されている。

関連論文リスト

CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications [73.80247057590519]
ビジョントランスフォーマー(ViT)は、トークンミキサーの強力なグローバルコンテキスト能力によって、ニューラルネットワークの革命的な進歩を示す。 CAS-ViT: Convolutional Additive Self-attention Vision Transformerを導入し、モバイルアプリケーションにおける効率と性能のバランスを実現する。 ImageNet-1Kのパラメータは12M/21Mで83.0%/84.1%である。
論文参考訳（メタデータ） (2024-08-07T11:33:46Z)
CHOSEN: Compilation to Hardware Optimization Stack for Efficient Vision Transformer Inference [4.523939613157408]
ビジョントランスフォーマー(ViT)は、コンピュータビジョンへの機械学習アプローチにおける画期的なシフトである。本稿では,これらの課題に対処するソフトウェアハードウェアの共同設計フレームワークであるCHOSENを紹介し,FPGA上にViTをデプロイするための自動フレームワークを提供する。 ChoSENはDeiT-SとDeiT-Bモデルのスループットを1.5倍と1.42倍改善した。
論文参考訳（メタデータ） (2024-07-17T16:56:06Z)
Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads [10.169639612525643]
視覚知覚タスクは、その有効性にもかかわらず、主にViTによって解決される。その効果にもかかわらず、ViTは自己注意の計算の複雑さのために計算のボトルネックに直面している。構築した自己意識を近似するFibottention Architectureを提案する。
論文参考訳（メタデータ） (2024-06-27T17:59:40Z)
Deformable Mixer Transformer with Gating for Multi-Task Learning of Dense Prediction [126.34551436845133]
CNNとTransformerには独自の利点があり、MTL(Multi-task Learning)の高密度予測に広く使われている。本稿では,変形可能なCNNと問合せベースのTransformerの長所を共用したMTLモデルを提案する。
論文参考訳（メタデータ） (2023-08-10T17:37:49Z)
Fast GraspNeXt: A Fast Self-Attention Neural Network Architecture for Multi-task Learning in Computer Vision Tasks for Robotic Grasping on the Edge [80.88063189896718]
アーキテクチャと計算の複雑さが高いと、組み込みデバイスへのデプロイに適さない。 Fast GraspNeXtは、ロボットグルーピングのためのコンピュータビジョンタスクに埋め込まれたマルチタスク学習に適した、高速な自己認識型ニューラルネットワークアーキテクチャである。
論文参考訳（メタデータ） (2023-04-21T18:07:14Z)
AdaMTL: Adaptive Input-dependent Inference for Efficient Multi-Task Learning [1.4963011898406864]
マルチタスク学習モデルのためのタスク認識推論ポリシーを学習する適応型フレームワークであるAdaMTLを紹介する。 AdaMTLは計算複雑性を43%削減し、シングルタスクモデルと比較して精度を1.32%改善した。 Vuzix M4000 スマートグラス上に展開すると、AdaMTL は推論遅延とエネルギー消費をそれぞれ 21.8% と 37.5% に削減する。
論文参考訳（メタデータ） (2023-04-17T20:17:44Z)
Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
Adapter-ALBERTは、様々なタスクにわたる最大データ再利用のための効率的なモデル最適化である。検証されたNLPエッジアクセラレータ上でシミュレーションを行うことにより、モデルを不均一なオンチップメモリアーキテクチャにマッピングする利点を実証する。
論文参考訳（メタデータ） (2023-03-25T14:40:59Z)
M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design [95.41238363769892]
マルチタスク学習(MTL)は、複数の学習タスクを単一のモデルにカプセル化し、それらのタスクを共同でよりよく学習できるようにする。現在のMTLレギュレータは、1つのタスクだけを実行するためにさえ、ほぼすべてのモデルを起動する必要がある。効率的なオンデバイスMTLを実現するためのモデル-アクセラレータ共設計フレームワークを提案する。
論文参考訳（メタデータ） (2022-10-26T15:40:24Z)
Pruning Self-attentions into Convolutional Layers in Single Path [89.55361659622305]
ビジョントランスフォーマー(ViT)は、様々なコンピュータビジョンタスクに対して印象的なパフォーマンスを実現している。トレーニング済みのViTを効率よく自動圧縮するSPViT(Single-Path Vision Transformer pruning)を提案する。われわれのSPViTはDeiT-Bで52.0%のFLOPをトリミングできる。
論文参考訳（メタデータ） (2021-11-23T11:35:54Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。