Fugu-MT Paper Translation (Abstract): Faster Attention Is What You Need: A Fast Self-Attention Neural Network Backbone Architecture for the Edge via Double-Condensing Attention Condensers

Paper overview: Faster Attention Is What You Need: A Fast Self-Attention Neural Network Backbone Architecture for the Edge via Double-Condensing Attention Condensers

arxiv url: http://arxiv.org/abs/2208.06980v1
Date: Mon, 15 Aug 2022 02:47:33 GMT
Status: Translation complete
System update date: 2022-08-16 14:38:00.844652
Title: Faster Attention Is What You Need: A Fast Self-Attention Neural Network Backbone Architecture for the Edge via Double-Condensing Attention Condensers
Title (reference translation): Faster Attention: A Fast Self-Attention Neural Network Backbone Architecture for the Edge via Double-Condensing Attention Condensers
Authors: Alexander Wong, Mohammad Javad Shafiee, Saad Abbasi, Saeejith Nair, and Mahmoud Famouri
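Abstract summary: This paper introduces double-condensing attention condensers. The resulting backbone (named AttendNeXt) achieves significantly higher inference throughput on an embedded ARM processor. These promising results demonstrate that exploring different efficient architecture designs and self-attention mechanisms can lead to interesting new building blocks for TinyML applications.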
Reference score (independently computed attention score): 71.40595908386477
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the growing adoption of deep learning for on-device TinyML applications, there has been an ever-increasing demand for more efficient neural network backbones optimized for the edge. Recently, the introduction of attention condenser networks has resulted in low-footprint, highly efficient, self-attention neural networks that strike a strong balance between accuracy and speed. In this study, we introduce a new faster attention condenser design called double-condensing attention condensers that enable more condensed feature embedding. We further employ a machine-driven design exploration strategy that imposes best-practice design constraints for greater efficiency and robustness to produce the macro-micro architecture constructs of the backbone. The resulting backbone (which we name AttendNeXt) achieves significantly higher inference throughput on an embedded ARM processor when compared to several other state-of-the-art efficient backbones (>10X faster than FB-Net C at higher accuracy and speed) while having a small model size (>1.47X smaller than OFA-62 at higher speed and similar accuracy) and strong accuracy (1.1% higher top-1 accuracy than MobileViT XS on ImageNet at higher speed). These promising results demonstrate that exploring different efficient architecture designs and self-attention mechanisms can lead to interesting new building blocks for TinyML applications.
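Abstract (reference translation): As deep learning is adopted for on-device TinyML applications, demand keeps growing for more efficient neural network backbones optimized for the edge. Recently, the introduction of attention condenser networks has produced low-footprint, highly efficient self-attention neural networks that strike a strong balance between accuracy and speed. This study introduces double-condensing attention condensers, a faster attention condenser design. It further employs a machine-driven design exploration strategy that imposes best-practice design constraints for greater efficiency and robustness in order to produce the macro-micro architecture constructs of the backbone. The resulting backbone (AttendNeXt) achieves significantly higher inference throughput on an embedded ARM processor than several other state-of-the-art efficient backbones (>10X faster than FB-Net C at higher accuracy), while having a small model size (>1.47X smaller than OFA-62 at higher speed and similar accuracy) and strong accuracy (1.1% higher top-1 accuracy than MobileViT XS on ImageNet). These promising results show that exploring different efficient architecture designs and self-attention mechanisms can lead to interesting new building blocks for TinyML applications.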

Related papers