Fugu-MT Paper Translation (Abstract): Dynamic Mobile-Former: Strengthening Dynamic Convolution with Attention and Residual Connection in Kernel Space

Paper abstract: Dynamic Mobile-Former: Strengthening Dynamic Convolution with Attention and Residual Connection in Kernel Space

arxiv url: http://arxiv.org/abs/2304.07254v1
Date: Thu, 13 Apr 2023 05:22:24 GMT
Status: Translation complete
Last updated in system: 2023-04-17 13:01:26.326864
Title: Dynamic Mobile-Former: Strengthening Dynamic Convolution with Attention and Residual Connection in Kernel Space
Title (reference translation): Dynamic Mobile-Former: Strengthening dynamic convolution with attention and residual connections in kernel space
Authors: Seokju Yun, Youngmin Ro
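Abstract summary: Dynamic Mobile-Former maximizes the capabilities of dynamic convolution by harmonizing it with efficient operators. The Transformer in Dynamic Mobile-Former needs only a few randomly initialized tokens to compute global features. A bridge between Dynamic MobileNet and the Transformer allows bidirectional integration of local and global features.
Reference score (independently computed attention score): 4.111899441919165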
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: We introduce Dynamic Mobile-Former (DMF), which maximizes the capabilities of dynamic convolution by harmonizing it with efficient operators. Our Dynamic Mobile-Former effectively utilizes the advantages of Dynamic MobileNet (MobileNet equipped with dynamic convolution) using global information from light-weight attention. A Transformer in Dynamic Mobile-Former only requires a few randomly initialized tokens to calculate global features, making it computationally efficient. A bridge between Dynamic MobileNet and the Transformer allows for bidirectional integration of local and global features. We also simplify the optimization process of vanilla dynamic convolution by splitting the convolution kernel into an input-agnostic kernel and an input-dependent kernel. This allows for optimization in a wider kernel space, resulting in enhanced capacity. By integrating lightweight attention and enhanced dynamic convolution, our Dynamic Mobile-Former achieves not only high efficiency but also strong performance. We benchmark the Dynamic Mobile-Former on a series of vision tasks and show that it achieves impressive performance on image classification, COCO detection, and instance segmentation. For example, our DMF reaches a top-1 accuracy of 79.4% on ImageNet-1K, 4.3% higher than PVT-Tiny with only 1/4 of the FLOPs. Additionally, our proposed DMF-S model performed well on challenging vision datasets such as COCO, achieving a 39.0% mAP, which is 1% higher than that of the Mobile-Former 508M model, despite using 3 GFLOPs less computation. Code and models are available at https://github.com/ysj9909/DMF
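The kernel split mentioned in the abstract (an input-agnostic kernel plus an input-dependent kernel) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the input-dependent part is a softmax-weighted mix of a few candidate kernels, with the weights computed from the globally pooled input, as in vanilla dynamic convolution. The names combined_kernel, conv2d, attn_weight, and attn_bias are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(scores):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(scores - scores.max())
    return e / e.sum()

def combined_kernel(x, static_kernel, dynamic_kernels, attn_weight, attn_bias):
    """Return static kernel + input-dependent kernel (a residual in kernel space).

    x:               (C_in, H, W) input feature map
    static_kernel:   (C_out, C_in, k, k) input-agnostic kernel
    dynamic_kernels: (K, C_out, C_in, k, k) candidate kernels (assumed design)
    attn_weight:     (K, C_in), attn_bias: (K,) hypothetical linear attention head
    """
    pooled = x.mean(axis=(1, 2))                         # global average pool -> (C_in,)
    scores = softmax(attn_weight @ pooled + attn_bias)   # (K,), sums to 1
    dyn = np.tensordot(scores, dynamic_kernels, axes=1)  # (C_out, C_in, k, k)
    return static_kernel + dyn, scores

def conv2d(x, kernel):
    # Stride-1, no-padding cross-correlation, as used in CNN layers.
    k = kernel.shape[-1]
    windows = np.lib.stride_tricks.sliding_window_view(x, (k, k), axis=(1, 2))
    # windows: (C_in, H-k+1, W-k+1, k, k)
    return np.einsum("chwij,ocij->ohw", windows, kernel)

rng = np.random.default_rng(0)
c_in, c_out, k, num_kernels = 4, 6, 3, 4
x = rng.standard_normal((c_in, 8, 8))
static_kernel = rng.standard_normal((c_out, c_in, k, k))
dynamic_kernels = rng.standard_normal((num_kernels, c_out, c_in, k, k))
attn_weight = rng.standard_normal((num_kernels, c_in))
attn_bias = np.zeros(num_kernels)

kernel, scores = combined_kernel(x, static_kernel, dynamic_kernels,
                                 attn_weight, attn_bias)
y = conv2d(x, kernel)
print(y.shape)  # (6, 6, 6)
```

Because convolution is linear in the kernel, the output equals the static-kernel response plus the input-dependent response, so the static kernel still contributes even when the attention scores are uninformative. How the paper actually produces the input-dependent kernel (for example, from the Transformer's global tokens) is not specified in the abstract and is not modeled here.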

Related papers

iFormer: Integrating ConvNet and Transformer for Mobile Application [0.6798775532273751]
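iFormer integrates the fast local representation capability of convolution with the efficient global modeling capability of self-attention. We conduct comprehensive experiments showing that iFormer outperforms existing lightweight networks across a variety of tasks.
Paper reference translation (metadata) (2025-01-26T02:34:58Z)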
EMOv2: Pushing 5M Vision Model Frontier [92.21687467702972]
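We establish a new frontier for 5M-scale lightweight models across a variety of downstream tasks. Our work rethinks the lightweight infrastructure of the efficient IRB and practical components in Transformers. Considering the delay mobile users experience when downloading models over 4G/5G bandwidth, we investigate the performance upper limit of lightweight models at around the 5M scale.
Paper reference translation (metadata) (2024-12-09T17:12:22Z)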
MobileMamba: Lightweight Multi-Receptive Visual Mamba Network [51.33486891724516]
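Previous research on lightweight models has focused mainly on CNN-based and Transformer-based designs. We propose the MobileMamba framework, which balances efficiency and performance. MobileMamba achieves 83.6% Top-1 accuracy, surpassing existing state-of-the-art methods.
Paper reference translation (metadata) (2024-11-24T18:01:05Z)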
CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications [73.80247057590519]
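Vision Transformers (ViTs) mark a revolutionary advance in neural networks thanks to the powerful global-context capability of their token mixers. We introduce CAS-ViT: Convolutional Additive Self-attention Vision Transformers, which balance efficiency and performance in mobile applications. It reaches 83.0%/84.1% on ImageNet-1K with 12M/21M parameters.
Paper reference translation (metadata) (2024-08-07T11:33:46Z)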
KernelWarehouse: Rethinking the Design of Dynamic Convolution [16.101179962553385]
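KernelWarehouse redefines the basic concepts of kernels and of assembling kernels. We validate the effectiveness of KernelWarehouse on the ImageNet and MS-COCO datasets using a variety of ConvNet architectures.
Paper reference translation (metadata) (2024-06-12T05:16:26Z)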
Efficient Modulation for Vision Networks [122.1051910402034]
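We propose efficient modulation, a novel design for efficient vision networks. We demonstrate that the modulation mechanism is particularly well suited to efficient networks. Our network achieves a good trade-off between accuracy and efficiency.
Paper reference translation (metadata) (2024-03-29T03:48:35Z)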
SGDM: Static-Guided Dynamic Module Make Stronger Visual Models [0.9012198585960443]
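Spatial attention mechanisms have been widely used to improve object detection performance. To address two shortcomings of dynamic-weight convolution, we propose Razor Dynamic Convolution (RDConv). To solve the problem that dynamic convolution is sensitive to high-frequency noise, we propose a shared-weight mechanism in static convolution.
Paper reference translation (metadata) (2024-03-27T06:18:40Z)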
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications [108.44482683870888]
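Deformable Convolution v4 (DCNv4) is a highly efficient and effective operator designed for a broad range of vision applications. DCNv4 addresses the limitations of its predecessor, DCNv3, with two key enhancements. It shows exceptional performance across a variety of tasks, including image classification, instance and semantic segmentation, and notably image generation.
Paper reference translation (metadata) (2024-01-11T14:53:24Z)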
Vision Transformer Computation and Resilience for Dynamic Inference [3.6929360462568077]
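We exploit the resilience of vision transformers to prune and switch between differently scaled versions of a model. Most FLOPs are generated by convolutions, not attention. Some models are relatively resilient, and model execution can be adapted without retraining.
Paper reference translation (metadata) (2022-12-06T01:10:31Z)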
PAD-Net: An Efficient Framework for Dynamic Networks [72.85480289152719]
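A common practice when implementing dynamic networks is to convert a given static layer into a fully dynamic layer. We propose a partially dynamic network, PAD-Net, which converts redundant dynamic parameters into static ones. Our method is comprehensively supported by large-scale experiments with two typical dynamic architectures.
Paper reference translation (metadata) (2022-11-10T12:42:43Z)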
SD-Conv: Towards the Parameter-Efficiency of Dynamic Convolution [16.56592303409295]
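Dynamic convolution improves the performance of efficient CNNs at the cost of a negligible increase in FLOPs. We propose a new framework, Sparse Dynamic Convolution (SD-Conv), which naturally integrates these two paths.
Paper reference translation (metadata) (2022-04-05T14:03:54Z)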
DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers [105.74546828182834]
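We present a hardware-efficient dynamic inference scheme, dynamic weight slicing, which adaptively slices part of the network parameters for inputs of varying difficulty. We propose the dynamic slimmable network (DS-Net) and the dynamic sliceable network (DS-Net++), which adjust the number of filters in CNNs and multiple dimensions in both CNNs and transformers in an input-dependent manner.
Paper reference translation (metadata) (2021-09-21T09:57:21Z)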
Revisiting Dynamic Convolution via Matrix Decomposition [81.89967403872147]
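We propose dynamic channel fusion to replace dynamic attention over channel groups. The method is easy to train and significantly reduces parameters without sacrificing accuracy.
Paper reference translation (metadata) (2021-03-15T23:03:18Z)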

The related papers list is generated automatically from the titles and abstracts of papers on this site.
