Fugu-MT Paper Translation (Summary): Dynamic Group Transformer: A General Vision Transformer Backbone with Dynamic Group Attention

Paper summary: Dynamic Group Transformer: A General Vision Transformer Backbone with Dynamic Group Attention

arxiv url: http://arxiv.org/abs/2203.03937v2
Date: Wed, 9 Mar 2022 10:07:51 GMT
Status: Translation complete
Updated in system: 2022-03-10 12:20:07.972426
Title: Dynamic Group Transformer: A General Vision Transformer Backbone with Dynamic Group Attention
Title (reference translation): Dynamic Group Transformer: A General-Purpose Vision Transformer Backbone with Dynamic Group Attention
Authors: Kai Liu, Tianyi Wu, Cong Liu, Guodong Guo
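Abstract summary: We develop a vision transformer backbone called DGT (Dynamic Group Transformer). Our model outperforms state-of-the-art methods on several common vision tasks, including image classification, semantic segmentation, object detection, and instance segmentation.
Reference score (site-computed attention score): 39.49147625797075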
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, Transformers have shown promising performance in various vision tasks. To reduce the quadratic computation complexity caused by each query attending to all keys/values, various methods have constrained the range of attention within local regions, where each query only attends to keys/values within a hand-crafted window. However, these hand-crafted window partition mechanisms are data-agnostic and ignore their input content, so a query may attend to irrelevant keys/values. To address this issue, we propose Dynamic Group Attention (DG-Attention), which dynamically divides all queries into multiple groups and selects the most relevant keys/values for each group. Our DG-Attention can flexibly model more relevant dependencies without the spatial constraints used in hand-crafted window-based attention. Building on DG-Attention, we develop a general vision transformer backbone named Dynamic Group Transformer (DGT). Extensive experiments show that our models can outperform the state-of-the-art methods on multiple common vision tasks, including image classification, semantic segmentation, object detection, and instance segmentation.
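Abstract (reference translation): In recent years, Transformers have shown promising performance in various vision tasks. To reduce the quadratic computational complexity caused by each query attending to all keys/values, various methods have restricted the range of attention to local regions, in which each query attends only to keys/values within a hand-crafted window. However, these hand-crafted window partitioning mechanisms are data-independent and ignore the input content, so a query may end up attending to irrelevant keys/values. In this paper, we propose Dynamic Group Attention (DG-Attention), which dynamically divides all queries into multiple groups and selects the most relevant keys/values for each group. Our DG-Attention can flexibly model more relevant dependencies without the spatial constraints used in hand-crafted window-based attention. Building on DG-Attention, we develop a general vision transformer backbone called Dynamic Group Transformer (DGT). We show that our models outperform state-of-the-art methods on multiple common vision tasks, including image classification, semantic segmentation, object detection, and instance segmentation.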
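The abstract describes DG-Attention only at a high level: split the queries into groups, then let each group attend to its most relevant keys/values. The following Python sketch illustrates that idea under explicit assumptions. The grouping step is plain k-means over the query vectors, and key selection keeps the `top_k` keys with the highest dot product against each group's centroid. Both rules, and the helper names `group_queries` and `dynamic_group_attention`, are hypothetical illustrations, not the authors' implementation.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def group_queries(queries: List[Vector], num_groups: int,
                  iters: int = 10, seed: int = 0) -> Tuple[List[int], List[Vector]]:
    """Assumed grouping step (hypothetical): plain k-means over query vectors."""
    rng = random.Random(seed)
    centroids = [list(q) for q in rng.sample(queries, num_groups)]
    assign = [0] * len(queries)
    for _ in range(iters):
        # Assign each query to its nearest centroid (squared Euclidean distance).
        assign = [min(range(num_groups), key=lambda g: sq_dist(q, centroids[g]))
                  for q in queries]
        # Move each non-empty group's centroid to the mean of its members.
        for g in range(num_groups):
            members = [q for q, a in zip(queries, assign) if a == g]
            if members:
                centroids[g] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def dynamic_group_attention(queries: List[Vector], keys: List[Vector],
                            values: List[Vector], num_groups: int,
                            top_k: int) -> List[Vector]:
    """Hypothetical sketch: each query group attends only to its top_k keys."""
    assign, centroids = group_queries(queries, num_groups)
    scale = 1.0 / math.sqrt(len(queries[0]))
    out: List[Vector] = [[] for _ in queries]
    for g in range(num_groups):
        members = [i for i, a in enumerate(assign) if a == g]
        if not members:
            continue
        # Assumed selection rule: rank keys by similarity to the group centroid.
        selected = sorted(range(len(keys)),
                          key=lambda j: dot(centroids[g], keys[j]),
                          reverse=True)[:top_k]
        for i in members:
            # Scaled dot-product attention restricted to the selected keys/values.
            weights = softmax([dot(queries[i], keys[j]) * scale for j in selected])
            out[i] = [sum(w * values[j][c] for w, j in zip(weights, selected))
                      for c in range(len(values[0]))]
    return out


if __name__ == "__main__":
    rng = random.Random(1)
    tokens = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(16)]
    result = dynamic_group_attention(tokens, tokens, tokens, num_groups=4, top_k=6)
    print(len(result), len(result[0]))  # prints "16 4"
```

When `top_k` equals the number of keys, every query sees all keys and the sketch reduces to ordinary softmax attention. With a smaller `top_k`, each query scores only `top_k` keys instead of all N, which is where savings over full attention would come from under these assumptions; the grouping and ranking steps add their own overhead. Unlike a fixed window, the selected keys can come from anywhere in the input.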

Related papers

DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention [11.338273151173427]
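We propose a Deformable Bi-level Routing Attention (DBRA) module to optimize the selection of key-value pairs. Building on it, we introduce the Deformable Bi-level Routing Attention Transformer (DeBiFormer), a new general-purpose vision transformer. DeBiFormer is validated on a range of computer vision tasks, including image classification, object detection, and semantic segmentation.
Reference translation (metadata) (2024-10-11T07:23:10Z)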
Scale Disparity of Instances in Interactive Point Cloud Segmentation [15.865365305312174]
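We propose ClickFormer, an innovative interactive point cloud segmentation model that accurately segments instances of both "thing" and "stuff" categories. We employ global attention in the query-voxel transformer to mitigate the risk of generating false positives. ClickFormer outperforms existing interactive point cloud segmentation methods on both indoor and outdoor datasets.
Reference translation (metadata) (2024-07-19T03:45:48Z)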
Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization [61.64304227831361]
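Single-domain generalization aims to learn a model from a single source domain's data that achieves generalized performance on other, unseen target domains. We propose a dynamic object-centric perception network based on prompt learning, designed to adapt to variations in image complexity.
Reference translation (metadata) (2024-02-28T16:16:51Z)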
OMG-Seg: Is One Model Good Enough For All Segmentation? [83.17068644513144]
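OMG-Seg is a transformer-based encoder-decoder architecture with task-specific queries and outputs. We show that OMG-Seg can support more than ten different segmentation tasks while significantly reducing computational and parameter overhead.
Reference translation (metadata) (2024-01-18T18:59:34Z)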
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention [87.41016963608067]
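We propose the Deformable Attention Transformer (DAT++). DAT++ achieves state-of-the-art results on various visual recognition benchmarks, with 85.9% ImageNet accuracy, 54.5 and 47.0 MS-COCO instance segmentation mAP, and 51.5 ADE20K semantic segmentation mIoU.
Reference translation (metadata) (2023-09-04T08:26:47Z)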
Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation [37.24532930188581]
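Transformer-based detection and segmentation methods use a list of learned detection queries to retrieve information from the transformer network. We empirically find that random convex combinations of the learned queries still work well for the corresponding models. We propose learning convex combinations with dynamic coefficients based on the high-level semantics of the image.
Reference translation (metadata) (2023-07-23T06:26:27Z)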
HGFormer: Hierarchical Grouping Transformer for Domain Generalized Semantic Segmentation [113.6560373226501]
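This work studies semantic segmentation under the domain generalization setting. We propose a Hierarchical Grouping Transformer (HGFormer). Experiments show that HGFormer yields more robust semantic segmentation results than per-pixel classification methods and flat grouping transformers.
Reference translation (metadata) (2023-05-22T13:33:41Z)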
GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation [25.689520892609213]
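We present a new non-hierarchical transformer model for general visual recognition with high-resolution features. We evaluate GPViT on a variety of visual tasks, including image classification, semantic segmentation, object detection, and instance segmentation.
Reference translation (metadata) (2022-12-13T18:26:00Z)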
Vision Transformer with Deformable Attention [29.935891419574602]
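Large, sometimes even global, receptive fields give Transformer models higher representational power than CNN models. We propose a novel deformable self-attention module that selects the positions of key and value pairs in a data-dependent way. We present the Deformable Attention Transformer, a general backbone model with deformable attention for both image classification and dense prediction tasks.
Reference translation (metadata) (2022-01-03T08:29:01Z)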
ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings [54.33327082243022]
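ClusterVO is a stereo visual odometry method that simultaneously clusters and estimates the motion of both the ego camera and surrounding rigid clusters/objects. Unlike previous solutions, which rely on batch input or impose priors on scene structure or dynamic object models, ClusterVO is online and general, and can be used in various scenarios, including indoor scene understanding and autonomous driving.
Reference translation (metadata) (2020-03-29T09:06:28Z)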

The related paper list is generated automatically from the titles and abstracts of papers on this site.
