Fugu-MT Paper Translation (Abstract): TransDeepLab: Convolution-Free Transformer-based DeepLab v3+ for Medical Image Segmentation

Paper overview: TransDeepLab: Convolution-Free Transformer-based DeepLab v3+ for Medical Image Segmentation

arxiv url: http://arxiv.org/abs/2208.00713v1
Date: Mon, 1 Aug 2022 09:53:53 GMT
Status: Translation complete
Last updated in system: 2022-08-02 13:23:11.295001
Title: TransDeepLab: Convolution-Free Transformer-based DeepLab v3+ for Medical Image Segmentation
Title (reference translation): TransDeepLab: Convolution-Free Transformer-Based DeepLab v3+
Authors: Reza Azad, Moein Heidari, Moein Shariatnia, Ehsan Khodapanah Aghdam, Sanaz Karimijafarbigloo, Ehsan Adeli, Dorit Merhof
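Abstract summary: This paper proposes TransDeepLab, a DeepLab-like Transformer for medical image segmentation. We use a hierarchical Swin-Transformer with shifted windows to extend DeepLabv3 and model the ASPP module. The proposed method performs better than or on par with most contemporary works that combine Vision Transformer and CNN-based methods.
Reference score (attention score computed by this site): 11.190117191084175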
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Convolutional neural networks (CNNs) have been the de facto standard in a diverse set of computer vision tasks for many years. In particular, deep neural networks based on seminal architectures such as U-shaped models with skip connections or atrous convolution with pyramid pooling have been tailored to a wide range of medical image analysis tasks. The main advantage of such architectures is that they are good at retaining versatile local features. However, as a general consensus, CNNs fail to capture long-range dependencies and spatial correlations because convolution operations have an intrinsically limited receptive field. In contrast, the Transformer, which profits from the global information modelling of the self-attention mechanism, has recently attained remarkable performance in natural language processing and computer vision. Nevertheless, previous studies show that both local and global features are critical for a deep model in dense prediction, such as segmenting complicated structures with disparate shapes and configurations. To this end, this paper proposes TransDeepLab, a novel DeepLab-like pure Transformer for medical image segmentation. Specifically, we exploit a hierarchical Swin-Transformer with shifted windows to extend DeepLabv3 and model the Atrous Spatial Pyramid Pooling (ASPP) module. Based on a thorough search of the relevant literature, we believe we are the first to model the seminal DeepLab model with a pure Transformer-based model. Extensive experiments on various medical image segmentation tasks verify that our approach performs better than or on par with most contemporary works that combine Vision Transformer and CNN-based methods, along with a significant reduction in model complexity. The code and trained models are publicly available at https://github.com/rezazad68/transdeeplab
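Abstract (reference translation): Convolutional neural networks (CNNs) have been the de facto standard for a variety of computer vision tasks for many years. In particular, deep neural networks built on seminal architectures, such as U-shaped models with skip connections or atrous convolution with pyramid pooling, have been adapted to a wide range of medical image analysis tasks. The main advantage of these architectures is that they tend to retain versatile local features. However, by general consensus, CNNs cannot capture long-range dependencies and spatial correlations, because the receptive field of a convolution operation is inherently limited in size. The Transformer, by contrast, benefits from the global information modelling that comes from the self-attention mechanism, and has recently achieved remarkable performance in natural language processing and computer vision. Even so, previous studies have shown that both local and global features matter to a deep model in dense prediction, for example when segmenting complicated structures with differing shapes and configurations. This work therefore proposes TransDeepLab, a new DeepLab-like pure Transformer for medical image segmentation. Specifically, it uses a hierarchical Swin-Transformer with shifted windows to extend DeepLabv3 and to model the Atrous Spatial Pyramid Pooling (ASPP) module. According to a thorough search of the relevant literature, the authors are the first to model the DeepLab model with a pure Transformer-based model. Extensive experiments on various medical image segmentation tasks verify that the approach performs better than or on par with most contemporary works that combine Vision Transformer and CNN-based methods, with a significant reduction in model complexity. The code and trained models are available at https://github.com/rezazad68/transdeeplab.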
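The abstract does not spell out how shifted-window attention can stand in for ASPP. As a rough illustration only, the sketch below shows the two grid operations a Swin-style block relies on: splitting a feature map into non-overlapping windows and cyclically shifting it before the split. It applies the split at several window sizes, which is one plausible way to gather context at multiple scales. The helper names `window_partition` and `cyclic_shift`, the 4 x 4 feature map, and the window sizes are assumptions made for this example. They are not taken from the paper or from the authors' repository.

```python
from typing import Dict, List

Grid = List[List[float]]


def window_partition(grid: Grid, window: int) -> List[List[float]]:
    """Split an H x W grid into non-overlapping window x window tiles.

    Each tile is returned as a flat, row-major list of its values.
    H and W must both be divisible by `window`.
    """
    h, w = len(grid), len(grid[0])
    if h % window or w % window:
        raise ValueError("grid size must be divisible by window size")
    tiles = []
    for top in range(0, h, window):
        for left in range(0, w, window):
            tiles.append([grid[r][c]
                          for r in range(top, top + window)
                          for c in range(left, left + window)])
    return tiles


def cyclic_shift(grid: Grid, shift: int) -> Grid:
    """Roll the grid up and left by `shift` cells (wrapping around)."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
            for r in range(h)]


# Hypothetical 4 x 4 feature map holding one scalar per position.
feature_map: Grid = [[float(r * 4 + c) for c in range(4)] for r in range(4)]

# Illustrative pyramid of window sizes (assumed values, not from the paper).
pyramid: Dict[int, List[List[float]]] = {
    size: window_partition(feature_map, size) for size in (1, 2, 4)
}

# Shifting before partitioning lets neighbouring windows exchange information.
shifted_tiles = window_partition(cyclic_shift(feature_map, 1), 2)
```

In a real model, self-attention would run inside each tile and the outputs from the different window sizes would then be fused. Both steps are left out here to keep the sketch small.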
