Focused Decoding Enables 3D Anatomical Detection by Transformers
- URL: http://arxiv.org/abs/2207.10774v1
- Date: Thu, 21 Jul 2022 22:17:21 GMT
- Title: Focused Decoding Enables 3D Anatomical Detection by Transformers
- Authors: Bastian Wittmann, Fernando Navarro, Suprosanna Shit, Bjoern Menze
- Abstract summary: We propose a novel Detection Transformer for 3D anatomical structure detection, dubbed Focused Decoder.
Focused Decoder leverages information from an anatomical region atlas to simultaneously deploy query anchors and restrict the cross-attention's field of view.
We evaluate our proposed approach on two publicly available CT datasets and demonstrate that Focused Decoder not only provides strong detection results and thus alleviates the need for a vast amount of annotated data but also exhibits exceptional and highly intuitive explainability of results via attention weights.
- Score: 64.36530874341666
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Detection Transformers represent end-to-end object detection approaches based
on a Transformer encoder-decoder architecture, exploiting the attention
mechanism for global relation modeling. Although Detection Transformers deliver
results on par with or even superior to their highly optimized CNN-based
counterparts operating on 2D natural images, their success is closely coupled
to access to a vast amount of training data. This, however, restricts the
feasibility of employing Detection Transformers in the medical domain, as
access to annotated data is typically limited. To tackle this issue and
facilitate the advent of medical Detection Transformers, we propose a novel
Detection Transformer for 3D anatomical structure detection, dubbed Focused
Decoder. Focused Decoder leverages information from an anatomical region atlas
to simultaneously deploy query anchors and restrict the cross-attention's field
of view to regions of interest, which allows for a precise focus on relevant
anatomical structures. We evaluate our proposed approach on two publicly
available CT datasets and demonstrate that Focused Decoder not only provides
strong detection results and thus alleviates the need for a vast amount of
annotated data but also exhibits exceptional and highly intuitive
explainability of results via attention weights. Code for Focused Decoder is
available in our medical Vision Transformer library
github.com/bwittmann/transoar.
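The core idea of the abstract — restricting each query's cross-attention to an atlas-derived region of interest — can be sketched as masked attention. This is a minimal illustration, not the paper's implementation (see the transoar repository for that): the function name, the toy ROI layout, and the single-head formulation are all assumptions made for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def focused_cross_attention(queries, tokens, roi_masks):
    """Single-head cross-attention in which each query attends only inside its ROI.

    queries:   (Q, d) one query per anatomical structure
    tokens:    (N, d) flattened 3D feature-map tokens
    roi_masks: (Q, N) boolean; True where a token lies in the query's atlas region
    """
    d = queries.shape[-1]
    scores = queries @ tokens.T / np.sqrt(d)    # (Q, N) scaled dot-product scores
    scores = np.where(roi_masks, scores, -1e9)  # hide tokens outside the ROI
    attn = softmax(scores, axis=-1)             # each row sums to 1 over its ROI
    return attn @ tokens, attn

# Toy example: 3 structures, 15 tokens, disjoint 5-token ROIs.
rng = np.random.default_rng(0)
Q, N, d = 3, 15, 8
queries = rng.normal(size=(Q, d))
tokens = rng.normal(size=(N, d))
roi_masks = np.zeros((Q, N), dtype=bool)
for i in range(Q):
    roi_masks[i, i * 5:(i + 1) * 5] = True

out, attn = focused_cross_attention(queries, tokens, roi_masks)
print(out.shape)                 # (3, 8)
print(attn[~roi_masks].max())    # effectively 0: no attention leaks outside the ROI
```

Because the attention weights are zero outside each region of interest, inspecting `attn` directly shows which voxels influenced each prediction, which is the explainability property the abstract highlights.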
Related papers
- Rethinking Attention Gated with Hybrid Dual Pyramid Transformer-CNN for Generalized Segmentation in Medical Imaging [17.07490339960335]
We introduce a novel hybrid CNN-Transformer segmentation architecture (PAG-TransYnet) designed for efficiently building a strong CNN-Transformer encoder.
Our approach exploits attention gates within a Dual Pyramid hybrid encoder.
arXiv Detail & Related papers (2024-04-28T14:37:10Z)
- ParaTransCNN: Parallelized TransCNN Encoder for Medical Image Segmentation [7.955518153976858]
We propose an advanced 2D feature extraction method by combining the convolutional neural network and Transformer architectures.
Our method is shown with better segmentation accuracy, especially on small organs.
arXiv Detail & Related papers (2024-01-27T05:58:36Z)
- 3D TransUNet: Advancing Medical Image Segmentation through Vision Transformers [40.21263511313524]
Medical image segmentation plays a crucial role in advancing healthcare systems for disease diagnosis and treatment planning.
The u-shaped architecture, popularly known as U-Net, has proven highly successful for various medical image segmentation tasks.
To address U-Net's limitations in modeling long-range dependencies, researchers have turned to Transformers, renowned for their global self-attention mechanisms.
arXiv Detail & Related papers (2023-10-11T18:07:19Z)
- Dilated-UNet: A Fast and Accurate Medical Image Segmentation Approach using a Dilated Transformer and U-Net Architecture [0.6445605125467572]
This paper introduces Dilated-UNet, which combines a Dilated Transformer block with the U-Net architecture for accurate and fast medical image segmentation.
The results of our experiments show that Dilated-UNet outperforms other models on several challenging medical image segmentation datasets.
arXiv Detail & Related papers (2023-04-22T17:20:13Z)
- View-Disentangled Transformer for Brain Lesion Detection [50.4918615815066]
We propose a novel view-disentangled transformer to enhance the extraction of MRI features for more accurate tumour detection.
First, the proposed transformer harvests long-range correlation among different positions in a 3D brain scan.
Second, the transformer models a stack of slice features as multiple 2D views and enhances these features view-by-view.
Third, we deploy the proposed transformer module in a transformer backbone, which can effectively detect the 2D regions surrounding brain lesions.
arXiv Detail & Related papers (2022-09-20T11:58:23Z)
- Integral Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection [78.2325219839805]
Experiments on the MS COCO dataset demonstrate that imTED consistently outperforms its counterparts by 2.8%.
imTED improves the state-of-the-art of few-shot object detection by up to 7.6% AP.
arXiv Detail & Related papers (2022-05-19T15:11:20Z)
- Atrous Residual Interconnected Encoder to Attention Decoder Framework for Vertebrae Segmentation via 3D Volumetric CT Images [1.8146155083014204]
This paper proposes a novel algorithm for automated vertebrae segmentation via 3D volumetric spine CT images.
The proposed model is based on the structure of encoder to decoder, using layer normalization to optimize mini-batch training performance.
The experimental results show that our model achieves competitive performance compared with other state-of-the-art medical semantic segmentation methods.
arXiv Detail & Related papers (2021-04-08T12:09:16Z)
- Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper which applies transformers into pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
- TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation [78.01570371790669]
Medical image segmentation is an essential prerequisite for developing healthcare systems.
On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard.
We propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation.
arXiv Detail & Related papers (2021-02-08T16:10:50Z)
- Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection in Autonomous Driving [121.44554957537613]
We propose a new transformer, called Temporal-Channel Transformer, to model the spatial-temporal domain and channel domain relationships for video object detection from Lidar data.
Specifically, the temporal-channel encoder of the transformer is designed to encode the information of different channels and frames.
We achieve the state-of-the-art performance in grid voxel-based 3D object detection on the nuScenes benchmark.
arXiv Detail & Related papers (2020-11-27T09:35:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.