Memory transformers for full context and high-resolution 3D Medical Segmentation
- URL: http://arxiv.org/abs/2210.05313v1
- Date: Tue, 11 Oct 2022 10:11:05 GMT
- Title: Memory transformers for full context and high-resolution 3D Medical Segmentation
- Authors: Loic Themyr, Clément Rambour, Nicolas Thome, Toby Collins, Alexandre Hostettler
- Abstract summary: This paper introduces the Full resolutIoN mEmory (FINE) transformer to overcome the challenge of long-range attention over high-resolution 3D images.
The core idea behind FINE is to learn memory tokens that indirectly model full-range interactions.
Experiments on the BCV image segmentation dataset show better performance than state-of-the-art CNN and transformer baselines.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer models achieve state-of-the-art results for image segmentation.
However, achieving the long-range attention necessary to capture global context
with high-resolution 3D images is a fundamental challenge. This paper
introduces the Full resolutIoN mEmory (FINE) transformer to overcome this
issue. The core idea behind FINE is to learn memory tokens that indirectly model
full-range interactions while scaling well in both memory and computational
cost. FINE introduces memory tokens at two levels: the first allows full
interaction between voxels within local image regions (patches); the second
allows full interactions between all regions of the 3D volume. Combined, they
allow full attention over high-resolution images, e.g. 512 x 512 x 256 voxels
and above. Experiments on the BCV image segmentation dataset show better
performance than state-of-the-art CNN and transformer baselines, highlighting
the superiority of our full attention mechanism compared to recent transformer
baselines such as CoTr and nnFormer.
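
To make the two-level idea concrete, the sketch below shows one way such attention could look. It is a minimal reconstruction from the abstract alone, not the paper's implementation: the class name, token counts, dimensions, and the choice of a single joint attention over concatenated token groups are all assumptions. Voxel tokens inside a local window attend jointly to that window's memory tokens and to a set of volume-level memory tokens shared by all windows, which is how memory tokens can mediate full-range interaction without full-volume attention.

```python
import torch
import torch.nn as nn

class TwoLevelMemoryAttention(nn.Module):
    """Hypothetical sketch of FINE-style two-level memory tokens.

    Each local 3D window attends over its own voxel tokens plus:
      * window-level memory tokens (full interaction inside the window), and
      * volume-level memory tokens shared by all windows, which indirectly
        carry information across the whole volume.
    """

    def __init__(self, dim=96, n_heads=4, n_win_mem=4, n_vol_mem=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # Learned volume-level memory, broadcast to every window (assumption).
        self.vol_mem = nn.Parameter(torch.zeros(1, n_vol_mem, dim))
        self.n_vol_mem, self.n_win_mem = n_vol_mem, n_win_mem

    def forward(self, windows, win_mem):
        # windows: (B*W, N, D) voxel tokens of each local window
        # win_mem: (B*W, M, D) learned memory tokens, one set per window
        bw = windows.shape[0]
        vol = self.vol_mem.expand(bw, -1, -1)
        x = torch.cat([vol, win_mem, windows], dim=1)
        out, _ = self.attn(x, x, x)  # full attention within window + memories
        v, m = self.n_vol_mem, self.n_win_mem
        # Return updated voxel, window-memory, and volume-memory tokens;
        # in practice the volume memory would be aggregated across windows.
        return out[:, v + m:], out[:, v:v + m], out[:, :v]
```

For example, 16 windows of 7 x 7 x 7 voxel tokens with 96-dimensional embeddings would be passed as `windows` of shape (16, 343, 96) and `win_mem` of shape (16, 4, 96); each attention is then quadratic only in the window size plus a handful of memory tokens, never in the full 512 x 512 x 256 volume.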
Related papers
- SegFormer3D: an Efficient Transformer for 3D Medical Image Segmentation (arXiv, 2024-04-15)
  We present SegFormer3D, a hierarchical Transformer that calculates attention across multiscale volumetric features.
  SegFormer3D avoids complex decoders and uses an all-MLP decoder to aggregate local and global attention features.
  We benchmark SegFormer3D against the current SOTA models on three widely used datasets.
- Accurate Image Restoration with Attention Retractable Transformer (arXiv, 2022-10-04)
  We propose the Attention Retractable Transformer (ART) for image restoration.
  ART combines dense and sparse attention modules in the network.
  We conduct extensive experiments on image super-resolution, denoising, and JPEG compression artifact reduction tasks.
- MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet (arXiv, 2022-06-02)
  We propose to self-distill a Transformer-based UNet for medical image segmentation.
  It simultaneously learns global semantic information and local spatially detailed features.
  MISSU outperforms previous state-of-the-art methods.
- Memory-efficient Segmentation of High-resolution Volumetric MicroCT Images (arXiv, 2022-05-31)
  We propose a memory-efficient network architecture for 3D high-resolution image segmentation.
  The network incorporates both global and local features via a two-stage, U-net-based cascaded framework.
  Experiments show that it outperforms state-of-the-art 3D segmentation methods in both segmentation accuracy and memory efficiency.
- AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation (arXiv, 2021-10-20)
  Transformer-based models have drawn growing attention in medical image segmentation.
  We propose the Axial Fusion Transformer UNet (AFTer-UNet), which combines convolutional layers' ability to extract detailed features with transformers' strength in long-sequence modeling.
  It has fewer parameters and takes less GPU memory to train than previous transformer-based models.
- XCiT: Cross-Covariance Image Transformers (arXiv, 2021-06-17)
  We propose a "transposed" version of self-attention that operates across feature channels rather than tokens.
  The resulting cross-covariance attention (XCA) has linear complexity in the number of tokens and allows efficient processing of high-resolution images (see the sketch after this list).
- CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation (arXiv, 2021-03-04)
  Convolutional neural networks (CNNs) have long been the de facto standard for 3D medical image segmentation.
  We propose a novel framework, CoTr, that efficiently bridges a convolutional neural network and a Transformer for accurate 3D medical image segmentation.
- Visual Transformers: Token-based Image Representation and Processing for Computer Vision (arXiv, 2020-06-05)
  The Visual Transformer (VT) operates in a semantic token space, judiciously attending to different image parts based on context.
  Using an advanced training recipe, our VTs significantly outperform their convolutional counterparts.
  For semantic segmentation on LIP and COCO-stuff, VT-based feature pyramid networks (FPN) achieve 0.35 points higher mIoU while reducing the FPN module's FLOPs by 6.5x.
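
Among the related works, XCA is the one whose mechanism the summary states precisely enough to sketch. The snippet below is a hedged reconstruction of cross-covariance attention as described above: the attention map is computed between feature channels (a small per-head channel-by-channel map) rather than between tokens, so cost grows linearly with the number of tokens. The L2 normalisation and learnable temperature follow the XCiT paper's description; exact dimensions and the surrounding block structure (local patch interaction, layer norms) are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class XCA(nn.Module):
    """Sketch of cross-covariance attention (XCA): attention acts across
    feature channels, so complexity is linear in the number of tokens N."""

    def __init__(self, dim=64, n_heads=4):
        super().__init__()
        self.n_heads = n_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.temp = nn.Parameter(torch.ones(n_heads, 1, 1))  # learnable temperature
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: (B, N, D)
        B, N, D = x.shape
        h = D // self.n_heads
        q, k, v = self.qkv(x).reshape(B, N, 3, self.n_heads, h).permute(2, 0, 3, 4, 1)
        # q, k, v: (B, heads, h, N); L2-normalise along the token axis so the
        # channel-by-channel attention map holds cosine similarities.
        q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.temp  # (B, heads, h, h) channel map
        attn = attn.softmax(dim=-1)
        out = (attn @ v).reshape(B, D, N).transpose(1, 2)  # back to (B, N, D)
        return self.proj(out)
```

Because the channel-by-channel attention map never materialises an N x N matrix, this layer can be applied to the very long token sequences produced by high-resolution inputs.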
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.