ACC-UNet: A Completely Convolutional UNet model for the 2020s
- URL: http://arxiv.org/abs/2308.13680v1
- Date: Fri, 25 Aug 2023 21:39:43 GMT
- Title: ACC-UNet: A Completely Convolutional UNet model for the 2020s
- Authors: Nabil Ibtehaz, Daisuke Kihara
- Abstract summary: ACC-UNet is a completely convolutional UNet model that brings the best of both worlds: the inherent inductive biases of convnets and the design decisions of transformers.
ACC-UNet was evaluated on 5 different medical image segmentation benchmarks and consistently outperformed convnets, transformers, and their hybrids.
- Score: 2.7013801448234367
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This decade is marked by the introduction of the Vision Transformer,
a radical paradigm shift in broad computer vision. A similar trend has followed
in medical imaging, where UNet, one of the most influential architectures, has
been redesigned with transformers. Recently, the efficacy of convolutional
models in vision is being reinvestigated by seminal works such as ConvNeXt,
which elevates a ResNet to Swin Transformer level. Deriving inspiration from
this, we aim to improve a purely convolutional UNet model so that it can be on
par with transformer-based models, e.g., Swin-Unet or UCTransNet. We examined
several advantages of the transformer-based UNet models, primarily long-range
dependencies and cross-level skip connections. We attempted to emulate them
through convolution operations and thus propose ACC-UNet, a completely
convolutional UNet model that brings the best of both worlds: the inherent
inductive biases of convnets and the design decisions of transformers.
ACC-UNet was evaluated on 5 different medical image segmentation benchmarks and
consistently outperformed convnets, transformers, and their hybrids. Notably,
ACC-UNet outperforms state-of-the-art models Swin-Unet and UCTransNet by $2.64
\pm 2.54\%$ and $0.45 \pm 1.61\%$ in terms of dice score, respectively, while
using a fraction of their parameters ($59.26\%$ and $24.24\%$). Our codes are
available at https://github.com/kiharalab/ACC-UNet.
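The abstract mentions two transformer-style properties that ACC-UNet emulates with convolutions: long-range dependencies and cross-level skip connections. The sketch below is only an illustration of those two generic ideas, not the authors' actual blocks (see the linked repository for the real implementation); the module names (LongRangeConvBlock, CrossLevelSkipFusion) and all channel counts are hypothetical.

```python
# Illustrative sketch only, under the assumptions stated above -- not the
# authors' ACC-UNet blocks. It shows (1) a large-kernel depthwise conv block
# that widens the receptive field to approximate long-range dependencies, and
# (2) a conv-based fusion of skip features from several encoder levels, so a
# skip connection carries multi-scale context to the decoder.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LongRangeConvBlock(nn.Module):
    """Depthwise large-kernel conv + pointwise mixing, ConvNeXt-style."""

    def __init__(self, channels: int, kernel_size: int = 7):
        super().__init__()
        self.spatial = nn.Conv2d(channels, channels, kernel_size,
                                 padding=kernel_size // 2, groups=channels)
        self.norm = nn.BatchNorm2d(channels)
        self.mix = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        # Residual connection around the depthwise/pointwise pair.
        return x + self.mix(F.gelu(self.norm(self.spatial(x))))


class CrossLevelSkipFusion(nn.Module):
    """Resize skip maps from all encoder levels to one resolution and fuse
    them with a 1x1 conv, a purely convolutional cross-level skip."""

    def __init__(self, in_channels: list[int], out_channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(sum(in_channels), out_channels, kernel_size=1)

    def forward(self, skips, target_hw):
        resized = [F.interpolate(s, size=target_hw, mode="bilinear",
                                 align_corners=False) for s in skips]
        return self.fuse(torch.cat(resized, dim=1))


if __name__ == "__main__":
    # Tiny smoke test with made-up channel counts.
    x = torch.randn(1, 32, 64, 64)
    print(LongRangeConvBlock(32)(x).shape)  # torch.Size([1, 32, 64, 64])
    skips = [torch.randn(1, c, 64 // 2 ** i, 64 // 2 ** i)
             for i, c in enumerate([32, 64, 128])]
    print(CrossLevelSkipFusion([32, 64, 128], 32)(skips, (64, 64)).shape)
```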
Related papers
- ACC-ViT : Atrous Convolution's Comeback in Vision Transformers [5.224344210588584]
We introduce Atrous Attention, a fusion of regional and sparse attention, which can adaptively consolidate both local and global information.
We also propose a general vision transformer backbone, named ACC-ViT, following conventional practices for standard vision tasks.
ACC-ViT is therefore a strong vision backbone, which is also competitive in mobile-scale versions, ideal for niche applications with small datasets.
arXiv Detail & Related papers (2024-03-07T04:05:16Z)
- TransVG++: End-to-End Visual Grounding with Language Conditioned Vision Transformer [188.00681648113223]
We explore neat yet effective Transformer-based frameworks for visual grounding.
TransVG establishes multi-modal correspondences by Transformers and localizes referred regions by directly regressing box coordinates.
We upgrade our framework to a purely Transformer-based one by leveraging Vision Transformer (ViT) for vision feature encoding.
arXiv Detail & Related papers (2022-06-14T06:27:38Z)
- Adaptive Split-Fusion Transformer [90.04885335911729]
We propose an Adaptive Split-Fusion Transformer (ASF-former) to treat convolutional and attention branches differently with adaptive weights.
Experiments on standard benchmarks, such as ImageNet-1K, show that our ASF-former outperforms its CNN, transformer counterparts, and hybrid pilots in terms of accuracy.
arXiv Detail & Related papers (2022-04-26T10:00:28Z)
- Joint rotational invariance and adversarial training of a dual-stream Transformer yields state of the art Brain-Score for Area V4 [3.3504365823045044]
We show how a dual-stream Transformer, a CrossViT à la Chen et al. (2021), under a joint rotationally-invariant and adversarial optimization procedure yields 2nd place in the aggregate Brain-Score 2022 competition averaged across all visual categories.
Our current Transformer-based model also achieves greater explainable variance for areas V4, IT and Behaviour than a biologically-inspired CNN (ResNet50) that integrates a frontal V1-like module.
arXiv Detail & Related papers (2022-03-08T23:08:35Z)
- A ConvNet for the 2020s [94.89735578018099]
Vision Transformers (ViTs) quickly superseded ConvNets as the state-of-the-art image classification model.
It is the hierarchical Transformers that reintroduced several ConvNet priors, making Transformers practically viable as a generic vision backbone.
In this work, we reexamine the design spaces and test the limits of what a pure ConvNet can achieve.
arXiv Detail & Related papers (2022-01-10T18:59:10Z)
- nnFormer: Interleaved Transformer for Volumetric Segmentation [50.10441845967601]
We introduce nnFormer, a powerful segmentation model with an interleaved architecture based on empirical combination of self-attention and convolution.
nnFormer achieves tremendous improvements over previous transformer-based methods on two commonly used datasets, Synapse and ACDC.
arXiv Detail & Related papers (2021-09-07T17:08:24Z)
- CMT: Convolutional Neural Networks Meet Vision Transformers [68.10025999594883]
Vision transformers have been successfully applied to image recognition tasks due to their ability to capture long-range dependencies within an image.
There are still gaps in both performance and computational cost between transformers and existing convolutional neural networks (CNNs).
We propose a new transformer based hybrid network by taking advantage of transformers to capture long-range dependencies, and of CNNs to model local features.
In particular, our CMT-S achieves 83.5% top-1 accuracy on ImageNet, while being 14x and 2x smaller on FLOPs than the existing DeiT and EfficientNet, respectively.
arXiv Detail & Related papers (2021-07-13T17:47:19Z)
- Vision Transformers for Dense Prediction [77.34726150561087]
We introduce dense vision transformers, an architecture that leverages vision transformers in place of convolutional networks as a backbone for dense prediction tasks.
Our experiments show that this architecture yields substantial improvements on dense prediction tasks.
arXiv Detail & Related papers (2021-03-24T18:01:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.