More complex encoder is not all you need
- URL: http://arxiv.org/abs/2309.11139v3
- Date: Fri, 27 Oct 2023 13:45:02 GMT
- Title: More complex encoder is not all you need
- Authors: Weibin Yang, Longwei Xu, Pengwei Wang, Dehua Geng, Yusong Li, Mingyuan Xu, Zhiqi Dong
- Abstract summary: We introduce neU-Net (i.e., not complex encoder U-Net), which incorporates a novel Sub-pixel Convolution for upsampling to construct a powerful decoder.
Our model design achieves excellent results, surpassing other state-of-the-art methods on both the Synapse and ACDC datasets.
- Score: 0.882348769487259
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: U-Net and its variants have been widely used in medical image segmentation.
However, most current U-Net variants confine their improvement strategies to
building a more complex encoder, while leaving the decoder unchanged or adopting
a simple symmetric structure. These approaches overlook the true functionality
of the decoder: receiving low-resolution feature maps from the encoder and
restoring feature map resolution and lost information through upsampling. As a
result, the decoder, especially its upsampling component, plays a crucial role
in enhancing segmentation outcomes. However, in 3D medical image segmentation,
the commonly used transposed convolution can result in visual artifacts. This
issue stems from the absence of a direct relationship between adjacent pixels in
the output feature map. Furthermore, a plain encoder already possesses
sufficient feature extraction capability, because the downsampling operations
gradually expand the receptive field, but the loss of information during
downsampling cannot be ignored. To address the gap in relevant
research, we extend our focus beyond the encoder and introduce neU-Net (i.e.,
not complex encoder U-Net), which incorporates a novel Sub-pixel Convolution
for upsampling to construct a powerful decoder. Additionally, we introduce
a multi-scale wavelet inputs module on the encoder side to provide additional
information. Our model design achieves excellent results, surpassing other
state-of-the-art methods on both the Synapse and ACDC datasets.
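The abstract contrasts transposed convolution with sub-pixel convolution for upsampling. As a rough illustration of the general sub-pixel idea only (not the paper's specific variant, whose details are not given here), the sketch below expands channels with a pointwise convolution and then rearranges them into a higher-resolution 3D volume. The function names `pixel_shuffle_3d` and `subpixel_upsample_3d`, the channel ordering, and the use of a 1x1x1 convolution are assumptions made for this example.

```python
import numpy as np

def pixel_shuffle_3d(x, r):
    """Rearrange a (C * r**3, D, H, W) array into (C, D*r, H*r, W*r).

    Each group of r**3 channels at one low-resolution voxel fills one
    r x r x r block of the output.
    """
    c_in, d, h, w = x.shape
    if c_in % (r ** 3) != 0:
        raise ValueError("channel count must be divisible by r**3")
    c = c_in // (r ** 3)
    # Split the channel axis into (c, rd, rh, rw).
    x = x.reshape(c, r, r, r, d, h, w)
    # Interleave each upscale factor with its spatial axis:
    # (c, d, rd, h, rh, w, rw).
    x = x.transpose(0, 4, 1, 5, 2, 6, 3)
    return x.reshape(c, d * r, h * r, w * r)

def subpixel_upsample_3d(x, weight, r):
    """Pointwise (1x1x1) convolution followed by a 3D pixel shuffle.

    x:      (C_in, D, H, W) feature map
    weight: (C_out * r**3, C_in) hypothetical pointwise kernel
    """
    y = np.einsum("oc,cdhw->odhw", weight, x)
    return pixel_shuffle_3d(y, r)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3, 3, 3))        # 4 channels, 3x3x3 volume
weight = rng.standard_normal((2 * 2**3, 4))  # 2 output channels, r = 2
out = subpixel_upsample_3d(x, weight, r=2)
print(out.shape)  # (2, 6, 6, 6)
```

In this sketch, all r**3 voxels of an output block are computed from the same low-resolution location, which is one way to read the abstract's point about relating adjacent output pixels; a practical decoder would use larger learned kernels and a deep learning framework.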
Related papers
- Optimizing Medical Image Segmentation with Advanced Decoder Design [0.8402155549849591]
U-Net is widely used in medical image segmentation due to its simple and flexible architecture design.
We propose Swin DER (i.e., Swin UNETR Decoder Enhanced and Refined) by specifically optimizing the design of these three components.
Our model design achieves excellent results, surpassing other state-of-the-art methods on both the Synapse dataset and the MSD brain tumor segmentation task.
arXiv Detail & Related papers (2024-10-05T11:47:13Z)
- $ε$-VAE: Denoising as Visual Decoding [61.29255979767292]
In generative modeling, tokenization simplifies complex data into compact, structured representations, creating a more efficient, learnable space.
Current visual tokenization methods rely on a traditional autoencoder framework, where the encoder compresses data into latent representations, and the decoder reconstructs the original input.
We propose denoising as decoding, shifting from single-step reconstruction to iterative refinement. Specifically, we replace the decoder with a diffusion process that iteratively refines noise to recover the original image, guided by the latents provided by the encoder.
We evaluate our approach by assessing both reconstruction (rFID) and generation quality (
arXiv Detail & Related papers (2024-10-05T08:27:53Z)
- MUSTER: A Multi-scale Transformer-based Decoder for Semantic Segmentation [19.83103856355554]
MUSTER is a transformer-based decoder that seamlessly integrates with hierarchical encoders.
MSKA units enable the fusion of multi-scale features from the encoder and decoder, facilitating comprehensive information integration.
On the challenging ADE20K dataset, our best model achieves a single-scale mIoU of 50.23 and a multi-scale mIoU of 51.88.
arXiv Detail & Related papers (2022-11-25T06:51:07Z)
- SoftPool++: An Encoder-Decoder Network for Point Cloud Completion [93.54286830844134]
We propose a novel convolutional operator for the task of point cloud completion.
The proposed operator does not require any max-pooling or voxelization operation.
We show that our approach achieves state-of-the-art performance in shape completion at low and high resolutions.
arXiv Detail & Related papers (2022-05-08T15:31:36Z)
- FusionCount: Efficient Crowd Counting via Multiscale Feature Fusion [36.15554768378944]
This paper proposes a novel crowd counting architecture (FusionCount).
It exploits the adaptive fusion of a large majority of encoded features instead of relying on additional extraction components to obtain multiscale features.
Experiments on two benchmark databases demonstrate that our model achieves state-of-the-art results with reduced computational complexity.
arXiv Detail & Related papers (2022-02-28T10:04:07Z)
- Small Lesion Segmentation in Brain MRIs with Subpixel Embedding [105.1223735549524]
We present a method to segment MRI scans of the human brain into ischemic stroke lesion and normal tissues.
We propose a neural network architecture in the form of a standard encoder-decoder where predictions are guided by a spatial expansion embedding network.
arXiv Detail & Related papers (2021-09-18T00:21:17Z)
- Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation [98.05643473345474]
We propose a novel decoder, termed dynamic neural representational decoder (NRD).
As each location on the encoder's output corresponds to a local patch of the semantic labels, in this work, we represent these local patches of labels with compact neural networks.
This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.
arXiv Detail & Related papers (2021-07-30T04:50:56Z)
- Transformer Meets DCFAM: A Novel Semantic Segmentation Scheme for Fine-Resolution Remote Sensing Images [6.171417925832851]
We introduce the Swin Transformer as the backbone to fully extract the context information.
We also design a novel decoder named densely connected feature aggregation module (DCFAM) to restore the resolution and generate the segmentation map.
arXiv Detail & Related papers (2021-04-25T11:34:22Z)
- Beyond Single Stage Encoder-Decoder Networks: Deep Decoders for Semantic Image Segmentation [56.44853893149365]
Single encoder-decoder methodologies for semantic segmentation are reaching their peak in terms of segmentation quality and efficiency per number of layers.
We propose a new architecture based on a decoder which uses a set of shallow networks for capturing more information content.
In order to further improve the architecture we introduce a weight function which aims to re-balance classes to increase the attention of the networks to under-represented objects.
arXiv Detail & Related papers (2020-07-19T18:44:34Z)
- Modeling Lost Information in Lossy Image Compression [72.69327382643549]
Lossy image compression is one of the most commonly used operators for digital images.
We propose a novel invertible framework called Invertible Lossy Compression (ILC) to largely mitigate the information loss problem.
arXiv Detail & Related papers (2020-06-22T04:04:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.