DTC: A Deformable Transposed Convolution Module for Medical Image Segmentation
- URL: http://arxiv.org/abs/2601.17939v1
- Date: Sun, 25 Jan 2026 18:33:51 GMT
- Title: DTC: A Deformable Transposed Convolution Module for Medical Image Segmentation
- Authors: Chengkun Sun, Jinqian Pan, Renjie Liang, Zhengkang Fan, Xin Miao, Jiang Bian, Jie Xu
- Abstract summary: We propose a novel upsampling method, Deformable Transposed Convolution (DTC), which learns dynamic coordinates to generate high-resolution feature maps. DTC can be effectively integrated into existing medical image segmentation models, consistently improving the decoder's feature reconstruction and detail recovery capability.
- Score: 16.78785887135973
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In medical image segmentation, particularly in UNet-like architectures, upsampling is primarily used to transform smaller feature maps into larger ones, enabling feature fusion between encoder and decoder features and supporting multi-scale prediction. Conventional upsampling methods, such as transposed convolution and linear interpolation, operate on fixed positions: transposed convolution applies kernel elements to predetermined pixel or voxel locations, while linear interpolation assigns values based on fixed coordinates in the original feature map. These fixed-position approaches may fail to capture structural information beyond predefined sampling positions and can lead to artifacts or loss of detail. Inspired by deformable convolutions, we propose a novel upsampling method, Deformable Transposed Convolution (DTC), which learns dynamic coordinates (i.e., sampling positions) to generate high-resolution feature maps for both 2D and 3D medical image segmentation tasks. Experiments on 3D (e.g., BTCV15) and 2D datasets (e.g., ISIC18, BUSI) demonstrate that DTC can be effectively integrated into existing medical image segmentation models, consistently improving the decoder's feature reconstruction and detail recovery capability.
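To make the abstract's core idea concrete, here is a minimal NumPy sketch of offset-based upsampling: each high-resolution output position samples the low-resolution map at a base grid coordinate plus a per-pixel offset. In DTC these offsets would be predicted by a learned layer; here they are simply passed in as an array. This is an illustrative assumption about the mechanism, not the authors' implementation.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Sample a 2D map feat (H, W) at fractional coords (y, x), clamping at borders."""
    H, W = feat.shape
    y0 = np.clip(int(np.floor(y)), 0, H - 1)
    x0 = np.clip(int(np.floor(x)), 0, W - 1)
    y1 = min(y0 + 1, H - 1)
    x1 = min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0]
            + wy * wx * feat[y1, x1])

def deformable_upsample(feat, offsets, scale=2):
    """Upsample feat (H, W) to (H*scale, W*scale).

    Each output pixel reads the input at base coordinate (oy/scale, ox/scale)
    plus a per-pixel offset (dy, dx). With all-zero offsets this reduces to
    plain bilinear interpolation on a fixed grid; learned nonzero offsets let
    the sampler follow structure beyond the predefined positions.
    """
    H, W = feat.shape
    out = np.zeros((H * scale, W * scale))
    for oy in range(H * scale):
        for ox in range(W * scale):
            dy, dx = offsets[oy, ox]
            out[oy, ox] = bilinear_sample(feat, oy / scale + dy, ox / scale + dx)
    return out
```

A learnable variant would replace the `offsets` argument with the output of a small convolution applied to the decoder features, analogous to the offset branch in deformable convolution.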
Related papers
- Joint Semantic and Rendering Enhancements in 3D Gaussian Modeling with Anisotropic Local Encoding [86.55824709875598]
We propose a joint enhancement framework for 3D semantic Gaussian modeling that synergizes both semantic and rendering branches. Unlike conventional point cloud shape encoding, we introduce an anisotropic 3D Gaussian Chebyshev descriptor to capture fine-grained 3D shape details. We employ a cross-scene knowledge transfer module to continuously update learned shape patterns, enabling faster convergence and robust representations.
arXiv Detail & Related papers (2026-01-05T18:33:50Z) - Resolution-Independent Neural Operators for Multi-Rate Sparse-View CT [67.14700058302016]
Deep learning methods achieve high-fidelity reconstructions but often overfit to a fixed acquisition setup. We propose Computed Tomography neural Operator (CTO), a unified CT reconstruction framework that extends to continuous function space. CTO enables consistent multi-sampling-rate and cross-resolution performance, with on average >4dB PSNR gain over CNNs.
arXiv Detail & Related papers (2025-12-13T08:31:46Z) - MOSformer: Momentum encoder-based inter-slice fusion transformer for medical image segmentation [10.17757702972375]
A novel momentum encoder-based inter-slice fusion transformer (MOSformer) is proposed to overcome this issue. MOSformer is evaluated on three benchmark datasets (Synapse, ACDC, and AMOS), achieving a new state-of-the-art with 85.63%, 92.19%, and 85.43% DSC, respectively.
arXiv Detail & Related papers (2024-01-22T11:25:59Z) - Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
arXiv Detail & Related papers (2023-12-26T12:56:31Z) - MIST: Medical Image Segmentation Transformer with Convolutional Attention Mixing (CAM) Decoder [0.0]
We propose a Medical Image Transformer (MIST) incorporating a novel Convolutional Attention Mixing (CAM) decoder.
MIST has two parts: a pre-trained multi-axis vision transformer (MaxViT) is used as an encoder, and the encoded feature representation is passed through the CAM decoder for segmenting the images.
To enhance spatial information gain, deep and shallow convolutions are used for feature extraction and receptive field expansion.
arXiv Detail & Related papers (2023-10-30T18:07:57Z) - Deformably-Scaled Transposed Convolution [17.4596321623511]
We revisit transposed convolution and introduce a novel layer that allows us to place information in the image selectively.
Our novel layer can be used as a drop-in replacement for 2D and 3D upsampling operators and the code will be publicly available.
arXiv Detail & Related papers (2022-10-17T21:35:29Z) - MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z) - Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z) - AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation [19.53151547706724]
Transformer-based models have drawn growing attention in medical image segmentation.
We propose Axial Fusion Transformer UNet (AFTer-UNet), which takes both advantages of convolutional layers' capability of extracting detailed features and transformers' strength on long sequence modeling.
It has fewer parameters and takes less GPU memory to train than the previous transformer-based models.
arXiv Detail & Related papers (2021-10-20T06:47:28Z) - Atrous Residual Interconnected Encoder to Attention Decoder Framework for Vertebrae Segmentation via 3D Volumetric CT Images [1.8146155083014204]
This paper proposes a novel algorithm for automated vertebrae segmentation via 3D volumetric spine CT images.
The proposed model is based on an encoder-decoder structure, using layer normalization to optimize mini-batch training performance.
The experimental results show that our model achieves competitive performance compared with other state-of-the-art medical semantic segmentation methods.
arXiv Detail & Related papers (2021-04-08T12:09:16Z) - CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation [95.51455777713092]
Convolutional neural networks (CNNs) have been the de facto standard for 3D medical image segmentation.
We propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation.
arXiv Detail & Related papers (2021-03-04T13:34:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.