LIT-Former: Linking In-plane and Through-plane Transformers for
Simultaneous CT Image Denoising and Deblurring
- URL: http://arxiv.org/abs/2302.10630v2
- Date: Sun, 7 Jan 2024 15:18:37 GMT
- Title: LIT-Former: Linking In-plane and Through-plane Transformers for
Simultaneous CT Image Denoising and Deblurring
- Authors: Zhihao Chen, Chuang Niu, Qi Gao, Ge Wang, Hongming Shan
- Abstract summary: This paper studies 3D low-dose computed tomography (CT) imaging.
Although various deep learning methods have been developed in this context, they typically focus on 2D images and perform low-dose denoising and super-resolution deblurring separately.
To date, little work has been done on simultaneous in-plane denoising and through-plane deblurring, which is important for obtaining high-quality 3D CT images with lower radiation dose and faster imaging speed.
Here, we propose to link in-plane and through-plane transformers for simultaneous in-plane denoising and through-plane deblurring.
- Score: 22.605286969419485
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies 3D low-dose computed tomography (CT) imaging.
Although various deep learning methods have been developed in this context,
they typically focus on 2D images and perform low-dose denoising and
super-resolution deblurring separately. To date, little work has been done on
simultaneous in-plane denoising and through-plane deblurring, which is
important for obtaining high-quality 3D CT images with lower radiation dose
and faster imaging speed. A straightforward method for this task is to
directly train an end-to-end 3D network; however, this demands much more
training data and incurs expensive computational costs. Here, we propose to
link in-plane and through-plane transformers for simultaneous in-plane
denoising and through-plane deblurring, termed LIT-Former, which can
efficiently synergize the in-plane and through-plane sub-tasks for 3D CT
imaging and enjoys the advantages of both convolutional and transformer
networks. LIT-Former has two novel designs: an efficient multi-head
self-attention module (eMSM) and an efficient convolutional feedforward
network (eCFN). First, eMSM integrates in-plane 2D self-attention and
through-plane 1D self-attention to efficiently capture the global
interactions of 3D self-attention, the core unit of transformer networks.
Second, eCFN integrates 2D convolution and 1D convolution to extract the
local information of 3D convolution in the same fashion. As a result, the
proposed LIT-Former synergizes these two sub-tasks, significantly reducing
computational complexity compared with 3D counterparts and enabling rapid
convergence. Extensive experimental results on simulated and clinical
datasets demonstrate superior performance over state-of-the-art models. The
source code is made available at https://github.com/hao1635/LIT-Former.
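To make the factorization concrete, below is a minimal PyTorch sketch of the two designs. It is not the authors' implementation (that lives in the repository linked above); the module names EMSMSketch and ECFNSketch, the head count, the kernel sizes, and the residual summation used to fuse the branches are illustrative assumptions. It shows only the core idea: 3D self-attention is approximated by 2D attention within each slice plus 1D attention across slices, and 3D convolution by a slice-wise 2D convolution followed by a 1D convolution along the slice axis.

```python
# Minimal sketch of the eMSM/eCFN factorization idea, NOT the authors'
# implementation (see https://github.com/hao1635/LIT-Former for that).
# Module names, head count, kernel sizes, and the residual fusion used
# here are illustrative assumptions.
import torch
import torch.nn as nn


class EMSMSketch(nn.Module):
    """Factorized 3D self-attention: 2D attention within each slice
    (in-plane) plus 1D attention across slices (through-plane)."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.inplane = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.throughplane = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, D, H, W) volume
        b, c, d, h, w = x.shape
        # In-plane: treat each of the B*D slices as a sequence of H*W tokens.
        t = x.permute(0, 2, 3, 4, 1).reshape(b * d, h * w, c)
        t, _ = self.inplane(t, t, t)
        t = t.reshape(b, d, h, w, c)
        # Through-plane: each of the B*H*W columns is a sequence of D tokens.
        s = t.permute(0, 2, 3, 1, 4).reshape(b * h * w, d, c)
        s, _ = self.throughplane(s, s, s)
        s = s.reshape(b, h, w, d, c).permute(0, 4, 3, 1, 2)
        return x + s  # residual fusion (an assumption, for illustration)


class ECFNSketch(nn.Module):
    """Factorized 3D convolution: a slice-wise 2D conv over (H, W) plus a
    1D conv along D, approximating a full 3D kernel at lower cost."""

    def __init__(self, channels: int):
        super().__init__()
        # kernel (1, 3, 3): in-plane 2D convolution applied to every slice.
        self.conv2d = nn.Conv3d(channels, channels, (1, 3, 3), padding=(0, 1, 1))
        # kernel (3, 1, 1): through-plane 1D convolution along D.
        self.conv1d = nn.Conv3d(channels, channels, (3, 1, 1), padding=(1, 0, 0))
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.act(self.conv1d(self.conv2d(x)))


if __name__ == "__main__":
    x = torch.randn(1, 32, 8, 64, 64)  # (B, C, D, H, W) toy volume
    y = ECFNSketch(32)(EMSMSketch(32)(x))
    print(y.shape)  # torch.Size([1, 32, 8, 64, 64])
```

The payoff of this factorization is the complexity reduction the abstract claims: for a D x H x W volume, naive 3D attention scales with (DHW)^2 per layer, whereas the factorized form scales with D(HW)^2 + HW(D^2); likewise, the paired 2D+1D kernels (9C^2 + 3C^2 weights) are cheaper than a dense 3x3x3 kernel (27C^2) with a comparable receptive field.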
Related papers
- E2ENet: Dynamic Sparse Feature Fusion for Accurate and Efficient 3D Medical Image Segmentation [36.367368163120794]
We propose a 3D medical image segmentation model named Efficient to Efficient Network (E2ENet).
It incorporates two parametrically and computationally efficient designs.
It consistently achieves a superior trade-off between accuracy and efficiency across various resource constraints.
arXiv Detail & Related papers (2023-12-07T22:13:37Z)
- UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation [113.35352122662752]
We present an efficient multi-modal backbone for outdoor 3D perception named UniTR.
UniTR processes a variety of modalities with unified modeling and shared parameters.
UniTR is also a fundamentally task-agnostic backbone that naturally supports different 3D perception tasks.
arXiv Detail & Related papers (2023-08-15T12:13:44Z)
- Spatiotemporal Modeling Encounters 3D Medical Image Analysis: Slice-Shift UNet with Multi-View Fusion [0.0]
We propose a new 2D-based model, dubbed Slice SHift UNet, which encodes three-dimensional features at the complexity of a 2D CNN.
More precisely, multi-view features are collaboratively learned by performing 2D convolutions along the three planes of a volume.
The effectiveness of our approach is validated on the Multi-Modality Abdominal Multi-Organ Segmentation (AMOS) and Multi-Atlas Labeling Beyond the Cranial Vault (BTCV) datasets.
arXiv Detail & Related papers (2023-07-24T14:53:23Z)
- NeRF-GAN Distillation for Efficient 3D-Aware Generation with Convolutions [97.27105725738016]
The integration of Neural Radiance Fields (NeRFs) and generative models, such as Generative Adversarial Networks (GANs), has transformed 3D-aware generation from single-view images.
We propose a simple and effective method, based on re-using the well-disentangled latent space of a pre-trained NeRF-GAN in a pose-conditioned convolutional network to directly generate 3D-consistent images corresponding to the underlying 3D representations.
arXiv Detail & Related papers (2023-03-22T18:59:48Z)
- Multi-scale Transformer Network with Edge-aware Pre-training for Cross-Modality MR Image Synthesis [52.41439725865149]
Cross-modality magnetic resonance (MR) image synthesis can be used to generate missing modalities from given ones.
Existing (supervised learning) methods often require a large number of paired multi-modal data to train an effective synthesis model.
We propose a Multi-scale Transformer Network (MT-Net) with edge-aware pre-training for cross-modality MR image synthesis.
arXiv Detail & Related papers (2022-12-02T11:40:40Z)
- GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction [50.248694764703714]
Unrolled neural networks have recently achieved state-of-the-art accelerated MRI reconstruction.
These networks unroll iterative optimization algorithms by alternating between physics-based consistency and neural-network based regularization.
We propose Greedy LEarning for Accelerated MRI reconstruction, an efficient training strategy for high-dimensional imaging settings.
arXiv Detail & Related papers (2022-07-18T06:01:29Z)
- Memory-efficient Segmentation of High-resolution Volumetric MicroCT Images [11.723370840090453]
We propose a memory-efficient network architecture for 3D high-resolution image segmentation.
The network incorporates both global and local features via a two-stage U-net-based cascaded framework.
Experiments show that it outperforms state-of-the-art 3D segmentation methods in terms of both segmentation accuracy and memory efficiency.
arXiv Detail & Related papers (2022-05-31T16:42:48Z)
- RD-Optimized Trit-Plane Coding of Deep Compressed Image Latent Tensors [40.86513649546442]
DPICT is the first learning-based image codec supporting fine granular scalability.
In this paper, we describe how to implement two key components of DPICT efficiently: trit-plane slicing and RD-prioritized transmission.
arXiv Detail & Related papers (2022-03-25T06:33:16Z)
- Weakly Supervised Volumetric Image Segmentation with Deformed Templates [80.04326168716493]
We propose an approach that is truly weakly supervised in the sense that we only need to provide a sparse set of 3D points on the surface of target objects.
We show that it outperforms a more traditional approach to weak supervision in 3D at a reduced supervision cost.
arXiv Detail & Related papers (2021-06-07T22:09:34Z)
- Revisiting 3D Context Modeling with Supervised Pre-training for Universal Lesion Detection in CT Slices [48.85784310158493]
We propose a Modified Pseudo-3D Feature Pyramid Network (MP3D FPN) to efficiently extract 3D context enhanced 2D features for universal lesion detection in CT slices.
With the novel pre-training method, the proposed MP3D FPN achieves state-of-the-art detection performance on the DeepLesion dataset.
The proposed 3D pre-trained weights can potentially be used to boost the performance of other 3D medical image analysis tasks.
arXiv Detail & Related papers (2020-12-16T07:11:16Z)