Stable Optimization for Large Vision Model Based Deep Image Prior in
Cone-Beam CT Reconstruction
- URL: http://arxiv.org/abs/2203.12476v2
- Date: Sun, 28 Jan 2024 13:08:26 GMT
- Title: Stable Optimization for Large Vision Model Based Deep Image Prior in
Cone-Beam CT Reconstruction
- Authors: Minghui Wu, Yangdi Xu, Yingying Xu, Guangwei Wu, Qingqing Chen,
Hongxiang Lin
- Abstract summary: Large Vision Model (LVM) has recently demonstrated great potential for medical imaging tasks.
Deep Image Prior (DIP) effectively guides an untrained neural network to generate high-quality CBCT images without any training data.
We propose a stable optimization method for the forward-model-free DIP model for sparse-view CBCT.
- Score: 6.558735319783205
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Vision Model (LVM) has recently demonstrated great potential for
medical imaging tasks, potentially enabling image enhancement for sparse-view
Cone-Beam Computed Tomography (CBCT), despite requiring a substantial amount of
data for training. Meanwhile, Deep Image Prior (DIP) effectively guides an
untrained neural network to generate high-quality CBCT images without any
training data. However, the original DIP method relies on a well-defined
forward model and a large-capacity backbone network, which make convergence
notoriously difficult. In this paper, we propose a stable optimization method
for the forward-model-free, LVM-based DIP model for sparse-view CBCT. Our
approach consists of two main characteristics: (1) multi-scale perceptual loss
(MSPL) which measures the similarity of perceptual features between the
reference and output images at multiple resolutions without the need for any
forward model, and (2) a reweighting mechanism that stabilizes the iteration
trajectory of MSPL. One-shot optimization is used to simultaneously and stably
reweight MSPL and optimize the LVM. We evaluate our approach on two publicly
available datasets: SPARE and Walnut. The results show significant improvements
in image quality metrics, and visualizations demonstrate reduced streak
artifacts. The source code is available upon request.
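The multi-scale perceptual loss described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper compares deep perceptual features from an LVM, whereas this numpy sketch substitutes image gradients as a stand-in feature extractor; the `features`, `downsample`, and `mspl` helpers and the per-scale `weights` are hypothetical names.

```python
import numpy as np

def features(img):
    # Stand-in for a perceptual feature extractor; the paper uses
    # deep network features, here we use simple image gradients.
    gy, gx = np.gradient(img)
    return np.stack([gy, gx])

def downsample(img):
    # 2x2 average pooling to obtain the next-coarser resolution.
    h, w = img.shape
    return img[: h // 2 * 2, : w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def mspl(ref, out, weights=(1.0, 1.0, 1.0)):
    """Multi-scale perceptual loss: weighted feature distance at several
    resolutions, with no forward model required. The per-scale weights
    correspond to the reweighting mechanism in the abstract."""
    loss = 0.0
    for w in weights:
        loss += w * np.mean((features(ref) - features(out)) ** 2)
        ref, out = downsample(ref), downsample(out)
    return loss
```

In the paper, the per-scale weights would themselves be updated during the one-shot optimization to stabilize the iteration trajectory; here they are fixed constants for clarity.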
Related papers
- Multi-Head Attention Residual Unfolded Network for Model-Based Pansharpening [2.874893537471256]
Unfolding fusion methods integrate the powerful representation capabilities of deep learning with the robustness of model-based approaches.
In this paper, we propose a model-based deep unfolded method for satellite image fusion.
Experimental results on PRISMA, Quickbird, and WorldView2 datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2024-09-04T13:05:00Z) - Inter-slice Super-resolution of Magnetic Resonance Images by Pre-training and Self-supervised Fine-tuning [49.197385954021456]
In clinical practice, 2D magnetic resonance (MR) sequences are widely adopted. While individual 2D slices can be stacked to form a 3D volume, the relatively large slice spacing can pose challenges for visualization and subsequent analysis tasks.
To reduce slice spacing, deep-learning-based super-resolution techniques are widely investigated.
Most current solutions require a substantial number of paired high-resolution and low-resolution images for supervised training, which are typically unavailable in real-world scenarios.
arXiv Detail & Related papers (2024-06-10T02:20:26Z) - DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception [66.88792390480343]
We propose DEEM, a simple but effective approach that utilizes the generative feedback of diffusion models to align the semantic distributions of the image encoder.
DEEM exhibits enhanced robustness and a superior capacity to alleviate model hallucinations while utilizing fewer trainable parameters, less pre-training data, and a smaller base model size.
arXiv Detail & Related papers (2024-05-24T05:46:04Z) - Efficient Visual State Space Model for Image Deblurring [83.57239834238035]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration.
We propose a simple yet effective visual state space model (EVSSM) for image deblurring.
arXiv Detail & Related papers (2024-05-23T09:13:36Z) - Deep Optimal Transport: A Practical Algorithm for Photo-realistic Image Restoration [31.58365182858562]
We propose an image restoration algorithm that can control the perceptual quality and/or the mean square error (MSE) of any pre-trained model.
Given about a dozen images restored by the model, it can significantly improve the perceptual quality and/or the MSE of the model for newly restored images without further training.
arXiv Detail & Related papers (2023-06-04T12:21:53Z) - FastMIM: Expediting Masked Image Modeling Pre-training for Vision [65.47756720190155]
FastMIM is a framework for pre-training vision backbones with low-resolution input images.
It reconstructs Histograms of Oriented Gradients (HOG) features instead of the original RGB values of the input images.
It can achieve 83.8%/84.1% top-1 accuracy on ImageNet-1K with ViT-B/Swin-B as backbones.
arXiv Detail & Related papers (2022-12-13T14:09:32Z) - Deep Learning for Material Decomposition in Photon-Counting CT [0.5801044612920815]
We present a novel deep-learning solution for material decomposition in PCCT, based on an unrolled/unfolded iterative network.
Our approach outperforms a maximum likelihood estimation, a variational method, as well as a fully-learned network.
arXiv Detail & Related papers (2022-08-05T19:05:16Z) - Learning Deformable Image Registration from Optimization: Perspective,
Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z) - Two-shot Spatially-varying BRDF and Shape Estimation [89.29020624201708]
We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF.
We create a large-scale synthetic training dataset with domain-randomized geometry and realistic materials.
Experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images.
arXiv Detail & Related papers (2020-04-01T12:56:13Z) - A Two-step-training Deep Learning Framework for Real-time Computational
Imaging without Physics Priors [0.0]
We propose a two-step-training DL (TST-DL) framework for real-time computational imaging without physics priors.
First, a single fully-connected layer (FCL) is trained to directly learn the model.
Then, this FCL is fixed and combined with an untrained U-Net architecture for a second-step training to improve the output image fidelity.
arXiv Detail & Related papers (2020-01-10T15:05:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.