WDM: 3D Wavelet Diffusion Models for High-Resolution Medical Image Synthesis
- URL: http://arxiv.org/abs/2402.19043v2
- Date: Fri, 19 Jul 2024 09:42:22 GMT
- Title: WDM: 3D Wavelet Diffusion Models for High-Resolution Medical Image Synthesis
- Authors: Paul Friedrich, Julia Wolleb, Florentin Bieder, Alicia Durrer, Philippe C. Cattin,
- Abstract summary: This work presents WDM, a wavelet-based medical image synthesis framework that applies a diffusion model on wavelet images.
Experimental results on BraTS and LIDC-IDRI unconditional image generation at a resolution of $128 times 128 times 128$ demonstrate state-of-the-art image fidelity (FID) and sample diversity (MS-SSIM) scores.
Our proposed method is the only one capable of generating high-quality images at a resolution of $256 times 256 times 256$, outperforming all comparing methods.
- Score: 1.647759094903376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the three-dimensional nature of CT- or MR-scans, generative modeling of medical images is a particularly challenging task. Existing approaches mostly apply patch-wise, slice-wise, or cascaded generation techniques to fit the high-dimensional data into the limited GPU memory. However, these approaches may introduce artifacts and potentially restrict the model's applicability for certain downstream tasks. This work presents WDM, a wavelet-based medical image synthesis framework that applies a diffusion model on wavelet decomposed images. The presented approach is a simple yet effective way of scaling 3D diffusion models to high resolutions and can be trained on a single \SI{40}{\giga\byte} GPU. Experimental results on BraTS and LIDC-IDRI unconditional image generation at a resolution of $128 \times 128 \times 128$ demonstrate state-of-the-art image fidelity (FID) and sample diversity (MS-SSIM) scores compared to recent GANs, Diffusion Models, and Latent Diffusion Models. Our proposed method is the only one capable of generating high-quality images at a resolution of $256 \times 256 \times 256$, outperforming all comparing methods.
Related papers
- cWDM: Conditional Wavelet Diffusion Models for Cross-Modality 3D Medical Image Synthesis [1.767791678320834]
This paper contributes to the "BraTS 2024 Brain MR Image Synthesis Challenge"
It presents a conditional Wavelet Diffusion Model for solving a paired image-to-image translation task on high-resolution volumes.
arXiv Detail & Related papers (2024-11-26T08:17:57Z) - Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis [62.06970466554273]
We present Meissonic, which non-autoregressive masked image modeling (MIM) text-to-image elevates to a level comparable with state-of-the-art diffusion models like SDXL.
We leverage high-quality training data, integrate micro-conditions informed by human preference scores, and employ feature compression layers to further enhance image fidelity and resolution.
Our model not only matches but often exceeds the performance of existing models like SDXL in generating high-quality, high-resolution images.
arXiv Detail & Related papers (2024-10-10T17:59:17Z) - MAISI: Medical AI for Synthetic Imaging [16.687814167558326]
Medical imaging analysis faces challenges such as data scarcity, high annotation costs, and privacy concerns.
This paper introduces the Medical AI for Synthetic Imaging (MAISI) to generate synthetic 3D computed tomography (CT) images.
arXiv Detail & Related papers (2024-09-13T18:15:08Z) - SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions [5.100085108873068]
We present two models, SDXS-512 and SDXS-1024, achieving inference speeds of approximately 100 FPS (30x faster than SD v1.5) and 30 FPS (60x faster than SDXL) on a single GPU.
Our training approach offers promising applications in image-conditioned control, facilitating efficient image-to-image translation.
arXiv Detail & Related papers (2024-03-25T11:16:23Z) - Latent Consistency Models: Synthesizing High-Resolution Images with
Few-Step Inference [60.32804641276217]
We propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs.
A high-quality 768 x 768 24-step LCM takes only 32 A100 GPU hours for training.
We also introduce Latent Consistency Fine-tuning (LCF), a novel method that is tailored for fine-tuning LCMs on customized image datasets.
arXiv Detail & Related papers (2023-10-06T17:11:58Z) - Memory-Efficient 3D Denoising Diffusion Models for Medical Image Processing [0.9424565541639366]
We present a number of ways to reduce the resource consumption for 3D diffusion models.
The main contribution of this paper is the memory-efficient patch-based diffusion model.
While the proposed diffusion model can be applied to any image generation tasks, we evaluate the method on the tumor segmentation task of the BraTS 2020 dataset.
arXiv Detail & Related papers (2023-03-27T15:10:19Z) - On Distillation of Guided Diffusion Models [94.95228078141626]
We propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from.
For standard diffusion models trained on the pixelspace, our approach is able to generate images visually comparable to that of the original model.
For diffusion models trained on the latent-space (e.g., Stable Diffusion), our approach is able to generate high-fidelity images using as few as 1 to 4 denoising steps.
arXiv Detail & Related papers (2022-10-06T18:03:56Z) - Cascaded Diffusion Models for High Fidelity Image Generation [53.57766722279425]
We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation challenge.
A cascaded diffusion model comprises a pipeline of multiple diffusion models that generate images of increasing resolution.
We find that the sample quality of a cascading pipeline relies crucially on conditioning augmentation.
arXiv Detail & Related papers (2021-05-30T17:14:52Z) - Hierarchical Amortized Training for Memory-efficient High Resolution 3D
GAN [52.851990439671475]
We propose a novel end-to-end GAN architecture that can generate high-resolution 3D images.
We achieve this goal by using different configurations between training and inference.
Experiments on 3D thorax CT and brain MRI demonstrate that our approach outperforms state of the art in image generation.
arXiv Detail & Related papers (2020-08-05T02:33:04Z) - Improved Techniques for Training Score-Based Generative Models [104.20217659157701]
We provide a new theoretical analysis of learning and sampling from score models in high dimensional spaces.
We can effortlessly scale score-based generative models to images with unprecedented resolutions.
Our score-based models can generate high-fidelity samples that rival best-in-class GANs on various image datasets.
arXiv Detail & Related papers (2020-06-16T09:17:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.