Improving text-conditioned latent diffusion for cancer pathology
- URL: http://arxiv.org/abs/2412.06487v1
- Date: Mon, 09 Dec 2024 13:38:19 GMT
- Title: Improving text-conditioned latent diffusion for cancer pathology
- Authors: Aakash Madhav Rao, Debayan Gupta
- Abstract summary: Generative models have allowed for hyperrealistic data synthesis. One algorithm for synthesising a realistic image is diffusion; it iteratively converts an image to noise and learns the recovery process from this noise. VAEs have allowed us to learn the representation of complex high-resolution images in a latent space. The marriage of diffusion and VAEs allows us to carry out diffusion in the latent space of an autoencoder, enabling us to leverage the realistic generative capabilities of diffusion.
- Score: 0.5919433278490629
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The development of generative models in the past decade has allowed for hyperrealistic data synthesis. While potentially beneficial, this synthetic data generation process has been relatively underexplored in cancer histopathology. One algorithm for synthesising a realistic image is diffusion; it iteratively converts an image to noise and learns the recovery process from this noise [Wang and Vastola, 2023]. While effective, it is highly computationally expensive for high-resolution images, rendering it infeasible for histopathology. The development of Variational Autoencoders (VAEs) has allowed us to learn the representation of complex high-resolution images in a latent space. A vital by-product of this is the ability to compress high-resolution images to this latent space and recover them losslessly. The marriage of diffusion and VAEs allows us to carry out diffusion in the latent space of an autoencoder, enabling us to leverage the realistic generative capabilities of diffusion while maintaining reasonable computational requirements. Rombach et al. [2021b] and Yellapragada et al. [2023] build foundational models for this task, paving the way to generate realistic histopathology images. In this paper, we discuss the pitfalls of current methods, namely Yellapragada et al. [2023], resolve critical errors, and propose improvements along the way. Our method achieves an FID score of 21.11, beating its SOTA counterpart [Yellapragada et al., 2023] by 1.2 FID, while reducing train-time GPU memory usage by 7%.
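The abstract's central mechanism, running DDPM-style diffusion in a VAE's latent space instead of pixel space, can be made concrete with a short sketch. The following is a minimal, hypothetical PyTorch illustration of one text-conditioned training step; the modules `vae_encoder`, `text_encoder`, and `denoiser` are toy stand-ins (not the authors' or Rombach et al.'s architecture), and the linear beta schedule and broadcast conditioning are placeholder assumptions.

```python
# Minimal sketch of one text-conditioned latent-diffusion training step.
# All modules below are hypothetical stand-ins for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 1000                                   # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule (DDPM-style)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

# Stand-ins: a real setup would use a pretrained VAE encoder,
# a U-Net denoiser, and a text encoder such as CLIP.
vae_encoder = nn.Conv2d(3, 4, kernel_size=8, stride=8)       # 256x256 RGB -> 4x32x32 latent
text_encoder = nn.Embedding(1000, 64)                        # token ids -> 64-d embeddings
denoiser = nn.Conv2d(4 + 64, 4, kernel_size=3, padding=1)    # predicts the added noise

def training_step(images, token_ids):
    """images: (B, 3, 256, 256); token_ids: (B,) -- one token per image for brevity."""
    with torch.no_grad():
        latents = vae_encoder(images)                        # diffusion runs in latent space

    noise = torch.randn_like(latents)
    t = torch.randint(0, T, (latents.size(0),))
    a = alpha_bar[t].view(-1, 1, 1, 1)
    noisy = a.sqrt() * latents + (1.0 - a).sqrt() * noise    # forward (noising) process

    # Crude conditioning: broadcast the text embedding over the latent grid.
    cond = text_encoder(token_ids).view(-1, 64, 1, 1).expand(-1, -1, *latents.shape[2:])
    pred_noise = denoiser(torch.cat([noisy, cond], dim=1))   # reverse process learns to denoise
    return F.mse_loss(pred_noise, noise)

loss = training_step(torch.randn(2, 3, 256, 256), torch.tensor([3, 7]))
loss.backward()
```

In the actual pipeline, the encoder would be a pretrained VAE whose decoder maps generated latents back to pixels, the denoiser a text-conditioned U-Net, and sampling would iterate the learned reverse step from pure noise; operating at latent rather than pixel resolution is what keeps the computational cost manageable for histopathology-scale images.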
Related papers
- PanoDiff-SR: Synthesizing Dental Panoramic Radiographs using Diffusion and Super-resolution [60.970656010712275]
We propose a combination of diffusion-based generation (PanoDiff) and Super-Resolution (SR) for generating synthetic dental panoramic radiographs (PRs). The former generates a low-resolution (LR) seed of a PR, which is then processed by the SR model to yield a high-resolution (HR) PR of size 1024 × 512. For SR, we propose a state-of-the-art transformer that learns local-global relationships, resulting in sharper edges and textures.
arXiv Detail & Related papers (2025-07-12T09:52:10Z) - CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI [58.35348718345307]
Current efforts to distinguish between real and AI-generated images may lack generalization.
We propose a novel framework, Co-Spy, that first enhances existing semantic features.
We also create Co-Spy-Bench, a comprehensive dataset comprising 5 real image datasets and 22 state-of-the-art generative models.
arXiv Detail & Related papers (2025-03-24T01:59:29Z) - Improving the Diffusability of Autoencoders [54.920783089085035]
Latent diffusion models have emerged as the leading approach for generating high-quality images and videos.
We perform a spectral analysis of modern autoencoders and identify inordinate high-frequency components in their latent spaces.
We hypothesize that this high-frequency component interferes with the coarse-to-fine nature of the diffusion synthesis process and hinders the generation quality.
arXiv Detail & Related papers (2025-02-20T18:45:44Z) - Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis [55.959002385347645]
Latent Drifting enables diffusion models to be conditioned for medical images fitted for the complex task of counterfactual image generation.
We evaluate our method on three public longitudinal benchmark datasets of brain MRI and chest X-rays for counterfactual image generation.
arXiv Detail & Related papers (2024-12-30T01:59:34Z) - Time Step Generating: A Universal Synthesized Deepfake Image Detector [0.4488895231267077]
We propose a universal synthetic image detector, Time Step Generating (TSG).
TSG does not rely on pre-trained models' reconstructing ability, specific datasets, or sampling algorithms.
We test the proposed TSG on the large-scale GenImage benchmark and it achieves significant improvements in both accuracy and generalizability.
arXiv Detail & Related papers (2024-11-17T09:39:50Z) - One-step Generative Diffusion for Realistic Extreme Image Rescaling [47.89362819768323]
We propose a novel framework called One-Step Image Rescaling Diffusion (OSIRDiff) for extreme image rescaling.
OSIRDiff performs rescaling operations in the latent space of a pre-trained autoencoder.
It effectively leverages powerful natural image priors learned by a pre-trained text-to-image diffusion model.
arXiv Detail & Related papers (2024-08-17T09:51:42Z) - StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model [62.25424831998405]
StealthDiffusion is a framework that modifies AI-generated images into high-quality, imperceptible adversarial examples.
It is effective in both white-box and black-box settings, transforming AI-generated images into high-quality adversarial forgeries.
arXiv Detail & Related papers (2024-08-11T01:22:29Z) - Memory-efficient High-resolution OCT Volume Synthesis with Cascaded Amortized Latent Diffusion Models [48.87160158792048]
We introduce a cascaded amortized latent diffusion model (CA-LDM) that can synthesize high-resolution OCT volumes in a memory-efficient way.
Experiments on a public high-resolution OCT dataset show that our synthetic data have realistic high-resolution and global features, surpassing the capabilities of existing methods.
arXiv Detail & Related papers (2024-05-26T10:58:22Z) - Paired Diffusion: Generation of related, synthetic PET-CT-Segmentation scans using Linked Denoising Diffusion Probabilistic Models [0.0]
This research introduces a novel architecture that is able to generate multiple, related PET-CT-tumour mask pairs using paired networks and conditional encoders.
Our approach includes innovative, time step-controlled mechanisms and a 'noise-seeding' strategy to improve DDPM sampling consistency.
arXiv Detail & Related papers (2024-03-26T14:21:49Z) - Latent Diffusion Models with Image-Derived Annotations for Enhanced AI-Assisted Cancer Diagnosis in Histopathology [0.0]
This work proposes a method that constructs structured textual prompts from automatically extracted image features.
We show that including image-derived features in the prompt, as opposed to only healthy and cancerous labels, improves the Fréchet Inception Distance (FID) from 178.8 to 90.2.
We also show that pathologists find it challenging to detect synthetic images, with a median sensitivity/specificity of 0.55/0.55.
arXiv Detail & Related papers (2023-12-15T13:48:55Z) - Augmenting medical image classifiers with synthetic data from latent diffusion models [12.077733447347592]
We show that latent diffusion models can scalably generate images of skin disease.
We generate and analyze a new dataset of 458,920 synthetic images produced using several generation strategies.
arXiv Detail & Related papers (2023-08-23T22:34:49Z) - DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration [66.01846902242355]
Blind face restoration usually synthesizes degraded low-quality data with a pre-defined degradation model for training.
It is expensive and infeasible to include every type of degradation to cover real-world cases in the training data.
We propose Robust Degradation Remover (DR2) to first transform the degraded image to a coarse but degradation-invariant prediction, then employ an enhancement module to restore the coarse prediction to a high-quality image.
arXiv Detail & Related papers (2023-03-13T06:05:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.