Multiscale Vector-Quantized Variational Autoencoder for Endoscopic Image Synthesis
- URL: http://arxiv.org/abs/2511.19578v1
- Date: Mon, 24 Nov 2025 18:23:28 GMT
- Title: Multiscale Vector-Quantized Variational Autoencoder for Endoscopic Image Synthesis
- Authors: Dimitrios E. Diamantis, Dimitris K. Iakovidis,
- Abstract summary: This work introduces a novel VAE-based methodology for medical image synthesis and presents its application for the generation of WCE images.<n>The utility of the generated images for Clinical Decision Support is assessed via image classification.<n>The generality of the proposed methodology promises its applicability to various domains related to medical multimedia.
- Score: 4.318555434063273
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gastrointestinal (GI) imaging via Wireless Capsule Endoscopy (WCE) generates a large number of images requiring manual screening. Deep learning-based Clinical Decision Support (CDS) systems can assist screening, yet their performance relies on the existence of large, diverse, training medical datasets. However, the scarcity of such data, due to privacy constraints and annotation costs, hinders CDS development. Generative machine learning offers a viable solution to combat this limitation. While current Synthetic Data Generation (SDG) methods, such as Generative Adversarial Networks and Variational Autoencoders have been explored, they often face challenges with training stability and capturing sufficient visual diversity, especially when synthesizing abnormal findings. This work introduces a novel VAE-based methodology for medical image synthesis and presents its application for the generation of WCE images. The novel contributions of this work include a) multiscale extension of the Vector Quantized VAE model, named as Multiscale Vector Quantized Variational Autoencoder (MSVQ-VAE); b) unlike other VAE-based SDG models for WCE image generation, MSVQ-VAE is used to seamlessly introduce abnormalities into normal WCE images; c) it enables conditional generation of synthetic images, enabling the introduction of different types of abnormalities into the normal WCE images; d) it performs experiments with a variety of abnormality types, including polyps, vascular and inflammatory conditions. The utility of the generated images for CDS is assessed via image classification. Comparative experiments demonstrate that training a CDS classifier using the abnormal images generated by the proposed methodology yield comparable results with a classifier trained with only real data. The generality of the proposed methodology promises its applicability to various domains related to medical multimedia.
Related papers
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z) - Semantic Map Guided Synthesis of Wireless Capsule Endoscopy Images using
Diffusion Models [4.187344935012482]
Wireless capsule endoscopy (WCE) is a non-invasive method for visualizing the gastrointestinal (GI) tract.
We propose a novel approach leveraging generative models, specifically the diffusion model (DM), for generating diverse WCE images.
Our model incorporates semantic map resulted from visualization scale (VS) engine, enhancing the controllability and diversity of generated images.
arXiv Detail & Related papers (2023-11-10T06:16:44Z) - Adaptive Input-image Normalization for Solving the Mode Collapse Problem in GAN-based X-ray Images [0.08192907805418582]
This work contributes an empirical demonstration of the benefits of integrating the adaptive input-image normalization with the Deep Conversaal GAN and Auxiliary GAN to alleviate the mode collapse problems.
Results demonstrate that the DCGAN and the ACGAN with adaptive input-image normalization outperform the DCGAN and ACGAN with un-normalized X-ray images.
arXiv Detail & Related papers (2023-09-21T16:43:29Z) - CoNeS: Conditional neural fields with shift modulation for multi-sequence MRI translation [5.662694302758443]
Multi-sequence magnetic resonance imaging (MRI) has found wide applications in both modern clinical studies and deep learning research.
It frequently occurs that one or more of the MRI sequences are missing due to different image acquisition protocols or contrast agent contraindications of patients.
One promising approach is to leverage generative models to synthesize the missing sequences, which can serve as a surrogate acquisition.
arXiv Detail & Related papers (2023-09-06T19:01:58Z) - On Sensitivity and Robustness of Normalization Schemes to Input
Distribution Shifts in Automatic MR Image Diagnosis [58.634791552376235]
Deep Learning (DL) models have achieved state-of-the-art performance in diagnosing multiple diseases using reconstructed images as input.
DL models are sensitive to varying artifacts as it leads to changes in the input data distribution between the training and testing phases.
We propose to use other normalization techniques, such as Group Normalization and Layer Normalization, to inject robustness into model performance against varying image artifacts.
arXiv Detail & Related papers (2023-06-23T03:09:03Z) - This Intestine Does Not Exist: Multiscale Residual Variational
Autoencoder for Realistic Wireless Capsule Endoscopy Image Generation [7.430724826764835]
A novel Variational Autoencoder architecture is proposed, namely "This Intestine Does not Exist" (TIDE)
The proposed architecture comprises multiscale feature extraction convolutional blocks and residual connections, which enable the generation of high-quality and diverse datasets.
Contrary to the current approaches, which are oriented towards the augmentation of the available datasets, this study demonstrates that using TIDE, real WCE datasets can be fully substituted.
arXiv Detail & Related papers (2023-02-04T11:49:38Z) - OADAT: Experimental and Synthetic Clinical Optoacoustic Data for
Standardized Image Processing [62.993663757843464]
Optoacoustic (OA) imaging is based on excitation of biological tissues with nanosecond-duration laser pulses followed by detection of ultrasound waves generated via light-absorption-mediated thermoelastic expansion.
OA imaging features a powerful combination between rich optical contrast and high resolution in deep tissues.
No standardized datasets generated with different types of experimental set-up and associated processing methods are available to facilitate advances in broader applications of OA in clinical settings.
arXiv Detail & Related papers (2022-06-17T08:11:26Z) - Self-supervised Pseudo Multi-class Pre-training for Unsupervised Anomaly
Detection and Segmentation in Medical Images [31.676609117780114]
Unsupervised anomaly detection (UAD) methods are trained with normal (or healthy) images only, but during testing, they are able to classify normal and abnormal images.
We propose a new self-supervised pre-training method for MIA UAD applications, named Pseudo Multi-class Strong Augmentation via Contrastive Learning (PMSACL)
arXiv Detail & Related papers (2021-09-03T04:25:57Z) - Modality Completion via Gaussian Process Prior Variational Autoencoders
for Multi-Modal Glioma Segmentation [75.58395328700821]
We propose a novel model, Multi-modal Gaussian Process Prior Variational Autoencoder (MGP-VAE), to impute one or more missing sub-modalities for a patient scan.
MGP-VAE can leverage the Gaussian Process (GP) prior on the Variational Autoencoder (VAE) to utilize the subjects/patients and sub-modalities correlations.
We show the applicability of MGP-VAE on brain tumor segmentation where either, two, or three of four sub-modalities may be missing.
arXiv Detail & Related papers (2021-07-07T19:06:34Z) - Generative Adversarial U-Net for Domain-free Medical Image Augmentation [49.72048151146307]
The shortage of annotated medical images is one of the biggest challenges in the field of medical image computing.
In this paper, we develop a novel generative method named generative adversarial U-Net.
Our newly designed model is domain-free and generalizable to various medical images.
arXiv Detail & Related papers (2021-01-12T23:02:26Z) - Pathological Retinal Region Segmentation From OCT Images Using Geometric
Relation Based Augmentation [84.7571086566595]
We propose improvements over previous GAN-based medical image synthesis methods by jointly encoding the intrinsic relationship of geometry and shape.
The proposed method outperforms state-of-the-art segmentation methods on the public RETOUCH dataset having images captured from different acquisition procedures.
arXiv Detail & Related papers (2020-03-31T11:50:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.