Related papers: Image Synthesis with Class-Aware Semantic Diffusion Models for Surgical Scene Segmentation

URL: http://arxiv.org/abs/2410.23962v1
Date: Thu, 31 Oct 2024 14:14:30 GMT
Title: Image Synthesis with Class-Aware Semantic Diffusion Models for Surgical Scene Segmentation
Authors: Yihang Zhou, Rebecca Towning, Zaid Awad, Stamatia Giannarou
Abstract summary: We propose the Class-Aware Semantic Diffusion Model (CASDM) to tackle data scarcity and imbalance. Class-aware mean squared error and class-aware self-perceptual loss functions have been defined to prioritize critical, less visible classes. We are the first to generate multi-class segmentation maps using text prompts in a novel fashion to specify their contents.
Score: 3.6723640056915436
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Surgical scene segmentation is essential for enhancing surgical precision, yet it is frequently compromised by the scarcity and imbalance of available data. To address these challenges, semantic image synthesis methods based on generative adversarial networks and diffusion models have been developed. However, these models often yield non-diverse images and fail to capture small, critical tissue classes, limiting their effectiveness. In response, we propose the Class-Aware Semantic Diffusion Model (CASDM), a novel approach which utilizes segmentation maps as conditions for image synthesis to tackle data scarcity and imbalance. Novel class-aware mean squared error and class-aware self-perceptual loss functions have been defined to prioritize critical, less visible classes, thereby enhancing image quality and relevance. Furthermore, to our knowledge, we are the first to generate multi-class segmentation maps using text prompts in a novel fashion to specify their contents. These maps are then used by CASDM to generate surgical scene images, enhancing datasets for training and validating segmentation models. Our evaluation, which assesses both image quality and downstream segmentation performance, demonstrates the strong effectiveness and generalisability of CASDM in producing realistic image-map pairs, significantly advancing surgical scene segmentation across diverse and challenging datasets.
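
The abstract names a class-aware mean squared error but does not give its formula. The sketch below shows one plausible reading and is not the authors' definition: squared error is averaged within each class region of the segmentation map, then averaged across classes, so a small tissue class counts as much as a large one. The function name class_aware_mse, its parameters, and the equal-weight-per-class scheme are all assumptions made for illustration.

```python
import numpy as np


def class_aware_mse(pred, target, seg_map, num_classes):
    """Hypothetical class-balanced MSE (an assumption, not the paper's loss).

    pred, target: float arrays of shape (H, W) or (H, W, C).
    seg_map: integer array of shape (H, W) holding class ids.
    """
    sq_err = (pred - target) ** 2
    per_class = []
    for c in range(num_classes):
        mask = seg_map == c
        if mask.any():  # skip classes absent from this map
            per_class.append(sq_err[mask].mean())
    # Every class that is present contributes equally, regardless of its area.
    return float(np.mean(per_class))


# Toy example: class 1 covers one pixel out of 16, and only that pixel is wrong.
seg = np.zeros((4, 4), dtype=np.int64)
seg[0, 0] = 1
target = np.zeros((4, 4))
target[0, 0] = 1.0
pred = np.zeros((4, 4))

plain = float(np.mean((pred - target) ** 2))
balanced = class_aware_mse(pred, target, seg, num_classes=2)
print(plain, balanced)
```

In this toy case the plain MSE is 1/16 because the error is diluted over the whole image, while the class-balanced value is 0.5 because the single-pixel class carries half the weight. That is the kind of emphasis on small classes the abstract describes, under the assumptions stated above.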

Related papers

SkinDualGen: Prompt-Driven Diffusion for Simultaneous Image-Mask Generation in Skin Lesions [0.0]
We propose a novel method that leverages the pretrained Stable Diffusion-2.0 model to generate high-quality synthetic skin lesion images. A hybrid dataset combining real and synthetic data markedly enhances the performance of classification and segmentation models.
arXiv Detail & Related papers (2025-07-26T15:00:37Z)
Iterative Misclassification Error Training (IMET): An Optimized Neural Network Training Technique for Image Classification [0.5115559623386964]
We introduce Iterative Misclassification Error Training (IMET), a novel framework inspired by curriculum learning and coreset selection. IMET aims to identify misclassified samples in order to streamline the training process, while prioritizing the model's attention to edge-case scenarios and rare outcomes. The paper evaluates IMET's performance on benchmark medical image classification datasets against state-of-the-art ResNet architectures.
arXiv Detail & Related papers (2025-07-01T04:14:16Z)
PathSegDiff: Pathology Segmentation using Diffusion model representations [63.20694440934692]
We propose PathSegDiff, a novel approach for histopathology image segmentation that leverages Latent Diffusion Models (LDMs) as pre-trained feature extractors. Our method utilizes a pathology-specific LDM, guided by a self-supervised encoder, to extract rich semantic information from H&E stained histopathology images. Our experiments demonstrate significant improvements over traditional methods on the BCSS and GlaS datasets.
arXiv Detail & Related papers (2025-04-09T14:58:21Z)
MRGen: Segmentation Data Engine For Underrepresented MRI Modalities [59.61465292965639]
Training medical image segmentation models for rare yet clinically significant imaging modalities is challenging due to the scarcity of annotated data. This paper investigates leveraging generative models to synthesize training data, to train segmentation models for underrepresented modalities.
arXiv Detail & Related papers (2024-12-04T16:34:22Z)
Mask-guided cross-image attention for zero-shot in-silico histopathologic image generation with a diffusion model [0.10910416614141322]
Diffusion models are the state-of-the-art solution for generating in-silico images. Appearance transfer diffusion models are designed for natural images. In computational pathology, specifically in oncology, it is not straightforward to define which objects in an image should be classified as foreground and background. We contribute to the applicability of appearance transfer models to diffusion-stained images by modifying the appearance transfer guidance to alternate between class-specific AdaIN feature statistics matchings.
arXiv Detail & Related papers (2024-07-16T12:36:26Z)
Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that generates highly realistic images. Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z)
Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL). Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images. Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
Scaling Laws of Synthetic Images for Model Training ... for Now [54.43596959598466]
We study the scaling laws of synthetic images generated by state-of-the-art text-to-image models. We observe that synthetic images demonstrate a scaling trend similar to, but slightly less effective than, real images in CLIP training.
arXiv Detail & Related papers (2023-12-07T18:59:59Z)
Boosting Dermatoscopic Lesion Segmentation via Diffusion Models with Visual and Textual Prompts [27.222844687360823]
We adapt the latest advances in generative models, with added control flow using lesion-specific visual and textual prompts. The method achieves a 9% increase in the SSIM image quality measure and an over 5% increase in Dice coefficients over prior art.
arXiv Detail & Related papers (2023-10-04T15:43:26Z)
Self-Supervised Correction Learning for Semi-Supervised Biomedical Image Segmentation [84.58210297703714]
We propose a self-supervised correction learning paradigm for semi-supervised biomedical image segmentation. We design a dual-task network, including a shared encoder and two independent decoders for segmentation and lesion region inpainting. Experiments on three medical image segmentation datasets for different tasks demonstrate the outstanding performance of our method.
arXiv Detail & Related papers (2023-01-12T08:19:46Z)
Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities. We explicitly account for prior images and reports when available during both training and fine-tuning. Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
Data Augmentation using Feature Generation for Volumetric Medical Images [0.08594140167290097]
Medical image classification is one of the most critical problems in the image recognition area. One of the major challenges in this field is the scarcity of labelled training data. Deep Learning models, in particular, show promising results on image segmentation and classification problems.
arXiv Detail & Related papers (2022-09-28T13:46:24Z)
Mixed-UNet: Refined Class Activation Mapping for Weakly-Supervised Semantic Segmentation with Multi-scale Inference [28.409679398886304]
We develop a novel model named Mixed-UNet, which has two parallel branches in the decoding phase. We evaluate Mixed-UNet against several prevalent deep learning-based segmentation approaches on our dataset collected from a local hospital and on public datasets.
arXiv Detail & Related papers (2022-05-06T08:37:02Z)
Multi-label Thoracic Disease Image Classification with Cross-Attention Networks [65.37531731899837]
We propose a novel scheme of Cross-Attention Networks (CAN) for automated thoracic disease classification from chest x-ray images. We also design a new loss function that goes beyond cross-entropy loss to help the cross-attention process and to overcome the imbalance between classes and easy-dominated samples within each class.
arXiv Detail & Related papers (2020-07-21T14:37:00Z)
Towards Unsupervised Learning for Instrument Segmentation in Robotic Surgery with Cycle-Consistent Adversarial Networks [54.00217496410142]
We propose an unpaired image-to-image translation where the goal is to learn the mapping between an input endoscopic image and a corresponding annotation. Our approach makes it possible to train image segmentation models without the need to acquire expensive annotations. We test our proposed method on the Endovis 2017 challenge dataset and show that it is competitive with supervised segmentation methods.
arXiv Detail & Related papers (2020-07-09T01:39:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.