An expert-driven data generation pipeline for histological images
- URL: http://arxiv.org/abs/2406.01403v1
- Date: Mon, 3 Jun 2024 15:05:08 GMT
- Title: An expert-driven data generation pipeline for histological images
- Authors: Roberto Basla, Loris Giulivi, Luca Magri, Giacomo Boracchi
- Abstract summary: We propose a novel pipeline for generating synthetic datasets for cell segmentation.
Given only a handful of annotated images, our method generates a large dataset which can be used to train DL instance segmentation models.
- Score: 7.219732640188684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Learning (DL) models have been successfully applied to many applications including biomedical cell segmentation and classification in histological images. These models require large amounts of annotated data which might not always be available, especially in the medical field where annotations are scarce and expensive. To overcome this limitation, we propose a novel pipeline for generating synthetic datasets for cell segmentation. Given only a handful of annotated images, our method generates a large dataset of images which can be used to effectively train DL instance segmentation models. Our solution is designed to generate cells of realistic shapes and placement by allowing experts to incorporate domain knowledge during the generation of the dataset.
Related papers
- DRAGON: A Large-Scale Dataset of Realistic Images Generated by Diffusion Models [48.347550000332866]
DRAGON is a comprehensive dataset comprising images from 25 diffusion models.
The dataset contains a broad variety of images representing diverse subjects.
DRAGON is designed to support the forensic community in developing and evaluating detection and attribution techniques for synthetic content.
arXiv Detail & Related papers (2025-05-16T13:50:34Z) - PathSegDiff: Pathology Segmentation using Diffusion model representations [63.20694440934692]
We propose PathSegDiff, a novel approach for histopathology image segmentation that leverages Latent Diffusion Models (LDMs) as pre-trained feature extractors.
Our method utilizes a pathology-specific LDM, guided by a self-supervised encoder, to extract rich semantic information from H&E stained histopathology images.
Our experiments demonstrate significant improvements over traditional methods on the BCSS and GlaS datasets.
arXiv Detail & Related papers (2025-04-09T14:58:21Z) - Using Synthetic Images to Augment Small Medical Image Datasets [3.7522420000453]
We have developed a novel conditional variant of a current GAN method, the StyleGAN2, to generate high-resolution medical images.
We use the synthetic and real images from six datasets to train models for the downstream task of semantic segmentation.
The quality of the generated medical images and the effect of this augmentation on the segmentation performance were evaluated afterward.
arXiv Detail & Related papers (2025-03-02T17:02:11Z) - MRGen: Segmentation Data Engine For Underrepresented MRI Modalities [59.61465292965639]
Training medical image segmentation models for rare yet clinically significant imaging modalities is challenging due to the scarcity of annotated data.
This paper investigates leveraging generative models to synthesize training data, to train segmentation models for underrepresented modalities.
arXiv Detail & Related papers (2024-12-04T16:34:22Z) - Generative AI Enables Medical Image Segmentation in Ultra Low-Data Regimes [35.151834585823224]
We introduce a generative deep learning framework, which uniquely generates high-quality paired segmentation masks and medical images.
Unlike traditional generative models that treat data generation and segmentation model training as separate processes, our method employs multi-level optimization for end-to-end data generation.
Our method demonstrated strong generalization performance across 9 diverse medical image segmentation tasks and on 16 datasets, in ultra-low data regimes.
arXiv Detail & Related papers (2024-08-30T17:11:36Z) - Could We Generate Cytology Images from Histopathology Images? An Empirical Study [1.791005104399795]
In this study, we have explored traditional image-to-image transfer models like CycleGAN, and Neural Style Transfer.
arXiv Detail & Related papers (2024-03-16T10:43:12Z) - UniCell: Universal Cell Nucleus Classification via Prompt Learning [76.11864242047074]
We propose a universal cell nucleus classification framework (UniCell).
It employs a novel prompt learning mechanism to uniformly predict the corresponding categories of pathological images from different dataset domains.
In particular, our framework adopts an end-to-end architecture for nuclei detection and classification, and utilizes flexible prediction heads for adapting various datasets.
arXiv Detail & Related papers (2024-02-20T11:50:27Z) - Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL).
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z) - DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion
Models [61.906934570771256]
We present a generic dataset generation model that can produce diverse synthetic images and perception annotations.
Our method builds upon the pre-trained diffusion model and extends text-guided image synthesis to perception data generation.
We show that the rich latent code of the diffusion model can be effectively decoded as accurate perception annotations using a decoder module.
arXiv Detail & Related papers (2023-08-11T14:38:11Z) - An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT [80.33783969507458]
The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians.
Recent studies have achieved promising results in automatic impression generation using large-scale medical text data.
These models often require substantial amounts of medical text data and have poor generalization performance.
arXiv Detail & Related papers (2023-04-17T17:13:42Z) - Analysing the effectiveness of a generative model for semi-supervised medical image segmentation [23.898954721893855]
State-of-the-art in automated segmentation remains supervised learning, employing discriminative models such as U-Net.
Semi-supervised learning (SSL) attempts to leverage the abundance of unlabelled data to obtain more robust and reliable models.
Deep generative models such as the SemanticGAN are truly viable alternatives to tackle challenging medical image segmentation problems.
arXiv Detail & Related papers (2022-11-03T15:19:59Z) - Histopathology DatasetGAN: Synthesizing Large-Resolution Histopathology Datasets [0.0]
Histopathology datasetGAN (HDGAN) is a framework for image generation and segmentation that scales well to large-resolution histopathology images.
We make several adaptations from the original framework, including updating the generative backbone, selectively extracting latent features from the generator, and switching to memory-mapped arrays.
We evaluate HDGAN on a thrombotic microangiopathy high-resolution tile dataset, demonstrating strong performance on the high-resolution image-annotation generation task.
arXiv Detail & Related papers (2022-07-06T14:33:50Z) - Lizard: A Large-Scale Dataset for Colonic Nuclear Instance Segmentation and Classification [4.642724910208435]
We propose a multi-stage annotation pipeline to enable the collection of large-scale datasets for histology image analysis.
We generate the largest known nuclear instance segmentation and classification dataset, containing nearly half a million labelled nuclei.
arXiv Detail & Related papers (2021-08-25T11:58:52Z) - DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort [117.41383937100751]
Current deep networks are extremely data-hungry, benefiting from training on large-scale datasets.
We show how the GAN latent code can be decoded to produce a semantic segmentation of the image.
These generated datasets can then be used for training any computer vision architecture just as real datasets are.
arXiv Detail & Related papers (2021-04-13T20:08:29Z) - From ImageNet to Image Classification: Contextualizing Progress on Benchmarks [99.19183528305598]
We study how specific design choices in the ImageNet creation process impact the fidelity of the resulting dataset.
Our analysis pinpoints how a noisy data collection pipeline can lead to a systematic misalignment between the resulting benchmark and the real-world task it serves as a proxy for.
arXiv Detail & Related papers (2020-05-22T17:39:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.