Cancer-Net SCa-Synth: An Open Access Synthetically Generated 2D Skin Lesion Dataset for Skin Cancer Classification
- URL: http://arxiv.org/abs/2411.05269v1
- Date: Fri, 08 Nov 2024 02:04:21 GMT
- Title: Cancer-Net SCa-Synth: An Open Access Synthetically Generated 2D Skin Lesion Dataset for Skin Cancer Classification
- Authors: Chi-en Amy Tai, Oustan Ding, Alexander Wong
- Abstract summary: In the United States, skin cancer ranks as the most commonly diagnosed cancer, presenting a significant public health issue.
Recent advancements in dataset curation and deep learning have shown promise in quick and accurate detection of skin cancer.
Cancer-Net SCa-Synth is an open access synthetically generated 2D skin lesion dataset for skin cancer classification.
- Score: 65.83291923029985
- License:
- Abstract: In the United States, skin cancer ranks as the most commonly diagnosed cancer, presenting a significant public health issue due to its high rates of occurrence and the risk of serious complications if not caught early. Recent advancements in dataset curation and deep learning have shown promise in quick and accurate detection of skin cancer. However, current open-source datasets have significant class imbalances, which impede the effectiveness of these deep learning models. In healthcare, generative artificial intelligence (AI) models have been employed to create synthetic data, addressing data imbalance in datasets by augmenting underrepresented classes and enhancing the overall quality and performance of machine learning models. In this paper, we build on top of previous work by leveraging new advancements in generative AI, notably Stable Diffusion and DreamBooth. We introduce Cancer-Net SCa-Synth, an open access synthetically generated 2D skin lesion dataset for skin cancer classification. Further analysis of data effectiveness, comparing ISIC 2020 test set performance of a simple model trained with and without these synthetic images, highlights the benefits of leveraging synthetic data to improve performance. Cancer-Net SCa-Synth is publicly available at https://github.com/catai9/Cancer-Net-SCa-Synth as part of a global open-source initiative for accelerating machine learning for cancer care.
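The abstract describes augmenting underrepresented classes with synthetic images and comparing a model trained with and without them. A minimal sketch of the data side of that comparison is below. Everything in it is an assumption for illustration only: the `(path, label)` record format, the binary `benign`/`malignant` labels, the hypothetical helper names, and the policy of topping up the minority class until it matches the majority class. The listing does not state how many synthetic images the authors generated or how they trained their model, so model training is omitted.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical record format (assumption): (image_path, label).
Sample = Tuple[str, str]


def imbalance_ratio(samples: List[Sample]) -> float:
    """Ratio of the largest class count to the smallest class count."""
    counts = Counter(label for _, label in samples)
    return max(counts.values()) / min(counts.values())


def synthetic_needed(real: List[Sample], minority: str) -> int:
    """Synthetic minority-class images needed to match the majority class.

    Full balancing is an illustrative policy, not the paper's stated method.
    """
    counts = Counter(label for _, label in real)
    majority = max(counts.values())
    return max(0, majority - counts[minority])


def build_training_sets(real: List[Sample],
                        synthetic: List[Sample]) -> Dict[str, List[Sample]]:
    """The two training sets to compare: real-only and real + synthetic."""
    return {
        "real_only": list(real),
        "real_plus_synth": list(real) + list(synthetic),
    }


if __name__ == "__main__":
    # Placeholder counts, not ISIC 2020 statistics.
    real = [(f"real_b{i}.jpg", "benign") for i in range(900)]
    real += [(f"real_m{i}.jpg", "malignant") for i in range(100)]

    n = synthetic_needed(real, "malignant")
    synthetic = [(f"synth_m{i}.png", "malignant") for i in range(n)]

    for name, samples in build_training_sets(real, synthetic).items():
        print(name, len(samples), imbalance_ratio(samples))
```

With these placeholder counts the real-only set has a 9:1 class ratio and the augmented set is balanced; each set would then be used to train the same model and both would be evaluated on a held-out real test set such as ISIC 2020.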
Related papers
- Little Giants: Synthesizing High-Quality Embedding Data at Scale [71.352883755806]
We introduce SPEED, a framework that aligns open-source small models to efficiently generate large-scale embedding data.
SPEED uses less than 1/10 of the GPT API calls, outperforming the state-of-the-art embedding model E5_mistral when both are trained solely on their synthetic data.
(2024-10-24T10:47:30Z)
- Generation of synthetic data using breast cancer dataset and classification with resnet18 [0.0]
Synthetic data is required for a number of reasons, including the constraints of real data, the expense of collecting labeled data, and privacy and security problems.
A deep learning model called GAN (Generative Adversarial Networks) has been developed with the intention of generating synthetic data.
In this study, the Breast Histopathology dataset was used to generate malignant and negatively labeled synthetic patch images.
(2024-05-25T15:53:27Z)
- An Interpretable Deep Learning Approach for Skin Cancer Categorization [0.0]
We use modern deep learning methods and explainable artificial intelligence (XAI) approaches to address the problem of skin cancer detection.
To categorize skin lesions, we employ four cutting-edge pre-trained models: XceptionNet, EfficientNetV2S, InceptionResNetV2, and EfficientNetV2M.
Our study shows how deep learning and explainable artificial intelligence (XAI) can improve skin cancer diagnosis.
(2023-12-17T12:11:38Z)
- Cancer-Net PCa-Gen: Synthesis of Realistic Prostate Diffusion Weighted Imaging Data via Anatomic-Conditional Controlled Latent Diffusion [68.45407109385306]
In Canada, prostate cancer is the most common form of cancer in men and accounted for 20% of new cancer cases for this demographic in 2022.
There has been significant interest in the development of deep neural networks for prostate cancer diagnosis, prognosis, and treatment planning using diffusion weighted imaging (DWI) data.
In this study, we explore the efficacy of latent diffusion for generating realistic prostate DWI data through the introduction of an anatomic-conditional controlled latent diffusion strategy.
(2023-11-30T15:11:03Z)
- Double-Condensing Attention Condenser: Leveraging Attention in Deep Learning to Detect Skin Cancer from Skin Lesion Images [61.36288157482697]
Skin cancer is the most common type of cancer in the United States and is estimated to affect one in five Americans.
Recent advances have demonstrated strong performance on skin cancer detection, as exemplified by state-of-the-art performance in the SIIM-ISIC Melanoma Classification Challenge.
This paper explores leveraging an efficient self-attention structure to detect skin cancer in skin lesion images and introduces a deep neural network design with DC-AC customized for skin cancer detection from skin lesion images.
(2023-11-20T10:45:39Z)
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
(2023-05-16T07:30:29Z)
- Machine Learning Against Cancer: Accurate Diagnosis of Cancer by Machine Learning Classification of the Whole Genome Sequencing Data [0.0]
We have developed novel methods of MLAC (Machine Learning Against Cancer) achieving perfect results with perfect precision, sensitivity, and specificity.
We have used the whole genome sequencing data acquired by next-generation RNA sequencing techniques in The Cancer Genome Atlas and Genotype-Tissue Expression projects for cancerous and healthy tissues respectively.
(2020-09-12T18:51:47Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
(2020-04-30T20:42:17Z)
- CorGAN: Correlation-Capturing Convolutional Generative Adversarial Networks for Generating Synthetic Healthcare Records [0.0]
We propose a framework called correlation-capturing Generative Adversarial Network (CorGAN) to generate synthetic healthcare records.
To demonstrate the model fidelity, we show that CorGAN generates synthetic data with performance similar to that of real data in various Machine Learning settings.
(2020-01-25T18:43:47Z)