Generation of synthetic data using breast cancer dataset and classification with resnet18
- URL: http://arxiv.org/abs/2405.16286v1
- Date: Sat, 25 May 2024 15:53:27 GMT
- Title: Generation of synthetic data using breast cancer dataset and classification with resnet18
- Authors: Dilsat Berin Aytar, Semra Gunduc,
- Abstract summary: Synthetic data is required for a number of reasons, including the constraints of real data, the expense of collecting labeled data, and privacy and security problems.
A deep learning model called GAN (Generative Adversarial Networks) has been developed with the intention of generating synthetic data.
In this study, the Breast Histopathology dataset was used to generate malignant and negatively labeled synthetic patch images.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since technology is advancing so quickly in the modern era of information, data is becoming an essential resource in many fields. Correct data collection, organization, and analysis make it a potent tool for successful decision-making, process improvement, and success across a wide range of sectors. Synthetic data is required for a number of reasons, including the constraints of real data, the expense of collecting labeled data, and privacy and security problems in specific situations and domains. For a variety of reasons, including security, ethics, legal restrictions, sensitivity and privacy issues, and ethics, synthetic data is a valuable tool, particularly in the health sector. A deep learning model called GAN (Generative Adversarial Networks) has been developed with the intention of generating synthetic data. In this study, the Breast Histopathology dataset was used to generate malignant and negatively labeled synthetic patch images using MSG-GAN (Multi-Scale Gradients for Generative Adversarial Networks), a form of GAN, to aid in cancer identification. After that, the ResNet18 model was used to classify both synthetic and real data via Transfer Learning. Following the investigation, an attempt was made to ascertain whether the synthetic images behaved like the real data or if they are comparable to the original data.
Related papers
- Cancer-Net SCa-Synth: An Open Access Synthetically Generated 2D Skin Lesion Dataset for Skin Cancer Classification [65.83291923029985]
In the United States, skin cancer ranks as the most commonly diagnosed cancer, presenting a significant public health issue.
Recent advancements in dataset curation and deep learning have shown promise in quick and accurate detection of skin cancer.
Cancer-Net SCa- Synth is an open access synthetically generated 2D skin lesion dataset for skin cancer classification.
arXiv Detail & Related papers (2024-11-08T02:04:21Z) - A Novel Taxonomy for Navigating and Classifying Synthetic Data in Healthcare Applications [9.66493160220239]
This paper proposes a novel taxonomy of synthetic data in healthcare to navigate the landscape in terms of three main varieties.
Data Proportion comprises different ratios of synthetic data in a dataset and associated pros and cons.
Data Modality refers to the different data formats amenable to synthesis and format-specific challenges.
Data Transformation concerns improving specific aspects of a dataset like its utility or privacy with synthetic data.
arXiv Detail & Related papers (2024-09-01T12:04:03Z) - NFDI4Health workflow and service for synthetic data generation, assessment and risk management [0.0]
A promising solution to this challenge is synthetic data generation.
This technique creates entirely new datasets that mimic the statistical properties of real data.
In this paper, we present the workflow and different services developed in the context of Germany's National Data Infrastructure project NFDI4Health.
arXiv Detail & Related papers (2024-08-08T14:08:39Z) - Synthetic Image Learning: Preserving Performance and Preventing Membership Inference Attacks [5.0243930429558885]
This paper introduces Knowledge Recycling (KR), a pipeline designed to optimise the generation and use of synthetic data for training downstream classifiers.
At the heart of this pipeline is Generative Knowledge Distillation (GKD), the proposed technique that significantly improves the quality and usefulness of the information.
The results show a significant reduction in the performance gap between models trained on real and synthetic data, with models based on synthetic data outperforming those trained on real data in some cases.
arXiv Detail & Related papers (2024-07-22T10:31:07Z) - Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data [51.41288763521186]
Retrieval-augmented generation (RAG) enhances the outputs of language models by integrating relevant information retrieved from external knowledge sources.
RAG systems may face severe privacy risks when retrieving private data.
We propose using synthetic data as a privacy-preserving alternative for the retrieval data.
arXiv Detail & Related papers (2024-06-20T22:53:09Z) - Best Practices and Lessons Learned on Synthetic Data [83.63271573197026]
The success of AI models relies on the availability of large, diverse, and high-quality datasets.
Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns.
arXiv Detail & Related papers (2024-04-11T06:34:17Z) - Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A
Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z) - Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic
Data [91.52783572568214]
Synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs.
We discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data.
arXiv Detail & Related papers (2023-04-07T16:38:40Z) - Synthetic-to-Real Domain Adaptation for Action Recognition: A Dataset and Baseline Performances [76.34037366117234]
We introduce a new dataset called Robot Control Gestures (RoCoG-v2)
The dataset is composed of both real and synthetic videos from seven gesture classes.
We present results using state-of-the-art action recognition and domain adaptation algorithms.
arXiv Detail & Related papers (2023-03-17T23:23:55Z) - Enabling Synthetic Data adoption in regulated domains [1.9512796489908306]
The switch from a Model-Centric to a Data-Centric mindset is putting emphasis on data and its quality rather than algorithms.
In particular, the sensitive nature of the information in highly regulated scenarios needs to be accounted for.
A clever way to bypass such a conundrum relies on Synthetic Data: data obtained from a generative process, learning the real data properties.
arXiv Detail & Related papers (2022-04-13T10:53:54Z) - Measuring Utility and Privacy of Synthetic Genomic Data [3.635321290763711]
We provide the first evaluation of the utility and the privacy protection of five state-of-the-art models for generating synthetic genomic data.
Overall, there is no single approach for generating synthetic genomic data that performs well across the board.
arXiv Detail & Related papers (2021-02-05T17:41:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.