Synthetic data: How could it be used for infectious disease research?
- URL: http://arxiv.org/abs/2407.06211v1
- Date: Wed, 3 Jul 2024 17:13:04 GMT
- Title: Synthetic data: How could it be used for infectious disease research?
- Authors: Styliani-Christina Fragkouli, Dhwani Solanki, Leyla J Castro, Fotis E Psomopoulos, NĂºria Queralt-Rosinach, Davide Cirillo, Lisa C Crossman,
- Abstract summary: Concerns have been raised about potential negative factors associated with the possibilities of artificial dataset generation.
These include the potential misuse of generative artificial intelligence in fields such as cybercrime.
Synthetic data offers significant benefits, particularly in data privacy, research, in balancing datasets and reducing bias in machine learning models.
- Score: 0.16752458252726457
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Over the last three to five years, it has become possible to generate machine learning synthetic data for healthcare-related uses. However, concerns have been raised about potential negative factors associated with the possibilities of artificial dataset generation. These include the potential misuse of generative artificial intelligence (AI) in fields such as cybercrime, the use of deepfakes and fake news to deceive or manipulate, and displacement of human jobs across various market sectors. Here, we consider both current and future positive advances and possibilities with synthetic datasets. Synthetic data offers significant benefits, particularly in data privacy, research, in balancing datasets and reducing bias in machine learning models. Generative AI is an artificial intelligence genre capable of creating text, images, video or other data using generative models. The recent explosion of interest in GenAI was heralded by the invention and speedy move to use of large language models (LLM). These computational models are able to achieve general-purpose language generation and other natural language processing tasks and are based on transformer architectures, which made an evolutionary leap from previous neural network architectures. Fuelled by the advent of improved GenAI techniques and wide scale usage, this is surely the time to consider how synthetic data can be used to advance infectious disease research. In this commentary we aim to create an overview of the current and future position of synthetic data in infectious disease research.
Related papers
- When AI Eats Itself: On the Caveats of Data Pollution in the Era of Generative AI [18.641925577551557]
Generative artificial intelligence (AI) technologies and large models are producing realistic outputs across various domains, such as images, text, speech, and music.
To minimize training expenses, many algorithm developers use data created by the models themselves as a cost-effective training solution.
Not all synthetic data effectively improve model performance, necessitating a strategic balance in the use of real versus synthetic data to optimize outcomes.
arXiv Detail & Related papers (2024-05-15T13:50:23Z) - Best Practices and Lessons Learned on Synthetic Data [83.63271573197026]
The success of AI models relies on the availability of large, diverse, and high-quality datasets.
Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns.
arXiv Detail & Related papers (2024-04-11T06:34:17Z) - Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A
Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z) - AI-Generated Images as Data Source: The Dawn of Synthetic Era [61.879821573066216]
generative AI has unlocked the potential to create synthetic images that closely resemble real-world photographs.
This paper explores the innovative concept of harnessing these AI-generated images as new data sources.
In contrast to real data, AI-generated data exhibit remarkable advantages, including unmatched abundance and scalability.
arXiv Detail & Related papers (2023-10-03T06:55:19Z) - Synthetic Demographic Data Generation for Card Fraud Detection Using
GANs [4.651915393462367]
We build a deep-learning Generative Adversarial Network (GAN), called DGGAN, which will be used for demographic data generation.
Our model generates samples during model training, which we found important to overcame class imbalance issues.
arXiv Detail & Related papers (2023-06-29T17:08:57Z) - Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z) - Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic
Data [91.52783572568214]
Synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs.
We discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data.
arXiv Detail & Related papers (2023-04-07T16:38:40Z) - Synthetic Data in Healthcare [10.555189948915492]
We present the cases for physical and statistical simulations for creating data and the proposed applications in healthcare and medicine.
We discuss that while synthetics can promote privacy, equity, safety and continual and causal learning, they also run the risk of introducing flaws, blind spots and propagating or exaggerating biases.
arXiv Detail & Related papers (2023-04-06T17:23:39Z) - Machine Learning for Synthetic Data Generation: A Review [23.073056971997715]
This paper reviews existing studies that employ machine learning models for the purpose of generating synthetic data.
The review encompasses various perspectives, starting with the applications of synthetic data generation, spanning computer vision, speech, natural language processing, healthcare, and business domains.
The paper also addresses the crucial aspects of privacy and fairness concerns related to synthetic data generation.
arXiv Detail & Related papers (2023-02-08T13:59:31Z) - Synthetic Data: Opening the data floodgates to enable faster, more
directed development of machine learning methods [96.92041573661407]
Many ground-breaking advancements in machine learning can be attributed to the availability of a large volume of rich data.
Many large-scale datasets are highly sensitive, such as healthcare data, and are not widely available to the machine learning community.
Generating synthetic data with privacy guarantees provides one such solution.
arXiv Detail & Related papers (2020-12-08T17:26:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.