Towards a Taxonomy for the Use of Synthetic Data in Advanced Analytics
- URL: http://arxiv.org/abs/2212.02622v1
- Date: Mon, 5 Dec 2022 22:13:58 GMT
- Title: Towards a Taxonomy for the Use of Synthetic Data in Advanced Analytics
- Authors: Peter Kowalczyk, Giacomo Welsch, Frédéric Thiesse
- Abstract summary: We present a taxonomy highlighting the various facets of deploying synthetic data for advanced analytics systems.
We identify typical application scenarios for synthetic data to assess the current state of adoption.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The proliferation of deep learning techniques led to a wide range of advanced
analytics applications in important business areas such as predictive
maintenance or product recommendation. However, as the effectiveness of
advanced analytics naturally depends on the availability of sufficient data, an
organization's ability to exploit these benefits might be restricted by limited
data or limited data access. These challenges could force organizations to
spend substantial amounts of money on data or accept constrained analytics
capacities, or they could even turn into a showstopper for analytics projects. Against
this backdrop, recent advances in deep learning to generate synthetic data may
help to overcome these barriers. Despite its great potential, however,
synthetic data are rarely employed. Therefore, we present a taxonomy
highlighting the various facets of deploying synthetic data for advanced
analytics systems. Furthermore, we identify typical application scenarios for
synthetic data to assess the current state of adoption and thereby unveil
missed opportunities to pave the way for further research.
Related papers
- Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models [89.88010750772413]
Synthetic data has been proposed as a solution to address the issue of high-quality data scarcity in the training of large language models (LLMs).
Our work delves into these specific flaws associated with question-answer (Q-A) pairs, a prevalent type of synthetic data, and presents a method based on unlearning techniques to mitigate these flaws.
Our work has yielded key insights into the effective use of synthetic data, aiming to promote more robust and efficient LLM training.
arXiv Detail & Related papers (2024-06-18T08:38:59Z)
- Best Practices and Lessons Learned on Synthetic Data [83.63271573197026]
The success of AI models relies on the availability of large, diverse, and high-quality datasets.
Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns.
arXiv Detail & Related papers (2024-04-11T06:34:17Z)
- Benchmarking Data Science Agents [11.582116078653968]
Large Language Models (LLMs) have emerged as promising aids as data science agents, assisting humans in data analysis and processing.
Yet their practical efficacy remains constrained by the varied demands of real-world applications and complicated analytical processes.
We introduce DSEval -- a novel evaluation paradigm, as well as a series of innovative benchmarks tailored for assessing the performance of these agents.
arXiv Detail & Related papers (2024-02-27T03:03:06Z)
- Synthetic Data in AI: Challenges, Applications, and Ethical Implications [16.01404243695338]
This report explores the multifaceted aspects of synthetic data.
It emphasizes the challenges and potential biases these datasets may harbor.
It also critically addresses the ethical considerations and legal implications associated with synthetic datasets.
arXiv Detail & Related papers (2024-01-03T09:03:30Z)
- Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
Ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
- Exploring the Potential of AI-Generated Synthetic Datasets: A Case Study on Telematics Data with ChatGPT [0.0]
This research delves into the construction and utilization of synthetic datasets, specifically within the telematics sphere, leveraging OpenAI's powerful language model, ChatGPT.
To illustrate this data creation process, a hands-on case study is conducted, focusing on the generation of a synthetic telematics dataset.
arXiv Detail & Related papers (2023-06-23T15:15:13Z)
- LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting [65.71129509623587]
Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning.
However, the promising results achieved on current public datasets may not be applicable to practical scenarios.
We introduce the LargeST benchmark dataset, which includes a total of 8,600 sensors in California with a 5-year time coverage.
arXiv Detail & Related papers (2023-06-14T05:48:36Z)
- Towards Generalizable Data Protection With Transferable Unlearnable Examples [50.628011208660645]
We present a novel, generalizable data protection method by generating transferable unlearnable examples.
To the best of our knowledge, this is the first solution that examines data privacy from the perspective of data distribution.
arXiv Detail & Related papers (2023-05-18T04:17:01Z)
- Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data [91.52783572568214]
Synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs.
We discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data.
arXiv Detail & Related papers (2023-04-07T16:38:40Z)
- Synthetic Data: Methods, Use Cases, and Risks [11.413309528464632]
A possible alternative gaining momentum in both the research community and industry is to share synthetic data instead.
We provide a gentle introduction to synthetic data and discuss its use cases, the privacy challenges that are still unaddressed, and its inherent limitations as an effective privacy-enhancing technology.
arXiv Detail & Related papers (2023-03-01T16:35:33Z)
- Synthetic Data in Human Analysis: A Survey [16.562921709882865]
This survey is intended for researchers and practitioners in the field of human analysis.
We conduct a survey that summarises current state-of-the-art methods and the main benefits of using synthetic data.
We also provide an overview of publicly available synthetic datasets and generation models.
arXiv Detail & Related papers (2022-08-19T07:32:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.