Generation of Probabilistic Synthetic Data for Serious Games: A Case
Study on Cyberbullying
- URL: http://arxiv.org/abs/2306.01365v2
- Date: Mon, 3 Jul 2023 09:58:09 GMT
- Title: Generation of Probabilistic Synthetic Data for Serious Games: A Case
Study on Cyberbullying
- Authors: Jaime Pérez, Mario Castro, Edmond Awad, Gregorio López
- Abstract summary: We propose a simulator for generating probabilistic synthetic data for serious games based on interactive narratives.
This architecture is designed to be generic and modular so that it can be used by other researchers on similar problems.
We apply the proposed architecture and methods in a use case of a serious game focused on cyberbullying.
- Score: 0.45880283710344055
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Synthetic data generation has been a growing area of research in recent
years. However, its potential applications in serious games have not been
thoroughly explored. Advances in this field could anticipate data modelling and
analysis, as well as speed up the development process. To try to fill this gap
in the literature, we propose a simulator architecture for generating
probabilistic synthetic data for serious games based on interactive narratives.
This architecture is designed to be generic and modular so that it can be used
by other researchers on similar problems. To simulate the interaction of
synthetic players with questions, we use a cognitive testing model based on the
Item Response Theory framework. We also show how probabilistic graphical models
(in particular Bayesian networks) can be used to introduce expert knowledge and
external data into the simulation. Finally, we apply the proposed architecture
and methods in a use case of a serious game focused on cyberbullying. We
perform Bayesian inference experiments using a hierarchical model to
demonstrate the identifiability and robustness of the generated data.
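As a rough illustration of the two ingredients named in the abstract, the following minimal sketch samples synthetic players from a tiny two-node Bayesian network and then simulates their answers with an Item Response Theory model. It is not the authors' simulator: the two-parameter logistic (2PL) form, the "profile" node, and every numeric value are assumptions chosen only for illustration.

```python
import numpy as np


def simulate_responses(abilities, discriminations, difficulties, rng):
    """Sample binary answers under a 2PL IRT model (assumed form, not the paper's)."""
    # P(positive answer) = sigmoid(a_j * (theta_i - b_j)) for player i, question j
    logits = discriminations[None, :] * (abilities[:, None] - difficulties[None, :])
    probs = 1.0 / (1.0 + np.exp(-logits))
    responses = (rng.random(probs.shape) < probs).astype(int)
    return responses, probs


rng = np.random.default_rng(0)
n_players, n_questions = 500, 8

# Hypothetical two-node Bayesian network: a discrete "profile" node is the
# parent of the latent ability. The probabilities and means below are
# illustrative stand-ins for expert knowledge or external data.
profile_probs = np.array([0.7, 0.3])
ability_means = np.array([-0.5, 1.0])
profiles = rng.choice(2, size=n_players, p=profile_probs)
abilities = rng.normal(loc=ability_means[profiles], scale=1.0)

# Hypothetical question parameters: discrimination a_j and difficulty b_j
discriminations = rng.uniform(0.5, 2.0, size=n_questions)
difficulties = np.linspace(-2.0, 2.0, n_questions)

responses, probs = simulate_responses(abilities, discriminations, difficulties, rng)
print(responses.mean(axis=0))  # share of positive answers per question
```

In a sketch like this, the Bayesian network determines who the synthetic players are, while the IRT layer determines how each one answers; swapping either part leaves the other untouched, which is one way to read the modularity the abstract describes.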
Related papers
- Synthetic data: How could it be used for infectious disease research? [0.16752458252726457]
Concerns have been raised about potential negative factors associated with the possibilities of artificial dataset generation.
These include the potential misuse of generative artificial intelligence in fields such as cybercrime.
Synthetic data offers significant benefits, particularly in data privacy, research, in balancing datasets and reducing bias in machine learning models.
arXiv Detail & Related papers (2024-07-03T17:13:04Z)
- Best Practices and Lessons Learned on Synthetic Data for Language Models [83.63271573197026]
The success of AI models relies on the availability of large, diverse, and high-quality datasets.
Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns.
arXiv Detail & Related papers (2024-04-11T06:34:17Z)
- Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
Ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
- AI-Generated Images as Data Source: The Dawn of Synthetic Era [61.879821573066216]
Generative AI has unlocked the potential to create synthetic images that closely resemble real-world photographs.
This paper explores the innovative concept of harnessing these AI-generated images as new data sources.
In contrast to real data, AI-generated data exhibit remarkable advantages, including unmatched abundance and scalability.
arXiv Detail & Related papers (2023-10-03T06:55:19Z)
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
- Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data [91.52783572568214]
Synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs.
We discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data.
arXiv Detail & Related papers (2023-04-07T16:38:40Z)
- Synthetic Data in Healthcare [10.555189948915492]
We present the cases for physical and statistical simulations for creating data and the proposed applications in healthcare and medicine.
We discuss that while synthetics can promote privacy, equity, safety and continual and causal learning, they also run the risk of introducing flaws, blind spots and propagating or exaggerating biases.
arXiv Detail & Related papers (2023-04-06T17:23:39Z)
- Machine Learning for Synthetic Data Generation: A Review [23.073056971997715]
This paper reviews existing studies that employ machine learning models for the purpose of generating synthetic data.
The review encompasses various perspectives, starting with the applications of synthetic data generation, spanning computer vision, speech, natural language processing, healthcare, and business domains.
The paper also addresses the crucial aspects of privacy and fairness concerns related to synthetic data generation.
arXiv Detail & Related papers (2023-02-08T13:59:31Z)
- Evaluation of Categorical Generative Models -- Bridging the Gap Between Real and Synthetic Data [18.142397311464343]
We introduce an appropriately scalable evaluation method for generative models.
We consider increasingly large probability spaces, which correspond to increasingly difficult modeling tasks.
We validate our evaluation procedure with synthetic experiments on both synthetic generative models and current state-of-the-art categorical generative models.
arXiv Detail & Related papers (2022-10-28T21:05:25Z)
- Comparing Synthetic Tabular Data Generation Between a Probabilistic Model and a Deep Learning Model for Education Use Cases [12.358921226358133]
The ability to generate synthetic data has a variety of use cases across different domains.
In education research, there is a growing need to have access to synthetic data to test certain concepts and ideas.
arXiv Detail & Related papers (2022-10-16T13:21:23Z)
- Mixed Effects Neural ODE: A Variational Approximation for Analyzing the Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data.
We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem.
We then derive Evidence Based Lower Bounds for ME-NODE, and develop (efficient) training algorithms.
arXiv Detail & Related papers (2022-02-18T22:41:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.