Generation of Probabilistic Synthetic Data for Serious Games: A Case
Study on Cyberbullying
- URL: http://arxiv.org/abs/2306.01365v2
- Date: Mon, 3 Jul 2023 09:58:09 GMT
- Title: Generation of Probabilistic Synthetic Data for Serious Games: A Case
Study on Cyberbullying
- Authors: Jaime Pérez, Mario Castro, Edmond Awad, Gregorio López
- Abstract summary: We propose a simulator for generating probabilistic synthetic data for serious games based on interactive narratives.
This architecture is designed to be generic and modular so that it can be used by other researchers on similar problems.
We apply the proposed architecture and methods in a use case of a serious game focused on cyberbullying.
- Score: 0.45880283710344055
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Synthetic data generation has been a growing area of research in recent
years. However, its potential applications in serious games have not been
thoroughly explored. Advances in this field could anticipate data modelling and
analysis, as well as speed up the development process. To try to fill this gap
in the literature, we propose a simulator architecture for generating
probabilistic synthetic data for serious games based on interactive narratives.
This architecture is designed to be generic and modular so that it can be used
by other researchers on similar problems. To simulate the interaction of
synthetic players with questions, we use a cognitive testing model based on the
Item Response Theory framework. We also show how probabilistic graphical models
(in particular Bayesian networks) can be used to introduce expert knowledge and
external data into the simulation. Finally, we apply the proposed architecture
and methods in a use case of a serious game focused on cyberbullying. We
perform Bayesian inference experiments using a hierarchical model to
demonstrate the identifiability and robustness of the generated data.
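
The abstract's simulation idea (synthetic players answering questions under an Item Response Theory model, with player traits drawn from a Bayesian network) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not the authors' implementation: the two-parameter logistic (2PL) form, the two-node network `profile -> ability`, the profile names, and all numeric values are hypothetical.

```python
import math
import random


def p_correct(ability, difficulty, discrimination=1.0):
    """2PL IRT model (assumed form): P(positive response | ability, item)."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


def sample_player(rng, profile_probs, ability_means):
    """Ancestral sampling from a tiny two-node network: profile -> ability.

    Both variables are hypothetical stand-ins for expert-defined nodes.
    """
    profiles = list(profile_probs)
    weights = [profile_probs[p] for p in profiles]
    profile = rng.choices(profiles, weights=weights, k=1)[0]
    # Ability is Gaussian around a profile-dependent mean (unit variance).
    ability = rng.gauss(ability_means[profile], 1.0)
    return profile, ability


def simulate(n_players, items, seed=0):
    """Generate one row of synthetic binary answers per synthetic player."""
    rng = random.Random(seed)
    profile_probs = {"profile_a": 0.7, "profile_b": 0.3}   # illustrative prior
    ability_means = {"profile_a": 0.5, "profile_b": -0.5}  # illustrative means
    rows = []
    for player_id in range(n_players):
        profile, ability = sample_player(rng, profile_probs, ability_means)
        # Each answer is a Bernoulli draw with the IRT success probability.
        answers = [
            int(rng.random() < p_correct(ability, difficulty, discrimination))
            for difficulty, discrimination in items
        ]
        rows.append(
            {"player": player_id, "profile": profile,
             "ability": ability, "answers": answers}
        )
    return rows


# Items given as (difficulty, discrimination) pairs; values are made up.
items = [(-1.0, 1.2), (0.0, 1.0), (1.5, 0.8)]
data = simulate(5, items)
```

Because the latent abilities are stored next to the simulated answers, a dataset generated this way could be used to check whether an inference procedure recovers the known parameters, which is the kind of identifiability check the abstract describes.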
Related papers
- Exploring the Landscape for Generative Sequence Models for Specialized Data Synthesis [0.0]
This paper introduces a novel approach that leverages three generative models of varying complexity to synthesize Malicious Network Traffic.
Our approach transforms numerical data into text, re-framing data generation as a language modeling task.
Our method surpasses state-of-the-art generative models in producing high-fidelity synthetic data.
arXiv Detail & Related papers (2024-11-04T09:51:10Z)
- zGAN: An Outlier-focused Generative Adversarial Network For Realistic Synthetic Data Generation [0.0]
"Black swans" have posed a challenge to the performance of classical machine learning models.
This article provides an overview of the zGAN model architecture developed for the purpose of generating synthetic data with outlier characteristics.
It shows promising results on realistic synthetic data generation, as well as uplift capabilities vis-a-vis model performance.
arXiv Detail & Related papers (2024-10-28T07:55:11Z)
- Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
Ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
- AI-Generated Images as Data Source: The Dawn of Synthetic Era [61.879821573066216]
Generative AI has unlocked the potential to create synthetic images that closely resemble real-world photographs.
This paper explores the innovative concept of harnessing these AI-generated images as new data sources.
In contrast to real data, AI-generated data exhibit remarkable advantages, including unmatched abundance and scalability.
arXiv Detail & Related papers (2023-10-03T06:55:19Z)
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
- Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data [91.52783572568214]
Synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs.
We discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data.
arXiv Detail & Related papers (2023-04-07T16:38:40Z)
- Synthetic Data in Healthcare [10.555189948915492]
We present the cases for physical and statistical simulations for creating data and the proposed applications in healthcare and medicine.
We discuss that while synthetics can promote privacy, equity, safety and continual and causal learning, they also run the risk of introducing flaws, blind spots and propagating or exaggerating biases.
arXiv Detail & Related papers (2023-04-06T17:23:39Z)
- Evaluation of Categorical Generative Models -- Bridging the Gap Between Real and Synthetic Data [18.142397311464343]
We introduce an appropriately scalable evaluation method for generative models.
We consider increasingly large probability spaces, which correspond to increasingly difficult modeling tasks.
We validate our evaluation procedure with synthetic experiments on both synthetic generative models and current state-of-the-art categorical generative models.
arXiv Detail & Related papers (2022-10-28T21:05:25Z)
- Comparing Synthetic Tabular Data Generation Between a Probabilistic Model and a Deep Learning Model for Education Use Cases [12.358921226358133]
The ability to generate synthetic data has a variety of use cases across different domains.
In education research, there is a growing need to have access to synthetic data to test certain concepts and ideas.
arXiv Detail & Related papers (2022-10-16T13:21:23Z)
- Investigating Bias with a Synthetic Data Generator: Empirical Evidence and Philosophical Interpretation [66.64736150040093]
Machine learning applications are becoming increasingly pervasive in our society.
The risk is that they will systematically spread the bias embedded in data.
We propose to analyze biases by introducing a framework for generating synthetic data with specific types of bias and their combinations.
arXiv Detail & Related papers (2022-09-13T11:18:50Z)
- Mixed Effects Neural ODE: A Variational Approximation for Analyzing the Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data.
We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem.
We then derive Evidence Based Lower Bounds for ME-NODE, and develop (efficient) training algorithms.
arXiv Detail & Related papers (2022-02-18T22:41:51Z)