Related papers: Generation of Probabilistic Synthetic Data for Serious Games: A Case Study on Cyberbullying

Generation of Probabilistic Synthetic Data for Serious Games: A Case Study on Cyberbullying

URL: http://arxiv.org/abs/2306.01365v2
Date: Mon, 3 Jul 2023 09:58:09 GMT
Title: Generation of Probabilistic Synthetic Data for Serious Games: A Case Study on Cyberbullying
Authors: Jaime P\'erez, Mario Castro, Edmond Awad, Gregorio L\'opez
Abstract summary: We propose a simulator for generating probabilistic synthetic data for serious games based on interactive narratives. This architecture is designed to be generic and modular so that it can be used by other researchers on similar problems. We apply the proposed architecture and methods in a use case of a serious game focused on cyberbullying.
Score: 0.45880283710344055
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Synthetic data generation has been a growing area of research in recent years. However, its potential applications in serious games have not been thoroughly explored. Advances in this field could anticipate data modelling and analysis, as well as speed up the development process. To try to fill this gap in the literature, we propose a simulator architecture for generating probabilistic synthetic data for serious games based on interactive narratives. This architecture is designed to be generic and modular so that it can be used by other researchers on similar problems. To simulate the interaction of synthetic players with questions, we use a cognitive testing model based on the Item Response Theory framework. We also show how probabilistic graphical models (in particular Bayesian networks) can be used to introduce expert knowledge and external data into the simulation. Finally, we apply the proposed architecture and methods in a use case of a serious game focused on cyberbullying. We perform Bayesian inference experiments using a hierarchical model to demonstrate the identifiability and robustness of the generated data.

Related papers

Harnessing Synthetic Data from Generative AI for Statistical Inference [6.0353292419288485]
This paper reviews the current landscape of synthetic data generation and use from a statistical perspective.<n>We survey major classes of modern generative models, their intended use cases, and the benefits they offer.<n>We examine common pitfalls that arise when synthetic data are treated as surrogates for real observations.
arXiv Detail & Related papers (2026-03-05T17:24:41Z)
An Empirical Study of Validating Synthetic Data for Text-Based Person Retrieval [51.10419281315848]
We conduct an empirical study to explore the potential of synthetic data for Text-Based Person Retrieval (TBPR) research. We propose an inter-class image generation pipeline, in which an automatic prompt construction strategy is introduced. We develop an intra-class image augmentation pipeline, in which the generative AI models are applied to further edit the images.
arXiv Detail & Related papers (2025-03-28T06:18:15Z)
Exploring the Landscape for Generative Sequence Models for Specialized Data Synthesis [0.0]
This paper introduces a novel approach that leverages three generative models of varying complexity to synthesize Malicious Network Traffic. Our approach transforms numerical data into text, re-framing data generation as a language modeling task. Our method surpasses state-of-the-art generative models in producing high-fidelity synthetic data.
arXiv Detail & Related papers (2024-11-04T09:51:10Z)
zGAN: An Outlier-focused Generative Adversarial Network For Realistic Synthetic Data Generation [0.0]
"Black swans" have posed a challenge to performance of classical machine learning models. This article provides an overview of the zGAN model architecture developed for the purpose of generating synthetic data with outlier characteristics. It shows promising results on realistic synthetic data generation, as well as uplift capabilities vis-a-vis model performance.
arXiv Detail & Related papers (2024-10-28T07:55:11Z)
Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models. ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task. This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
AI-Generated Images as Data Source: The Dawn of Synthetic Era [61.879821573066216]
generative AI has unlocked the potential to create synthetic images that closely resemble real-world photographs. This paper explores the innovative concept of harnessing these AI-generated images as new data sources. In contrast to real data, AI-generated data exhibit remarkable advantages, including unmatched abundance and scalability.
arXiv Detail & Related papers (2023-10-03T06:55:19Z)
TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations. We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data [91.52783572568214]
Synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs. We discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data.
arXiv Detail & Related papers (2023-04-07T16:38:40Z)
Synthetic Data in Healthcare [10.555189948915492]
We present the cases for physical and statistical simulations for creating data and the proposed applications in healthcare and medicine. We discuss that while synthetics can promote privacy, equity, safety and continual and causal learning, they also run the risk of introducing flaws, blind spots and propagating or exaggerating biases.
arXiv Detail & Related papers (2023-04-06T17:23:39Z)
Evaluation of Categorical Generative Models -- Bridging the Gap Between Real and Synthetic Data [18.142397311464343]
We introduce an appropriately scalable evaluation method for generative models. We consider increasingly large probability spaces, which correspond to increasingly difficult modeling tasks. We validate our evaluation procedure with synthetic experiments on both synthetic generative models and current state-of-the-art categorical generative models.
arXiv Detail & Related papers (2022-10-28T21:05:25Z)
Comparing Synthetic Tabular Data Generation Between a Probabilistic Model and a Deep Learning Model for Education Use Cases [12.358921226358133]
The ability to generate synthetic data has a variety of use cases across different domains. In education research, there is a growing need to have access to synthetic data to test certain concepts and ideas.
arXiv Detail & Related papers (2022-10-16T13:21:23Z)
Investigating Bias with a Synthetic Data Generator: Empirical Evidence and Philosophical Interpretation [66.64736150040093]
Machine learning applications are becoming increasingly pervasive in our society. Risk is that they will systematically spread the bias embedded in data. We propose to analyze biases by introducing a framework for generating synthetic data with specific types of bias and their combinations.
arXiv Detail & Related papers (2022-09-13T11:18:50Z)
Mixed Effects Neural ODE: A Variational Approximation for Analyzing the Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data. We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem. We then derive Evidence Based Lower Bounds for ME-NODE, and develop (efficient) training algorithms.
arXiv Detail & Related papers (2022-02-18T22:41:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.