On synthetic data generation for anomaly detection in complex social
networks
- URL: http://arxiv.org/abs/2010.13026v1
- Date: Sun, 25 Oct 2020 03:53:19 GMT
- Title: On synthetic data generation for anomaly detection in complex social
networks
- Authors: Andreea Sistrunk, Vanessa Cedeno and Subhodip Biswas
- Abstract summary: This paper studies the feasibility of synthetic data generation for mission-critical applications.
In particular, the development of a generative model capable of creating data for anomalous rare activities in complex social networks is sought.
- Score: 1.1602089225841632
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the feasibility of synthetic data generation for
mission-critical applications. The emphasis is on synthetic data generation for
anomaly detection in complex social networks. In particular, the development
of a heuristic generative model capable of creating data for anomalous rare
activities in complex social networks is sought. To this end, lessons from
social and political literature are applied to prototype a novel implementation
of the Agent-based Modeling (ABM) framework, based on simple social
interactions between agents, for synthetic data generation in the context of
terrorist profile desegregation. The conclusion offers directions for further
verification, fine-tuning, and proposes future directions of work for the ABM
prototype, as a complex-societal approach to synthetic data generation, by
identifying heuristic hyper-parameter tuning methodologies to further ensure
the generated data distribution is similar to the true distribution of the
original data-sets. While a rigorous mathematical optimization for reducing the
distances in distributions is not offered in this work, we opine that this
prototype of an autonomous-agent generative complex social model is useful for
studying and researching on pattern of life and anomaly detection where there
is strict limitation or lack of sufficient data for using practical machine
learning solutions in mission-critical applications.
Related papers
- Exploring the Landscape for Generative Sequence Models for Specialized Data Synthesis [0.0]
This paper introduces a novel approach that leverages three generative models of varying complexity to synthesize Malicious Network Traffic.
Our approach transforms numerical data into text, re-framing data generation as a language modeling task.
Our method surpasses state-of-the-art generative models in producing high-fidelity synthetic data.
arXiv Detail & Related papers (2024-11-04T09:51:10Z)
- zGAN: An Outlier-focused Generative Adversarial Network For Realistic Synthetic Data Generation [0.0]
"Black swans" have posed a challenge to the performance of classical machine learning models.
This article provides an overview of the zGAN model architecture developed for the purpose of generating synthetic data with outlier characteristics.
It shows promising results on realistic synthetic data generation, as well as uplift capabilities vis-a-vis model performance.
arXiv Detail & Related papers (2024-10-28T07:55:11Z)
- Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion probabilistic models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs), which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z)
- Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
- Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
Ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
- A supervised generative optimization approach for tabular data [2.5311562666866494]
This work presents a novel synthetic data generation framework.
It integrates a supervised component tailored to the specific downstream task and employs a meta-learning approach to learn the optimal mixture distribution of existing synthetic distributions.
arXiv Detail & Related papers (2023-09-10T16:56:46Z)
- Synthetic Demographic Data Generation for Card Fraud Detection Using GANs [4.651915393462367]
We build a deep-learning Generative Adversarial Network (GAN), called DGGAN, which will be used for demographic data generation.
Our model generates samples during model training, which we found important to overcome class imbalance issues.
arXiv Detail & Related papers (2023-06-29T17:08:57Z)
- Perturbation-Assisted Sample Synthesis: A Novel Approach for Uncertainty Quantification [3.175239447683357]
This paper introduces a novel Perturbation-Assisted Inference (PAI) framework utilizing synthetic data generated by the Perturbation-Assisted Sample Synthesis (PASS) method.
The framework focuses on uncertainty quantification in complex data scenarios, particularly involving unstructured data.
We demonstrate the effectiveness of PAI in advancing uncertainty quantification in complex, data-driven tasks by applying it to diverse areas such as image synthesis, sentiment word analysis, multimodal inference, and the construction of prediction intervals.
arXiv Detail & Related papers (2023-05-30T01:01:36Z)
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
- Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data [125.7135706352493]
Generative adversarial networks (GANs) typically require ample data for training in order to synthesize high-fidelity images.
Recent studies have shown that training GANs with limited data remains formidable due to discriminator overfitting.
This paper introduces a novel strategy called Adaptive Pseudo Augmentation (APA) to encourage healthy competition between the generator and the discriminator.
arXiv Detail & Related papers (2021-11-12T18:13:45Z)
- Partially Conditioned Generative Adversarial Networks [75.08725392017698]
Generative Adversarial Networks (GANs) let one synthesise artificial datasets by implicitly modelling the underlying probability distribution of a real-world training dataset.
With the introduction of Conditional GANs and their variants, these methods were extended to generating samples conditioned on ancillary information available for each sample within the dataset.
In this work, we argue that standard Conditional GANs are not suitable for such a task and propose a new Adversarial Network architecture and training strategy.
arXiv Detail & Related papers (2020-07-06T15:59:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.