Related papers: On synthetic data generation for anomaly detection in complex social networks

URL: http://arxiv.org/abs/2010.13026v1
Date: Sun, 25 Oct 2020 03:53:19 GMT
Title: On synthetic data generation for anomaly detection in complex social networks
Authors: Andreea Sistrunk, Vanessa Cedeno and Subhodip Biswas
Abstract summary: This paper studies the feasibility of synthetic data generation for mission-critical applications. In particular, the development of a generative model, capable of creating data for anomalous rare activities in complex social networks is sought.
Score: 1.1602089225841632
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper studies the feasibility of synthetic data generation for mission-critical applications. The emphasis is on synthetic data generation for anomalous detection in complex social networks. In particular, the development of a heuristic generative model, capable of creating data for anomalous rare activities in complex social networks is sought. To this end, lessons from social and political literature are applied to prototype a novel implementation of the Agent-based Modeling (ABM) framework, based on simple social interactions between agents, for synthetic data generation in the context of terrorist profile desegregation. The conclusion offers directions for further verification, fine-tuning, and proposes future directions of work for the ABM prototype, as a complex-societal approach to synthetic data generation, by identifying heuristic hyper-parameter tuning methodologies to further ensure the generated data distribution is similar to the true distribution of the original data-sets. While a rigorous mathematical optimization for reducing the distances in distributions is not offered in this work, we opine that this prototype of an autonomous-agent generative complex social model is useful for studying and researching on pattern of life and anomaly detection where there is strict limitation or lack of sufficient data for using practical machine learning solutions in mission-critical applications.
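Illustrative sketch (not the authors' implementation): the abstract describes an agent-based model in which simple social interactions between agents produce synthetic data containing rare anomalous activity. The toy Python example below shows one way such a generator could be structured. Every name, parameter, and behavior rule in it (generate_interactions, anomaly_rate, the contact counts) is a hypothetical assumption chosen for illustration and does not come from the paper.

```python
import random


def generate_interactions(n_agents=50, n_steps=20, anomaly_rate=0.04, seed=0):
    """Toy agent-based generator of labeled interaction records.

    Assumption (not from the paper): a small fraction of agents is marked
    anomalous, and anomalous agents contact more peers per time step.
    """
    rng = random.Random(seed)
    n_anomalous = max(1, round(n_agents * anomaly_rate))
    anomalous_ids = set(rng.sample(range(n_agents), n_anomalous))

    records = []
    for step in range(n_steps):
        for src in range(n_agents):
            is_anomalous = src in anomalous_ids
            # Hypothetical rule: 3 contacts per step if anomalous, else 1.
            n_contacts = 3 if is_anomalous else 1
            others = [a for a in range(n_agents) if a != src]
            for dst in rng.sample(others, n_contacts):
                records.append(
                    {"step": step, "src": src, "dst": dst, "label": int(is_anomalous)}
                )
    return records, anomalous_ids


if __name__ == "__main__":
    records, anomalous_ids = generate_interactions()
    rare = sum(r["label"] for r in records)
    print(f"{len(records)} records, {rare} from {len(anomalous_ids)} anomalous agents")
```

The labeled records could then serve as input for an anomaly detector. Checking that their distribution resembles real data is the tuning problem the abstract leaves as future work.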

Related papers

Exploring the Landscape for Generative Sequence Models for Specialized Data Synthesis [0.0]
This paper introduces a novel approach that leverages three generative models of varying complexity to synthesize malicious network traffic. Our approach transforms numerical data into text, reframing data generation as a language modeling task. Our method surpasses state-of-the-art generative models in producing high-fidelity synthetic data.
arXiv Detail & Related papers (2024-11-04T09:51:10Z)
zGAN: An Outlier-focused Generative Adversarial Network For Realistic Synthetic Data Generation [0.0]
"Black swans" have posed a challenge to the performance of classical machine learning models. This article provides an overview of the zGAN model architecture, developed to generate synthetic data with outlier characteristics. It shows promising results on realistic synthetic data generation, as well as uplift capabilities vis-a-vis model performance.
arXiv Detail & Related papers (2024-10-28T07:55:11Z)
Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion probabilistic models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data. We propose using DPMs for the generation of synthetic individual location trajectories (ILTs), which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z)
Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop. We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models. We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models. Ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task. This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
A supervised generative optimization approach for tabular data [2.5311562666866494]
This work presents a novel synthetic data generation framework. It integrates a supervised component tailored to the specific downstream task and employs a meta-learning approach to learn the optimal mixture distribution of existing synthetic distributions.
arXiv Detail & Related papers (2023-09-10T16:56:46Z)
Synthetic Demographic Data Generation for Card Fraud Detection Using GANs [4.651915393462367]
We build a deep-learning Generative Adversarial Network (GAN), called DGGAN, which will be used for demographic data generation. Our model generates samples during model training, which we found important to overcome class imbalance issues.
arXiv Detail & Related papers (2023-06-29T17:08:57Z)
Perturbation-Assisted Sample Synthesis: A Novel Approach for Uncertainty Quantification [3.175239447683357]
This paper introduces a novel Perturbation-Assisted Inference (PAI) framework utilizing synthetic data generated by the Perturbation-Assisted Sample Synthesis (PASS) method. The framework focuses on uncertainty quantification in complex data scenarios, particularly involving unstructured data. We demonstrate the effectiveness of PAI in advancing uncertainty quantification in complex, data-driven tasks by applying it to diverse areas such as image synthesis, sentiment word analysis, multimodal inference, and the construction of prediction intervals.
arXiv Detail & Related papers (2023-05-30T01:01:36Z)
Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task. We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data [125.7135706352493]
Generative adversarial networks (GANs) typically require ample data for training in order to synthesize high-fidelity images. Recent studies have shown that training GANs with limited data remains formidable due to discriminator overfitting. This paper introduces a novel strategy called Adaptive Pseudo Augmentation (APA) to encourage healthy competition between the generator and the discriminator.
arXiv Detail & Related papers (2021-11-12T18:13:45Z)
Partially Conditioned Generative Adversarial Networks [75.08725392017698]
Generative Adversarial Networks (GANs) let one synthesise artificial datasets by implicitly modelling the underlying probability distribution of a real-world training dataset. With the introduction of Conditional GANs and their variants, these methods were extended to generating samples conditioned on ancillary information available for each sample within the dataset. In this work, we argue that standard Conditional GANs are not suitable for such a task and propose a new Adversarial Network architecture and training strategy.
arXiv Detail & Related papers (2020-07-06T15:59:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.