Robustness Analysis of Deep Learning Models for Population Synthesis
- URL: http://arxiv.org/abs/2211.13339v1
- Date: Wed, 23 Nov 2022 22:55:55 GMT
- Title: Robustness Analysis of Deep Learning Models for Population Synthesis
- Authors: Daniel Opoku Mensah and Godwin Badu-Marfo and Bilal Farooq
- Abstract summary: We present bootstrap confidence intervals for deep generative models to evaluate their robustness across multiple datasets.
The models are implemented on travel diaries from the Montreal Origin-Destination Surveys of 2008, 2013, and 2018.
Results show that the predictive errors of CTGAN have narrower confidence intervals, indicating its robustness across multiple datasets.
- Score: 5.9106199000537645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep generative models have become useful for synthetic data generation,
particularly population synthesis. The models implicitly learn the probability
distribution of a dataset and can draw samples from that distribution. Several
models have been proposed, but their performance is typically tested on only a
single cross-sectional sample. Evaluating population synthesis on single
datasets is a recognized drawback, and further study is needed to explore the
robustness of the models across multiple datasets. While comparing with real
data can increase the trust and interpretability of the models, techniques to
evaluate the robustness of deep generative models for population synthesis
remain underexplored. In this study, we present bootstrap confidence intervals
for deep generative models, an approach that computes efficient confidence
intervals for mean prediction errors to evaluate the robustness of the models
across multiple datasets. Specifically, we adopt the tabular-based Composite
Travel Generative Adversarial Network (CTGAN) and the Variational Autoencoder
(VAE) to estimate the population distribution by generating agents with tabular
attributes, using several samples collected over time from the same study area.
The models are implemented on travel diaries from the Montreal
Origin-Destination Surveys of 2008, 2013, and 2018, and their predictive
performance is compared under varying sample sizes from the multiple surveys.
Results show that the predictive errors of CTGAN have narrower confidence
intervals than those of the VAE, indicating its robustness to multiple datasets
of varying sample sizes. In addition, evaluating model robustness against
varying sample sizes shows only a minimal decrease in model performance as the
sample size decreases. This study directly supports agent-based modelling by
enabling finer-grained synthetic population generation in a reliable environment.
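The evaluation idea described in the abstract, bootstrap confidence intervals over mean prediction errors of a generative model, can be illustrated with a minimal sketch. The sketch below assumes a percentile bootstrap and uses hypothetical per-replication error values and placeholder model names; it is not the paper's actual procedure, metric, or results.
```python
import numpy as np

def bootstrap_ci(errors, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean error.

    errors : 1-D array of per-replication prediction errors, e.g. one
             error value per synthetic sample drawn from a fitted
             generative model and compared against the real survey.
    """
    rng = np.random.default_rng(seed)
    errors = np.asarray(errors, dtype=float)
    boot_means = np.empty(n_boot)
    for b in range(n_boot):
        # Resample the observed errors with replacement and record the mean.
        resample = rng.choice(errors, size=errors.size, replace=True)
        boot_means[b] = resample.mean()
    lower = np.percentile(boot_means, 100 * alpha / 2)
    upper = np.percentile(boot_means, 100 * (1 - alpha / 2))
    return errors.mean(), (lower, upper)

# Hypothetical per-replication errors for two models; a narrower
# interval would suggest more robust behaviour across datasets.
ctgan_errors = np.array([0.12, 0.11, 0.13, 0.12, 0.14])
vae_errors = np.array([0.15, 0.22, 0.10, 0.19, 0.25])
print("CTGAN:", bootstrap_ci(ctgan_errors))
print("VAE:  ", bootstrap_ci(vae_errors))
```
In practice, the same routine could be applied to errors computed for each survey year and each sample size to compare how the interval width changes across conditions.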
Related papers
- Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
- Private Synthetic Data Meets Ensemble Learning [15.425653946755025]
When machine learning models are trained on synthetic data and then deployed on real data, there is often a performance drop.
We introduce a new ensemble strategy for training downstream models, with the goal of enhancing their performance when used on real data.
arXiv Detail & Related papers (2023-10-15T04:24:42Z)
- Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
- Bayesian Additive Main Effects and Multiplicative Interaction Models using Tensor Regression for Multi-environmental Trials [0.0]
We propose a Bayesian tensor regression model to accommodate the effect of multiple factors on phenotype prediction.
We adopt a set of prior distributions that resolve identifiability issues that may arise between the parameters in the model.
We explore the applicability of our model by analysing real-world data related to wheat production across Ireland from 2010 to 2019.
arXiv Detail & Related papers (2023-01-09T19:54:50Z)
- Closed-form Continuous-Depth Models [99.40335716948101]
Continuous-depth neural models rely on advanced numerical differential equation solvers.
We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster.
arXiv Detail & Related papers (2021-06-25T22:08:51Z)
- On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study [65.17429512679695]
In adversarial data collection (ADC), a human workforce interacts with a model in real time, attempting to produce examples that elicit incorrect predictions.
Despite ADC's intuitive appeal, it remains unclear when training on adversarial datasets produces more robust models.
arXiv Detail & Related papers (2021-06-02T00:48:33Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
- Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)
- Predicting Multidimensional Data via Tensor Learning [0.0]
We develop a model that retains the intrinsic multidimensional structure of the dataset.
To estimate the model parameters, an Alternating Least Squares algorithm is developed.
The proposed model is able to outperform benchmark models present in the forecasting literature.
arXiv Detail & Related papers (2020-02-11T11:57:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.