Copula-based transferable models for synthetic population generation
- URL: http://arxiv.org/abs/2302.09193v3
- Date: Thu, 22 Aug 2024 11:55:20 GMT
- Title: Copula-based transferable models for synthetic population generation
- Authors: Pascal Jutras-Dubé, Mohammad B. Al-Khasawneh, Zhichao Yang, Javier Bas, Fabian Bastin, Cinzia Cirillo
- Abstract summary: Population synthesis involves generating synthetic yet realistic representations of a target population of micro-agents.
Traditional methods, often reliant on target population samples, face limitations due to high costs and small sample sizes.
We propose a novel framework based on copulas to generate synthetic data for target populations where only empirical marginal distributions are known.
- Score: 1.370096215615823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Population synthesis involves generating synthetic yet realistic representations of a target population of micro-agents for behavioral modeling and simulation. Traditional methods, often reliant on target population samples, such as census data or travel surveys, face limitations due to high costs and small sample sizes, particularly at smaller geographical scales. We propose a novel framework based on copulas to generate synthetic data for target populations where only empirical marginal distributions are known. This method utilizes samples from different populations with similar marginal dependencies, introduces a spatial component into population synthesis, and considers various information sources for more realistic generators. Concretely, the process involves normalizing the data and treating it as realizations of a given copula, and then training a generative model before incorporating the information on the marginals of the target population. Utilizing American Community Survey data, we assess our framework's performance through standardized root mean squared error (SRMSE) and so-called sampled zeros. We focus on its capacity to transfer a model learned from one population to another. Our experiments include transfer tests between regions at the same geographical level as well as to lower geographical levels, hence evaluating the framework's adaptability in varied spatial contexts. We compare Bayesian Networks, Variational Autoencoders, and Generative Adversarial Networks, both individually and combined with our copula framework. Results show that the copula enhances machine learning methods in matching the marginals of the reference data. Furthermore, it consistently surpasses Iterative Proportional Fitting in terms of SRMSE in the transferability experiments, while introducing unique observations not found in the original training sample.
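The pipeline described in the abstract (normalize the source data, treat it as copula realizations, fit a generative model, then impose the target marginals) can be illustrated with a minimal sketch. This is not the authors' implementation: a Gaussian copula stands in for the Bayesian Network, VAE, and GAN generators compared in the paper, the data are a toy two-variable example, and all function names (`to_uniform`, `from_uniform`, `transfer`, `srmse`) are hypothetical. The SRMSE shown is one common definition (RMSE divided by the mean cell frequency) and is assumed rather than taken from the paper.

```python
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()
normal_cdf = np.vectorize(_STD_NORMAL.cdf)           # standard normal CDF
normal_quantile = np.vectorize(_STD_NORMAL.inv_cdf)  # standard normal quantile


def marginal(codes, n_levels):
    """Empirical marginal distribution of integer category codes."""
    return np.bincount(codes, minlength=n_levels) / len(codes)


def to_uniform(codes, probs, rng):
    """Map category codes to (0, 1) by jittering inside each CDF step."""
    upper = np.cumsum(probs)
    lower = upper - probs
    u = lower[codes] + rng.uniform(size=codes.shape) * probs[codes]
    return np.clip(u, 1e-9, 1 - 1e-9)  # keep the normal quantile finite


def from_uniform(u, probs):
    """Invert a discrete marginal: the code whose CDF step contains u."""
    codes = np.searchsorted(np.cumsum(probs), u, side="right")
    return np.minimum(codes, len(probs) - 1)  # guard against float rounding


def transfer(source, source_margs, target_margs, n, seed=0):
    """Learn dependence on the source, then sample with target marginals."""
    rng = np.random.default_rng(seed)
    n_vars = source.shape[1]
    # 1. Normalize each source column into pseudo-uniform copula data.
    u = np.column_stack(
        [to_uniform(source[:, j], source_margs[j], rng) for j in range(n_vars)]
    )
    # 2. Stand-in generative model (assumption): Gaussian copula correlation.
    corr = np.corrcoef(normal_quantile(u), rowvar=False)
    # 3. Sample new copula realizations.
    z = rng.multivariate_normal(np.zeros(n_vars), corr, size=n)
    u_new = normal_cdf(z)
    # 4. Incorporate the target marginals through their inverse CDFs.
    return np.column_stack(
        [from_uniform(u_new[:, j], target_margs[j]) for j in range(n_vars)]
    )


def srmse(p_hat, p):
    """Assumed SRMSE definition: RMSE over bins divided by mean frequency."""
    return np.sqrt(np.mean((p_hat - p) ** 2)) / np.mean(p)


# Toy source population: two 3-level variables with positive dependence.
rng = np.random.default_rng(42)
x0 = rng.choice(3, size=2000, p=[0.2, 0.5, 0.3])
keep = rng.uniform(size=2000) < 0.8
x1 = np.where(keep, x0, rng.integers(0, 3, size=2000))
source = np.column_stack([x0, x1])
source_margs = [marginal(source[:, j], 3) for j in range(2)]

# Target population: only the marginals are known (made-up values).
target_margs = [np.array([0.5, 0.3, 0.2]), np.array([0.2, 0.3, 0.5])]

synthetic = transfer(source, source_margs, target_margs, n=5000, seed=1)
errors = [srmse(marginal(synthetic[:, j], 3), target_margs[j]) for j in range(2)]
```

In this sketch the dependence structure comes entirely from the source sample, while the synthetic marginals follow the target by construction, which is the transfer idea the abstract describes. Swapping step 2 for a richer generator trained on the normalized data would leave steps 1, 3, and 4 unchanged.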
Related papers
- A Deep Generative Framework for Joint Households and Individuals Population Synthesis [0.562479170374811]
We propose a deep generative framework to generate a synthetic population with household-individual and individual-individual relationships.
Results for an application in Delaware, USA demonstrate the ability to ensure the realism of generated household-individual records.
arXiv Detail & Related papers (2024-06-30T23:01:58Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Estimating Unknown Population Sizes Using the Hypergeometric Distribution [1.03590082373586]
We tackle the challenge of estimating discrete distributions when both the total population size and the sizes of its constituent categories are unknown.
We develop our approach to account for a data generating process where the ground-truth is a mixture of distributions conditional on a continuous latent variable.
Empirical data simulation demonstrates that our method outperforms other likelihood functions used to model count data.
arXiv Detail & Related papers (2024-02-22T01:53:56Z)
- Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion probabilistic models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z)
- VFedMH: Vertical Federated Learning for Training Multiple Heterogeneous Models [53.30484242706966]
This paper proposes a novel approach called Vertical federated learning for training multiple Heterogeneous models (VFedMH).
To protect the participants' local embedding values, we propose an embedding protection method based on lightweight blinding factors.
Experiments are conducted to demonstrate that VFedMH can simultaneously train multiple heterogeneous models with heterogeneous optimization and outperform some recent methods in model performance.
arXiv Detail & Related papers (2023-10-20T09:22:51Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Improving Heterogeneous Model Reuse by Density Estimation [105.97036205113258]
This paper studies multiparty learning, aiming to learn a model using the private data of different participants.
Model reuse is a promising solution for multiparty learning, assuming that a local model has been trained for each party.
arXiv Detail & Related papers (2023-05-23T09:46:54Z)
- Heterogeneous Datasets for Federated Survival Analysis Simulation [6.489759672413373]
This work proposes a novel technique for constructing realistic heterogeneous datasets by starting from existing non-federated datasets in a reproducible way.
Specifically, we provide two novel dataset-splitting algorithms based on the Dirichlet distribution to assign each data sample to a carefully chosen client.
The implementation of the proposed methods is publicly available, to encourage common practices for simulating federated environments for survival analysis.
arXiv Detail & Related papers (2023-01-28T11:37:07Z)
- Robustness Analysis of Deep Learning Models for Population Synthesis [5.9106199000537645]
We present bootstrap confidence intervals for deep generative models to evaluate their robustness to multiple datasets.
The models are implemented on multiple travel diaries from the Montreal Origin-Destination Survey of 2008, 2013, and 2018.
Results show that the predictive errors of CTGAN have narrower confidence intervals, indicating its robustness to multiple datasets.
arXiv Detail & Related papers (2022-11-23T22:55:55Z)
- BREEDS: Benchmarks for Subpopulation Shift [98.90314444545204]
We develop a methodology for assessing the robustness of models to subpopulation shift.
We leverage the class structure underlying existing datasets to control the data subpopulations that comprise the training and test distributions.
Applying this methodology to the ImageNet dataset, we create a suite of subpopulation shift benchmarks of varying granularity.
arXiv Detail & Related papers (2020-08-11T17:04:47Z)
- Composite Travel Generative Adversarial Networks for Tabular and Sequential Population Synthesis [5.259027520298188]
We present a Composite Travel Generative Adversarial Network (CTGAN) to estimate the underlying joint distribution of a population.
The CTGAN model is compared with other recently proposed methods such as the Variational Autoencoders (VAE) method.
arXiv Detail & Related papers (2020-04-15T00:06:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.