A Deep Generative Framework for Joint Households and Individuals Population Synthesis
- URL: http://arxiv.org/abs/2407.01643v1
- Date: Sun, 30 Jun 2024 23:01:58 GMT
- Title: A Deep Generative Framework for Joint Households and Individuals Population Synthesis
- Authors: Xiao Qian, Utkarsh Gangwal, Shangjia Dong, Rachel Davidson,
- Abstract summary: We propose a deep generative framework to generate a synthetic population with household-individual and individual-individual relationships.
Results for an application in Delaware, USA demonstrate the ability to ensure the realism of generated household-individual records.
- Score: 0.562479170374811
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Household and individual-level sociodemographic data are essential for understanding human-infrastructure interaction and policymaking. However, the Public Use Microdata Sample (PUMS) offers only a sample at the state level, while census tract data only provides the marginal distributions of variables without correlations. Therefore, we need an accurate synthetic population dataset that maintains consistent variable correlations observed in microdata, preserves household-individual and individual-individual relationships, adheres to state-level statistics, and accurately represents the geographic distribution of the population. We propose a deep generative framework leveraging the variational autoencoder (VAE) to generate a synthetic population with the aforementioned features. The methodological contributions include (1) a new data structure for capturing household-individual and individual-individual relationships, (2) a transfer learning process with pre-training and fine-tuning steps to generate households and individuals whose aggregated distributions align with the census tract marginal distribution, and (3) decoupled binary cross-entropy (D-BCE) loss function enabling distribution shift and out-of-sample records generation. Model results for an application in Delaware, USA demonstrate the ability to ensure the realism of generated household-individual records and accurately describe population statistics at the census tract level compared to existing methods. Furthermore, testing in North Carolina, USA yielded promising results, supporting the transferability of our method.
Related papers
- Idiographic Personality Gaussian Process for Psychological Assessment [7.394943089551214]
We develop a novel measurement framework based on a Gaussian process coregionalization model to address a long-lasting debate ins.
We propose the idiographic personality Gaussian process (IPGP) framework, an intermediate model that accommodates both shared trait structure across a population and "idiographic" deviations for individuals.
arXiv Detail & Related papers (2024-07-06T06:09:04Z) - Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z) - Copula-based transferable models for synthetic population generation [1.370096215615823]
Population synthesis involves generating synthetic yet realistic representations of a target population of micro-agents.
Traditional methods, often reliant on target population samples, face limitations due to high costs and small sample sizes.
We propose a novel framework based on copulas to generate synthetic data for target populations where only empirical marginal distributions are known.
arXiv Detail & Related papers (2023-02-17T23:58:14Z) - Heterogeneous Datasets for Federated Survival Analysis Simulation [6.489759672413373]
This work proposes a novel technique for constructing realistic heterogeneous datasets by starting from existing non-federated datasets in a reproducible way.
Specifically, we provide two novel dataset-splitting algorithms based on the Dirichlet distribution to assign each data sample to a carefully chosen client.
The implementation of the proposed methods is publicly available in favor of and to encourage common practices to simulate federated environments for survival analysis.
arXiv Detail & Related papers (2023-01-28T11:37:07Z) - Improving Correlation Capture in Generating Imbalanced Data using
Differentially Private Conditional GANs [2.2265840715792735]
We propose DP-CGANS, a differentially private conditional GAN framework consisting of data transformation, sampling, conditioning, and networks training to generate realistic and privacy-preserving data.
We extensively evaluate our model with state-of-the-art generative models on three public datasets and two real-world personal health datasets in terms of statistical similarity, machine learning performance, and privacy measurement.
arXiv Detail & Related papers (2022-06-28T06:47:27Z) - FedH2L: Federated Learning with Model and Statistical Heterogeneity [75.61234545520611]
Federated learning (FL) enables distributed participants to collectively learn a strong global model without sacrificing their individual data privacy.
We introduce FedH2L, which is agnostic to both the model architecture and robust to different data distributions across participants.
In contrast to approaches sharing parameters or gradients, FedH2L relies on mutual distillation, exchanging only posteriors on a shared seed set between participants in a decentralized manner.
arXiv Detail & Related papers (2021-01-27T10:10:18Z) - BREEDS: Benchmarks for Subpopulation Shift [98.90314444545204]
We develop a methodology for assessing the robustness of models to subpopulation shift.
We leverage the class structure underlying existing datasets to control the data subpopulations that comprise the training and test distributions.
Applying this methodology to the ImageNet dataset, we create a suite of subpopulation shift benchmarks of varying granularity.
arXiv Detail & Related papers (2020-08-11T17:04:47Z) - Differential Privacy of Hierarchical Census Data: An Optimization
Approach [53.29035917495491]
Census Bureaus are interested in releasing aggregate socio-economic data about a large population without revealing sensitive information about any individual.
Recent events have identified some of the privacy challenges faced by these organizations.
This paper presents a novel differential-privacy mechanism for releasing hierarchical counts of individuals.
arXiv Detail & Related papers (2020-06-28T18:19:55Z) - Magnify Your Population: Statistical Downscaling to Augment the Spatial
Resolution of Socioeconomic Census Data [48.7576911714538]
We present a new statistical downscaling approach to derive fine-scale estimates of key socioeconomic attributes.
For each selected socioeconomic variable, a Random Forest model is trained on the source Census units and then used to generate fine-scale gridded predictions.
As a case study, we apply this method to Census data in the United States, downscaling the selected socioeconomic variables available at the block group level, to a grid of 300 spatial resolution.
arXiv Detail & Related papers (2020-06-23T16:52:18Z) - Survival Cluster Analysis [93.50540270973927]
There is an unmet need in survival analysis for identifying subpopulations with distinct risk profiles.
An approach that addresses this need is likely to improve characterization of individual outcomes.
arXiv Detail & Related papers (2020-02-29T22:41:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.