Generating Synthetic Population
- URL: http://arxiv.org/abs/2209.09961v2
- Date: Thu, 16 May 2024 11:06:04 GMT
- Title: Generating Synthetic Population
- Authors: Bhavesh Neekhra, Kshitij Kapoor, Debayan Gupta
- Abstract summary: We provide a method to generate a synthetic population at various administrative levels for a country like India.
This synthetic population is created using machine learning and statistical methods applied to survey data such as Census of India 2011, IHDS-II, NSS-68th round, GPW, etc.
- Score: 0.680303951699936
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we provide a method to generate a synthetic population at various administrative levels for a country like India. This synthetic population is created using machine learning and statistical methods applied to survey data such as Census of India 2011, IHDS-II, NSS-68th round, GPW, etc. The synthetic population defines individuals in the population with characteristics such as age, gender, height, weight, home and work location, household structure, preexisting health conditions, socioeconomic status, and employment. We used the proposed method to generate the synthetic population for various districts of India. We also compare this synthetic population with the source data using various metrics. The experimental results show that the synthetic data can realistically simulate the population for various districts of India.
Related papers
- A Deep Generative Framework for Joint Households and Individuals Population Synthesis [0.562479170374811]
We propose a deep generative framework to generate a synthetic population with household-individual and individual-individual relationships.
Results for an application in Delaware, USA demonstrate the ability to ensure the realism of generated household-individual records.
arXiv Detail & Related papers (2024-06-30T23:01:58Z)
- Benchmarking Private Population Data Release Mechanisms: Synthetic Data vs. TopDown [50.40020716418472]
This study conducts a comparison between the TopDown algorithm and private synthetic data generation to determine how accuracy is affected by query complexity.
Our results show that for in-distribution queries, the TopDown algorithm achieves significantly better privacy-fidelity tradeoffs than any of the synthetic data methods we evaluated.
arXiv Detail & Related papers (2024-01-31T17:38:34Z)
- Synthpop++: A Hybrid Framework for Generating A Country-scale Synthetic Population [0.680303951699936]
Population censuses are costly, time-consuming, and may also raise privacy concerns.
We introduce SynthPop++, which can combine data from multiple real-world surveys to produce a real-scale synthetic population.
Our experimental results show that synthetic population can realistically simulate the population for various administrative units of India.
arXiv Detail & Related papers (2023-04-24T17:27:56Z)
- Copula-based transferable models for synthetic population generation [1.370096215615823]
Population synthesis involves generating synthetic yet realistic representations of a target population of micro-agents.
Traditional methods, often reliant on target population samples, face limitations due to high costs and small sample sizes.
We propose a novel framework based on copulas to generate synthetic data for target populations where only empirical marginal distributions are known.
arXiv Detail & Related papers (2023-02-17T23:58:14Z)
- Synthcity: facilitating innovative use cases of synthetic data in different data modalities [86.52703093858631]
Synthcity is an open-source software package for innovative use cases of synthetic data in ML fairness, privacy and augmentation.
Synthcity provides the practitioners with a single access point to cutting edge research and tools in synthetic data.
arXiv Detail & Related papers (2023-01-18T14:49:54Z)
- BeCAPTCHA-Type: Biometric Keystroke Data Generation for Improved Bot Detection [63.447493500066045]
This work proposes a data driven learning model for the synthesis of keystroke biometric data.
The proposed method is compared with two statistical approaches based on Universal and User-dependent models.
Our experimental framework considers a dataset with 136 million keystroke events from 168 thousand subjects.
arXiv Detail & Related papers (2022-07-27T09:26:15Z)
- So2Sat POP -- A Curated Benchmark Data Set for Population Estimation from Space on a Continental Scale [11.38584315242023]
We provide a comprehensive data set for population estimation in 98 European cities.
The data set comprises a digital elevation model, local climate zone, land use proportions, nighttime lights in combination with multi-spectral Sentinel-2 imagery, and data from the Open Street Map initiative.
arXiv Detail & Related papers (2022-04-07T07:30:43Z)
- JKOnet: Proximal Optimal Transport Modeling of Population Dynamics [69.89192135800143]
We propose a neural architecture that combines an energy model on measures with (small) optimal displacements solved with input convex neural networks (ICNNs).
We demonstrate the applicability of our model to explain and predict population dynamics.
arXiv Detail & Related papers (2021-06-11T12:30:43Z)
- Methodological Foundation of a Numerical Taxonomy of Urban Form [62.997667081978825]
We present a method for numerical taxonomy of urban form derived from biological systematics.
We derive homogeneous urban tissue types and, by determining overall morphological similarity between them, generate a hierarchical classification of urban form.
After framing and presenting the method, we test it on two cities - Prague and Amsterdam.
arXiv Detail & Related papers (2021-04-30T12:47:52Z)
- Building a large synthetic population from Australian census data [2.707154152696381]
We present work on creating a synthetic population from census data for Australia, applied to the greater Melbourne region.
We use a sample-free approach to population synthesis that does not rely on a disaggregate sample from the original population.
Our algorithm is efficient in that it can create the synthetic population for Melbourne comprising 4.5 million persons in 1.8 million households within three minutes on a modern computer.
arXiv Detail & Related papers (2020-08-18T05:38:15Z)
- Shape of synth to come: Why we should use synthetic data for English surface realization [72.62356061765976]
In the 2018 shared task there was very little difference in the absolute performance of systems trained with and without additional, synthetically created data.
We show, in experiments on the English 2018 dataset, that the use of synthetic data can have a substantial positive effect.
We argue that its use should be encouraged rather than prohibited so that future research efforts continue to explore systems that can take advantage of such data.
arXiv Detail & Related papers (2020-05-06T10:00:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.