Building a large synthetic population from Australian census data
- URL: http://arxiv.org/abs/2008.11660v1
- Date: Tue, 18 Aug 2020 05:38:15 GMT
- Title: Building a large synthetic population from Australian census data
- Authors: Bhagya N. Wickramasinghe, Dhirendra Singh and Lin Padgham
- Abstract summary: We present work on creating a synthetic population from census data for Australia, applied to the greater Melbourne region.
We use a sample-free approach to population synthesis that does not rely on a disaggregate sample from the original population.
Our algorithm is efficient in that it can create the synthetic population for Melbourne comprising 4.5 million persons in 1.8 million households within three minutes on a modern computer.
- Score: 2.707154152696381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present work on creating a synthetic population from census data for
Australia, applied to the greater Melbourne region. We use a sample-free
approach to population synthesis that does not rely on a disaggregate sample
from the original population. The inputs for our algorithm are joint marginal
distributions, taken from the census, of the desired person-level and
household-level attributes, and the outputs are a set of comma-separated-value
(.csv) files containing the full synthetic population of unique individuals in
households, with age, gender, relationship status, household type, and household
size matched to census data. Our algorithm is efficient in that it can create the synthetic
population for Melbourne comprising 4.5 million persons in 1.8 million
households within three minutes on a modern computer. Code for the algorithm is
hosted on GitHub.
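The abstract specifies the inputs (census marginals of person-level and household-level attributes) and the outputs (.csv files of unique persons placed in households), but not the allocation step itself. The following Python snippet is a toy illustration of the general sample-free idea under stated assumptions, not the authors' algorithm: the marginal counts, the composition rules in `TEMPLATES`, and the names `synthesise` and `to_csv` are all hypothetical, and real census marginals would need much richer consistency handling.

```python
import csv
import io
import random

# Hypothetical toy marginals (NOT real census figures).
# Persons: (age_group, gender, relationship) -> count
PERSON_COUNTS = {
    ("adult", "F", "partner"): 3,
    ("adult", "M", "partner"): 3,
    ("adult", "F", "lone"): 1,
    ("adult", "M", "lone"): 1,
    ("child", "F", "child"): 1,
    ("child", "M", "child"): 1,
}
# Households: (household_type, size) -> count
HOUSEHOLD_COUNTS = {
    ("couple_with_children", 3): 2,
    ("couple_only", 2): 1,
    ("lone_person", 1): 2,
}
# Hypothetical composition rules: roles each household type needs first;
# any remaining slots up to the household size are filled with "child".
TEMPLATES = {
    "couple_with_children": ["partner", "partner"],
    "couple_only": ["partner", "partner"],
    "lone_person": ["lone"],
}


def synthesise(person_counts, household_counts, seed=0):
    """Expand marginal counts into unique persons and place them in households."""
    rng = random.Random(seed)
    # Pool individual person records by relationship status, then shuffle.
    pools = {}
    for (age, gender, rel), n in person_counts.items():
        pools.setdefault(rel, []).extend([(age, gender, rel)] * n)
    for pool in pools.values():
        rng.shuffle(pool)

    rows, pid, hid = [], 0, 0
    for (htype, size), n in household_counts.items():
        for _ in range(n):
            required = TEMPLATES[htype]
            roles = required + ["child"] * (size - len(required))
            for role in roles:
                if not pools.get(role):
                    raise ValueError(f"marginals inconsistent: no '{role}' left")
                age, gender, rel = pools[role].pop()
                rows.append([hid, pid, age, gender, rel, htype, size])
                pid += 1  # every person gets a unique id
            hid += 1
    leftover = sum(len(p) for p in pools.values())
    if leftover:
        raise ValueError(f"{leftover} persons could not be placed")
    return rows


def to_csv(rows):
    """Serialise the synthetic population as CSV text (one row per person)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["household_id", "person_id", "age_group", "gender",
                     "relationship", "household_type", "household_size"])
    writer.writerows(rows)
    return buf.getvalue()


rows = synthesise(PERSON_COUNTS, HOUSEHOLD_COUNTS)
print(len(rows), "persons in", len({r[0] for r in rows}), "households")
```

With these toy counts the run prints `10 persons in 5 households`. Marginals that do not add up (too few people of a required role, or people left over) raise `ValueError` instead of silently dropping or duplicating individuals, which mirrors the requirement that person and household totals agree.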
Related papers
- A Deep Generative Framework for Joint Households and Individuals Population Synthesis [0.562479170374811]
We propose a deep generative framework to generate a synthetic population with household-individual and individual-individual relationships.
Results for an application in Delaware, USA demonstrate the ability to ensure the realism of generated household-individual records.
(2024-06-30T23:01:58Z)
- Benchmarking Private Population Data Release Mechanisms: Synthetic Data vs. TopDown [50.40020716418472]
This study conducts a comparison between the TopDown algorithm and private synthetic data generation to determine how accuracy is affected by query complexity.
Our results show that for in-distribution queries, the TopDown algorithm achieves significantly better privacy-fidelity tradeoffs than any of the synthetic data methods we evaluated.
(2024-01-31T17:38:34Z)
- Synthpop++: A Hybrid Framework for Generating A Country-scale Synthetic Population [0.680303951699936]
Population censuses are costly, time-consuming, and may also raise privacy concerns.
We introduce SynthPop++, which can combine data from multiple real-world surveys to produce a real-scale synthetic population.
Our experimental results show that the synthetic population can realistically simulate the population for various administrative units of India.
(2023-04-24T17:27:56Z)
- Synthcity: facilitating innovative use cases of synthetic data in different data modalities [86.52703093858631]
Synthcity is an open-source software package for innovative use cases of synthetic data in ML fairness, privacy and augmentation.
Synthcity provides practitioners with a single access point to cutting-edge research and tools in synthetic data.
(2023-01-18T14:49:54Z)
- Generating Synthetic Population [0.680303951699936]
We provide a method to generate a synthetic population at various administrative levels for a country like India.
This synthetic population is created using machine learning and statistical methods applied to survey data such as Census of India 2011, IHDS-II, NSS-68th round, GPW etc.
(2022-09-20T19:31:39Z)
- So2Sat POP -- A Curated Benchmark Data Set for Population Estimation from Space on a Continental Scale [11.38584315242023]
We provide a comprehensive data set for population estimation in 98 European cities.
The data set comprises a digital elevation model, local climate zone, land use proportions, nighttime lights in combination with multi-spectral Sentinel-2 imagery, and data from the Open Street Map initiative.
(2022-04-07T07:30:43Z)
- Sketch and Scale: Geo-distributed tSNE and UMAP [75.44887265789056]
Running machine learning analytics over geographically distributed datasets is a rapidly arising problem.
We introduce a novel framework, Sketch and Scale (SnS).
It leverages a Count Sketch data structure to compress the data on the edge nodes, aggregates the reduced-size sketches on the master node, and runs vanilla tSNE or UMAP on the summary.
We show this technique to be fully parallel and to scale linearly in time and logarithmically in memory and communication, making it possible to analyze datasets with many millions, potentially billions, of data points spread across several data centers around the globe.
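The edge-then-master aggregation described above works because a Count Sketch is linear: sketches built with the same hash functions can simply be added element-wise. The following is a generic Count Sketch written under assumptions, not the SnS code: the depth, width, seed, blake2b-based hashing, and the two toy "edge node" streams are illustrative choices, and the tSNE/UMAP stage is not shown.

```python
import hashlib


class CountSketch:
    """Minimal Count Sketch: a depth x width table of signed counters."""

    def __init__(self, depth=5, width=256, seed=0):
        self.depth, self.width, self.seed = depth, width, seed
        self.table = [[0] * width for _ in range(depth)]

    def _bucket_and_sign(self, row, key):
        # One keyed 64-bit digest per row yields a bucket index and a +/-1 sign.
        data = f"{self.seed}:{row}:{key}".encode()
        value = int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")
        return value % self.width, 1 if value >> 63 else -1

    def add(self, key, count=1):
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, key)
            self.table[row][bucket] += sign * count

    def estimate(self, key):
        # Each row gives a signed estimate; the median damps hash collisions.
        values = []
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, key)
            values.append(sign * self.table[row][bucket])
        values.sort()
        return values[len(values) // 2]

    def merge(self, other):
        # Sketches with identical shape and seed combine by element-wise addition.
        if (self.depth, self.width, self.seed) != (other.depth, other.width, other.seed):
            raise ValueError("sketches must share depth, width and seed")
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]
        return self


# Two hypothetical "edge nodes" sketch their local streams independently.
stream_a = ["x"] * 50 + ["y"] * 5
stream_b = ["x"] * 30 + ["z"] * 7
edge_a, edge_b = CountSketch(), CountSketch()
for item in stream_a:
    edge_a.add(item)
for item in stream_b:
    edge_b.add(item)

# The "master node" aggregates only the small sketches, never the raw streams.
master = CountSketch().merge(edge_a).merge(edge_b)
print(master.estimate("x"))

# Linearity check: sketching the pooled stream directly gives the same table.
pooled = CountSketch()
for item in stream_a + stream_b:
    pooled.add(item)
print(master.table == pooled.table)
```

The last line prints `True`: merging the edge sketches reproduces exactly the table for the pooled data, so only the fixed-size summaries need to travel between sites. Each row's estimate for `"x"` (true count 80) can be off only by the counts of keys that collide with it, here at most 5 + 7, and taking the median across rows further limits the effect of collisions.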
(2020-11-11T22:32:21Z)
- A deep learning classifier for local ancestry inference [63.8376359764052]
Local ancestry inference identifies the ancestry of each segment of an individual's genome.
We develop a new LAI tool using a deep convolutional neural network with an encoder-decoder architecture.
We show that our model is able to learn admixture as a zero-shot task, yielding ancestry assignments that are nearly as accurate as those from the existing gold standard tool, RFMix.
(2020-11-04T00:42:01Z)
- Differential Privacy of Hierarchical Census Data: An Optimization Approach [53.29035917495491]
Census bureaus are interested in releasing aggregate socio-economic data about a large population without revealing sensitive information about any individual.
Recent events have identified some of the privacy challenges faced by these organizations.
This paper presents a novel differential-privacy mechanism for releasing hierarchical counts of individuals.
(2020-06-28T18:19:55Z)
- Magnify Your Population: Statistical Downscaling to Augment the Spatial Resolution of Socioeconomic Census Data [48.7576911714538]
We present a new statistical downscaling approach to derive fine-scale estimates of key socioeconomic attributes.
For each selected socioeconomic variable, a Random Forest model is trained on the source Census units and then used to generate fine-scale gridded predictions.
As a case study, we apply this method to Census data in the United States, downscaling the selected socioeconomic variables available at the block group level, to a grid of 300 spatial resolution.
(2020-06-23T16:52:18Z)