Related papers: Generating Spatial Synthetic Populations Using Wasserstein Generative Adversarial Network: A Case Study with EU-SILC Data for Helsinki and Thessaloniki

Generating Spatial Synthetic Populations Using Wasserstein Generative Adversarial Network: A Case Study with EU-SILC Data for Helsinki and Thessaloniki

URL: http://arxiv.org/abs/2501.16080v1
Date: Mon, 27 Jan 2025 14:29:07 GMT
Title: Generating Spatial Synthetic Populations Using Wasserstein Generative Adversarial Network: A Case Study with EU-SILC Data for Helsinki and Thessaloniki
Authors: Vanja Falck,
Abstract summary: The Wasserstein Generative Adversarial Network, trained on census data like EU-SILC, can create robust synthetic populations.<n>The increased access to high-quality micro-data has sparked interest in synthetic populations.<n>This study uses national data from Finland and Greece for Helsinki and Thessaloniki to explore balanced spatial synthetic population generation.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Using agent-based social simulations can enhance our understanding of urban planning, public health, and economic forecasting. Realistic synthetic populations with numerous attributes strengthen these simulations. The Wasserstein Generative Adversarial Network, trained on census data like EU-SILC, can create robust synthetic populations. These methods, aided by external statistics or EU-SILC weights, generate spatial synthetic populations for agent-based models. The increased access to high-quality micro-data has sparked interest in synthetic populations, which preserve demographic profiles and analytical strength while ensuring privacy and preventing discrimination. This study uses national data from Finland and Greece for Helsinki and Thessaloniki to explore balanced spatial synthetic population generation. Results show challenges related to balancing data with or without aggregated statistics for the target population and the general under-representation of fringe profiles by deep generative methods. The latter can lead to discrimination in agent-based simulations.

Related papers

SemaPop: Semantic-Persona Conditioned Population Synthesis [7.388951238297018]
This study proposes SemaPop, a semantic-statistical population synthesis model that integrates large language models (LLMs) with generative population modeling.<n>In this study, the framework is instantiated using a Wasserstein GAN with gradient penalty (WGAN-GP) backbone, referred to as SemaPop-GAN.
arXiv Detail & Related papers (2026-02-12T04:44:34Z)
Exact Synthetic Populations for Scalable Societal and Market Modeling [0.0]
We introduce a constraint-programming framework for generating synthetic populations that reproduce target statistics with high precision.<n>We validate the approach on official demographic sources and study the impact of distributional deviations on downstream analyses.
arXiv Detail & Related papers (2025-12-08T08:48:21Z)
Population-Aligned Persona Generation for LLM-based Social Simulation [58.84363795421489]
We propose a systematic framework for synthesizing high-quality, population-aligned persona sets for social simulation.<n>Our approach begins by leveraging large language models to generate narrative personas from long-term social media data.<n>To address the needs of specific simulation contexts, we introduce a task-specific module that adapts the globally aligned persona set to targeted subpopulations.
arXiv Detail & Related papers (2025-09-12T10:43:47Z)
Generating Feasible and Diverse Synthetic Populations Using Diffusion Models [5.689443449061003]
Population synthesis is a critical task that involves generating synthetic yet realistic representations of populations.<n>Deep generative models can potentially synthesize possible attribute combinations that present in the actual population but do not exist in the sample data.<n>In this study, a novel diffusion model-based population synthesis method is proposed to estimate the underlying joint distribution of a population.
arXiv Detail & Related papers (2025-08-06T03:11:27Z)
MF-LLM: Simulating Population Decision Dynamics via a Mean-Field Large Language Model Framework [53.82097200295448]
Mean-Field LLM (MF-LLM) is first to incorporate mean field theory into social simulation.<n>MF-LLM models bidirectional interactions between individuals and the population through an iterative process.<n> IB-Tune is a novel fine-tuning method inspired by the Information Bottleneck principle.
arXiv Detail & Related papers (2025-04-30T12:41:51Z)
Human Preferences in Large Language Model Latent Space: A Technical Analysis on the Reliability of Synthetic Data in Voting Outcome Prediction [5.774786149181393]
We analyze how demographic attributes and prompt variations influence latent opinion mappings in large language models (LLMs) We find that LLM-generated data fails to replicate the variance observed in real-world human responses. In the political space, persona-to-party mappings exhibit limited differentiation, resulting in synthetic data that lacks the nuanced distribution of opinions found in survey data.
arXiv Detail & Related papers (2025-02-22T16:25:33Z)
Guided Persona-based AI Surveys: Can we replicate personal mobility preferences at scale using LLMs? [1.7819574476785418]
This study explores the potential of Large Language Models (LLMs) to generate artificial surveys.<n>By leveraging LLMs for synthetic data creation, we aim to address the limitations of traditional survey methods.<n>A novel approach incorporating "Personas" is introduced and compared to five other synthetic survey methods.
arXiv Detail & Related papers (2025-01-20T15:11:03Z)
Second FRCSyn-onGoing: Winning Solutions and Post-Challenge Analysis to Improve Face Recognition with Synthetic Data [104.30479583607918]
2nd FRCSyn-onGoing challenge is based on the 2nd Face Recognition Challenge in the Era of Synthetic Data (FRCSyn), originally launched at CVPR 2024.<n>We focus on exploring the use of synthetic data both individually and in combination with real data to solve current challenges in face recognition.
arXiv Detail & Related papers (2024-12-02T11:12:01Z)
Agentic Society: Merging skeleton from real world and texture from Large Language Model [4.740886789811429]
This paper explores a novel framework that leverages census data and large language models to generate virtual populations. We show that our method produces personas with variability essential for simulating diverse human behaviors in social science experiments. But the evaluation result shows that only weak sign of statistical truthfulness can be produced due to limited capability of current LLMs.
arXiv Detail & Related papers (2024-09-02T08:28:19Z)
A multi-objective combinatorial optimisation framework for large scale hierarchical population synthesis [1.2233362977312945]
In agent-based simulations, synthetic populations of agents are commonly used to represent the structure, behaviour, and interactions of individuals. We propose a multi objective optimisation technique for large scale population synthesis. Our approach supports complex hierarchical structures between individuals and households, is scalable to large populations and achieves minimal contigency table reconstruction error.
arXiv Detail & Related papers (2024-07-03T15:01:12Z)
Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD. It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation. We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z)
Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data. We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z)
Synthpop++: A Hybrid Framework for Generating A Country-scale Synthetic Population [0.680303951699936]
Population censuses are costly, time-consuming, and may also raise privacy concerns. We introduce SynthPop++, which can combine data from multiple real-world surveys to produce a real-scale synthetic population. Our experimental results show that synthetic population can realistically simulate the population for various administrative units of India.
arXiv Detail & Related papers (2023-04-24T17:27:56Z)
Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data [91.52783572568214]
Synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs. We discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data.
arXiv Detail & Related papers (2023-04-07T16:38:40Z)
Copula-based transferable models for synthetic population generation [1.370096215615823]
Population synthesis involves generating synthetic yet realistic representations of a target population of micro-agents. Traditional methods, often reliant on target population samples, face limitations due to high costs and small sample sizes. We propose a novel framework based on copulas to generate synthetic data for target populations where only empirical marginal distributions are known.
arXiv Detail & Related papers (2023-02-17T23:58:14Z)
Representative & Fair Synthetic Data [68.8204255655161]
We present a framework to incorporate fairness constraints into the self-supervised learning process. We generate a representative as well as fair version of the UCI Adult census data set. We consider representative & fair synthetic data a promising future building block to teach algorithms not on historic worlds, but rather on the worlds that we strive to live in.
arXiv Detail & Related papers (2021-04-07T09:19:46Z)
Magnify Your Population: Statistical Downscaling to Augment the Spatial Resolution of Socioeconomic Census Data [48.7576911714538]
We present a new statistical downscaling approach to derive fine-scale estimates of key socioeconomic attributes. For each selected socioeconomic variable, a Random Forest model is trained on the source Census units and then used to generate fine-scale gridded predictions. As a case study, we apply this method to Census data in the United States, downscaling the selected socioeconomic variables available at the block group level, to a grid of 300 spatial resolution.
arXiv Detail & Related papers (2020-06-23T16:52:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.