Related papers: PopSim: An Individual-level Population Simulator for Equitable Allocation of City Resources

Related papers

Population-Aligned Persona Generation for LLM-based Social Simulation [58.84363795421489]
We propose a systematic framework for synthesizing high-quality, population-aligned persona sets for social simulation. Our approach begins by leveraging large language models to generate narrative personas from long-term social media data. To address the needs of specific simulation contexts, we introduce a task-specific module that adapts the globally aligned persona set to targeted subpopulations.
arXiv Detail & Related papers (2025-09-12T10:43:47Z)
A systematic machine learning approach to measure and assess biases in mobile phone population data [0.0]
We develop and implement a framework to quantify coverage bias in aggregated mobile phone application data. We show that mobile phone data consistently achieve higher population coverage than major national surveys. Our findings establish a foundation for bias assessment standards in mobile phone data.
arXiv Detail & Related papers (2025-08-29T21:25:30Z)
Data Bias in Human Mobility is a Universal Phenomenon but is Highly Location-specific [0.0]
We study 'data production', quantifying not only whether individuals are represented in big digital datasets, but also how they are represented in terms of how much data they produce. We study GPS mobility data collected from anonymized smartphones for ten major US cities and find that data points can be more unequally distributed between users than wealth. We build models to predict the number of data points we can expect to be produced by the composition of demographic groups living in census tracts, and find strong effects of wealth, ethnicity, and education on data production.
arXiv Detail & Related papers (2025-07-31T20:19:50Z)
The NetMob25 Dataset: A High-resolution Multi-layered View of Individual Mobility in Greater Paris Region [64.30214722988666]
This paper describes the survey design, collection protocol, processing methodology, and characteristics of the released dataset. The dataset includes three components: (i) an Individuals database describing demographic, socioeconomic, and household characteristics; (ii) a Trips database with over 80,000 annotated displacements including timestamps, transport modes, and trip purposes; and (iii) a Raw GPS Traces database comprising about 500 million high-frequency points.
arXiv Detail & Related papers (2025-06-06T09:22:21Z)
You Don't Have to Live Next to Me: Towards Demobilizing Individualistic Bias in Computational Approaches to Urban Segregation [0.0]
The global surge in social inequalities is one of the most pressing issues of our times. The expression of social inequalities at city scale gives rise to urban segregation. The increasing popularity of Big Data and computational models has inspired a growing number of computational studies.
arXiv Detail & Related papers (2025-05-03T14:15:27Z)
Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs). We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs. We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model [72.13121434085116]
We introduce VLBiasBench, a benchmark to evaluate biases in Large Vision-Language Models (LVLMs). VLBiasBench features a dataset that covers nine distinct categories of social biases, including age, disability status, gender, nationality, physical appearance, race, religion, profession, social economic status, as well as two intersectional bias categories: race x gender and race x social economic status. We conduct extensive evaluations on 15 open-source models as well as two advanced closed-source models, yielding new insights into the biases present in these models.
arXiv Detail & Related papers (2024-06-20T10:56:59Z)
Synthpop++: A Hybrid Framework for Generating A Country-scale Synthetic Population [0.680303951699936]
Population censuses are costly, time-consuming, and may also raise privacy concerns. We introduce SynthPop++, which can combine data from multiple real-world surveys to produce a real-scale synthetic population. Our experimental results show that the synthetic population can realistically simulate the population for various administrative units of India.
arXiv Detail & Related papers (2023-04-24T17:27:56Z)
A deep learning framework to generate realistic population and mobility data [5.180648702293017]
Census and Household Travel Survey datasets are regularly collected from households and individuals. These datasets often represent a limited sample of the population due to privacy concerns or are provided only in aggregated form. We propose a framework to generate a synthetic population that includes both socioeconomic features (e.g., age, sex, industry) and trip chains (i.e., activity locations).
arXiv Detail & Related papers (2022-11-14T14:05:09Z)
Releasing survey microdata with exact cluster locations and additional privacy safeguards [77.34726150561087]
We propose an alternative microdata dissemination strategy that leverages the utility of the original microdata with additional privacy safeguards. Our strategy reduces the respondents' re-identification risk for any number of disclosed attributes by 60-80% even under re-identification attempts.
arXiv Detail & Related papers (2022-05-24T19:37:11Z)
So2Sat POP -- A Curated Benchmark Data Set for Population Estimation from Space on a Continental Scale [11.38584315242023]
We provide a comprehensive data set for population estimation in 98 European cities. The data set comprises a digital elevation model, local climate zone, land use proportions, nighttime lights in combination with multi-spectral Sentinel-2 imagery, and data from the Open Street Map initiative.
arXiv Detail & Related papers (2022-04-07T07:30:43Z)
Census-Independent Population Estimation using Representation Learning [0.5735035463793007]
Census-independent population estimation approaches using alternative data sources have shown promise in providing frequent and reliable population estimates locally. We explore recent representation learning approaches, and assess the transferability of representations to population estimation in Mozambique. Using representation learning reduces required human supervision, since features are extracted automatically. We compare the resulting population estimates to existing population products from GRID3, Facebook (HRSL) and WorldPop.
arXiv Detail & Related papers (2021-10-06T15:13:36Z)
Methodological Foundation of a Numerical Taxonomy of Urban Form [62.997667081978825]
We present a method for numerical taxonomy of urban form derived from biological systematics. We derive homogeneous urban tissue types and, by determining overall morphological similarity between them, generate a hierarchical classification of urban form. After framing and presenting the method, we test it on two cities - Prague and Amsterdam.
arXiv Detail & Related papers (2021-04-30T12:47:52Z)
Leveraging Administrative Data for Bias Audits: Assessing Disparate Coverage with Mobility Data for COVID-19 Policy [61.60099467888073]
We show how linking administrative data can enable auditing mobility data for bias. We show that older and non-white voters are less likely to be captured by mobility data. We show that allocating public health resources based on such mobility data could disproportionately harm high-risk elderly and minority groups.
arXiv Detail & Related papers (2020-11-14T02:04:14Z)
Magnify Your Population: Statistical Downscaling to Augment the Spatial Resolution of Socioeconomic Census Data [48.7576911714538]
We present a new statistical downscaling approach to derive fine-scale estimates of key socioeconomic attributes. For each selected socioeconomic variable, a Random Forest model is trained on the source Census units and then used to generate fine-scale gridded predictions. As a case study, we apply this method to Census data in the United States, downscaling the selected socioeconomic variables available at the block group level, to a grid of 300 spatial resolution.
arXiv Detail & Related papers (2020-06-23T16:52:18Z)
Measuring Spatial Subdivisions in Urban Mobility with Mobile Phone Data [58.720142291102135]
By 2050 two thirds of the world population will reside in urban areas. This growth is faster and more complex than the ability of cities to measure and plan for their sustainability. To understand what makes a city inclusive for all, we define a methodology to identify and characterize spatial subdivisions.
arXiv Detail & Related papers (2020-02-20T14:37:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.