Inferring fine-grained migration patterns across the United States
- URL: http://arxiv.org/abs/2503.20989v2
- Date: Fri, 06 Jun 2025 17:38:22 GMT
- Title: Inferring fine-grained migration patterns across the United States
- Authors: Gabriel Agostini, Rachel Young, Maria Fitzpatrick, Nikhil Garg, Emma Pierson,
- Abstract summary: We develop a scalable iterative-proportional-fitting based method that reconciles high-resolution but biased proprietary data with low-resolution but more reliable Census data.<n>We produce MIGRATE, a dataset of annual migration matrices from 2010 - 2019 that captures flows between 47.4 billion pairs of Census Block Groups.<n>These estimates are highly correlated with external ground-truth datasets, and improve accuracy and reduce bias relative to raw proprietary data.
- Score: 1.6594124470436404
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-grained migration data illuminate important demographic, environmental, and health phenomena. However, migration datasets within the United States remain lacking: publicly available Census data are neither spatially nor temporally granular, and proprietary data have higher resolution but demographic and other biases. To address these limitations, we develop a scalable iterative-proportional-fitting based method that reconciles high-resolution but biased proprietary data with low-resolution but more reliable Census data. We apply this method to produce MIGRATE, a dataset of annual migration matrices from 2010 - 2019 that captures flows between 47.4 billion pairs of Census Block Groups -- about four thousand times more granular than publicly available data. These estimates are highly correlated with external ground-truth datasets, and improve accuracy and reduce bias relative to raw proprietary data. We use MIGRATE to analyze both national and local migration patterns. Nationally, we document temporal and demographic variation in homophily, upward mobility, and moving distance: for example, we find that people are increasingly likely to move to top-income-quartile CBGs and identify racial disparities in upward mobility. We also show that MIGRATE can illuminate important local migration patterns, including out-migration in response to California wildfires, that are invisible in coarser previous datasets. We publicly release MIGRATE to provide a resource for migration research in the social, environmental, and health sciences.
Related papers
- Data Pruning in Generative Diffusion Models [2.0111637969968]
Generative models aim to estimate the underlying distribution of the data, so presumably they should benefit from larger datasets.
We show that eliminating redundant or noisy data in large datasets is beneficial particularly when done strategically.
arXiv Detail & Related papers (2024-11-19T14:13:25Z) - Labor Migration Modeling through Large-scale Job Query Data [36.87413768190629]
We propose a deep learning-based spatial-temporal labor migration analysis framework, DHG-SIL, by leveraging large-scale job query data.
Specifically, we first acquire labor migration intention as a proxy of labor migration via job queries from one of the world's largest search engines.
We introduce four interpretable variables to quantify city migration properties, which are co-optimized with city representations.
arXiv Detail & Related papers (2024-10-03T16:24:14Z) - Enriching Datasets with Demographics through Large Language Models: What's in a Name? [5.871504332441324]
Large Language Models (LLMs) can perform as well as, if not better than, bespoke models trained on specialized data.
We apply these LLMs to a variety of datasets, including a real-life, unlabelled dataset of licensed financial professionals in Hong Kong.
arXiv Detail & Related papers (2024-09-17T18:40:49Z) - A Highly Granular Temporary Migration Dataset Derived From Mobile Phone Data in Senegal [0.0]
This article introduces a detailed and open-access dataset that leverages mobile phone data to capture temporary migration in Senegal.
The article presents a suite of methodological tools that not only include algorithmic methods for the detection of temporary migration events in digital traces, but also addresses key challenges in aggregating individual trajectories into coherent migration statistics.
arXiv Detail & Related papers (2024-06-21T14:58:28Z) - Combining Twitter and Mobile Phone Data to Observe Border-Rush: The Turkish-European Border Opening [2.5693085674985117]
Following Turkey's 2020 decision to revoke border controls, many individuals journeyed towards the Greek, Bulgarian, and Turkish borders.
However, the lack of verifiable statistics on irregular migration and discrepancies between media reports and actual migration patterns require further exploration.
This study is to bridge this knowledge gap by harnessing novel data sources, specifically mobile phone and Twitter data.
arXiv Detail & Related papers (2024-05-21T09:51:15Z) - Synthetic Census Data Generation via Multidimensional Multiset Sum [7.900694093691988]
We provide tools to generate synthetic microdata solely from published Census statistics.
We show that our methods work well in practice, and we offer theoretical arguments to explain our performance.
arXiv Detail & Related papers (2024-04-15T19:06:37Z) - Trust your Good Friends: Source-free Domain Adaptation by Reciprocal
Neighborhood Clustering [50.46892302138662]
We address the source-free domain adaptation problem, where the source pretrained model is adapted to the target domain in the absence of source data.
Our method is based on the observation that target data, which might not align with the source domain classifier, still forms clear clusters.
We demonstrate that this local structure can be efficiently captured by considering the local neighbors, the reciprocal neighbors, and the expanded neighborhood.
arXiv Detail & Related papers (2023-09-01T15:31:18Z) - LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting [65.71129509623587]
Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning.
However, the promising results achieved on current public datasets may not be applicable to practical scenarios.
We introduce the LargeST benchmark dataset, which includes a total of 8,600 sensors in California with a 5-year time coverage.
arXiv Detail & Related papers (2023-06-14T05:48:36Z) - Evaluating Bias and Noise Induced by the U.S. Census Bureau's Privacy
Protection Methods [0.0]
The U.S. Census Bureau faces a difficult trade-off between the accuracy of Census statistics and the protection of individual information.
We conduct the first independent evaluation of bias and noise induced by the Bureau's two main disclosure avoidance systems.
TopDown's post-processing dramatically reduces the NMF noise and produces data whose accuracy is similar to that of swapping.
arXiv Detail & Related papers (2023-06-13T03:30:19Z) - Membership Inference Attacks against Synthetic Data through Overfitting
Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z) - Estimating Latent Population Flows from Aggregated Data via Inversing
Multi-Marginal Optimal Transport [57.16851632525864]
We study the problem of estimating latent population flows from aggregated count data.
This problem arises when individual trajectories are not available due to privacy issues or measurement fidelity.
We propose to estimate the transition flows from aggregated data by learning the cost functions of the MOT framework.
arXiv Detail & Related papers (2022-12-30T03:03:23Z) - Rethinking Data Heterogeneity in Federated Learning: Introducing a New
Notion and Standard Benchmarks [65.34113135080105]
We show that not only the issue of data heterogeneity in current setups is not necessarily a problem but also in fact it can be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z) - Releasing survey microdata with exact cluster locations and additional
privacy safeguards [77.34726150561087]
We propose an alternative microdata dissemination strategy that leverages the utility of the original microdata with additional privacy safeguards.
Our strategy reduces the respondents' re-identification risk for any number of disclosed attributes by 60-80% even under re-identification attempts.
arXiv Detail & Related papers (2022-05-24T19:37:11Z) - Leveraging Mobile Phone Data for Migration Flows [5.0161988361764775]
Statistics on migration flows are often derived from census data, which suffer from intrinsic limitations.
Alternative data sources, such as surveys and field observations, also suffer from reliability, costs, and scale limitations.
The ubiquity of mobile phones enables an accurate and efficient collection of up-to-date data related to migration.
arXiv Detail & Related papers (2021-05-31T13:41:47Z) - Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and imposters sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
arXiv Detail & Related papers (2021-03-16T15:05:49Z) - Leveraging Administrative Data for Bias Audits: Assessing Disparate
Coverage with Mobility Data for COVID-19 Policy [61.60099467888073]
We show how linking administrative data can enable auditing mobility data for bias.
We show that older and non-white voters are less likely to be captured by mobility data.
We show that allocating public health resources based on such mobility data could disproportionately harm high-risk elderly and minority groups.
arXiv Detail & Related papers (2020-11-14T02:04:14Z) - Forecasting asylum-related migration flows with machine learning and
data at scale [0.0]
We show that adaptive machine learning algorithms can effectively forecast asylum-related migration flows.
We exploit three tiers of data - geolocated events and internet searches in countries of origin, detections of irregular crossings at the EU border, and asylum recognition rates in countries of destination.
arXiv Detail & Related papers (2020-11-09T11:31:17Z) - Magnify Your Population: Statistical Downscaling to Augment the Spatial
Resolution of Socioeconomic Census Data [48.7576911714538]
We present a new statistical downscaling approach to derive fine-scale estimates of key socioeconomic attributes.
For each selected socioeconomic variable, a Random Forest model is trained on the source Census units and then used to generate fine-scale gridded predictions.
As a case study, we apply this method to Census data in the United States, downscaling the selected socioeconomic variables available at the block group level, to a grid of 300 spatial resolution.
arXiv Detail & Related papers (2020-06-23T16:52:18Z) - Enhancing Facial Data Diversity with Style-based Face Aging [59.984134070735934]
In particular, face datasets are typically biased in terms of attributes such as gender, age, and race.
We propose a novel, generative style-based architecture for data augmentation that captures fine-grained aging patterns.
We show that the proposed method outperforms state-of-the-art algorithms for age transfer.
arXiv Detail & Related papers (2020-06-06T21:53:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.