A systematic machine learning approach to measure and assess biases in mobile phone population data
- URL: http://arxiv.org/abs/2509.02603v1
- Date: Fri, 29 Aug 2025 21:25:30 GMT
- Title: A systematic machine learning approach to measure and assess biases in mobile phone population data
- Authors: Carmen Cabrera, Francisco Rowe,
- Abstract summary: We develop and implement a framework to quantify coverage bias in aggregated mobile phone application data.<n>We show that mobile phone data consistently achieve higher population coverage than major national surveys.<n>Our findings establish a foundation for bias assessment standards in mobile phone data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Traditional sources of population data, such as censuses and surveys, are costly, infrequent, and often unavailable in crisis-affected regions. Mobile phone application data offer near real-time, high-resolution insights into population distribution, but their utility is undermined by unequal access to and use of digital technologies, creating biases that threaten representativeness. Despite growing recognition of these issues, there is still no standard framework to measure and explain such biases, limiting the reliability of digital traces for research and policy. We develop and implement a systematic, replicable framework to quantify coverage bias in aggregated mobile phone application data without requiring individual-level demographic attributes. The approach combines a transparent indicator of population coverage with explainable machine learning to identify contextual drivers of spatial bias. Using four datasets for the United Kingdom benchmarked against the 2021 census, we show that mobile phone data consistently achieve higher population coverage than major national surveys, but substantial biases persist across data sources and subnational areas. Coverage bias is strongly associated with demographic, socioeconomic, and geographic features, often in complex nonlinear ways. Contrary to common assumptions, multi-application datasets do not necessarily reduce bias compared to single-app sources. Our findings establish a foundation for bias assessment standards in mobile phone data, offering practical tools for researchers, statistical agencies, and policymakers to harness these datasets responsibly and equitably.
Related papers
- Data Bias in Human Mobility is a Universal Phenomenon but is Highly Location-specific [0.0]
We study data production', quantifying not only whether individuals are represented in big digital datasets, but also how they are represented in terms of how much data they produce.<n>We study GPS mobility data collected from anonymized smartphones for ten major US cities and find that data points can be more unequally distributed between users than wealth.<n>We build models to predict the number of data points we can expect to be produced by the composition of demographic groups living in census tracts, and find strong effects of wealth, ethnicity, and education on data production.
arXiv Detail & Related papers (2025-07-31T20:19:50Z) - Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information [50.29934517930506]
DAFair is a novel approach to address social bias in language models.
We leverage prototypical demographic texts and incorporate a regularization term during the fine-tuning process to mitigate bias.
arXiv Detail & Related papers (2024-03-14T15:58:36Z) - BiasBuster: a Neural Approach for Accurate Estimation of Population
Statistics using Biased Location Data [6.077198822448429]
We show that statistical debiasing, although in some cases useful, often fails to improve accuracy.
We then propose BiasBuster, a neural network approach that utilizes the correlations between population statistics and location characteristics to provide accurate estimates of population statistics.
arXiv Detail & Related papers (2024-02-17T16:16:24Z) - DSAP: Analyzing Bias Through Demographic Comparison of Datasets [4.8741052091630985]
We propose DSAP (Demographic Similarity from Auxiliary Profiles), a two-step methodology for comparing the demographic composition of two datasets.
DSAP can be deployed in three key applications: to detect and characterize demographic blind spots and bias issues across datasets, to measure dataset demographic bias in single datasets, and to measure dataset demographic shift in deployment scenarios.
An essential feature of DSAP is its ability to robustly analyze datasets without explicit demographic labels, offering simplicity and interpretability for a wide range of situations.
arXiv Detail & Related papers (2023-12-22T11:51:20Z) - D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling
Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z) - Biases in human mobility data impact epidemic modeling [0.0]
We identify two types of bias caused by unequal access to, and unequal usage of mobile phones.
We find evidence for data generation bias in all examined datasets in that high-wealth individuals are overrepresented.
To mitigate the skew, we present a framework to debias data and show how simple techniques can be used to increase representativeness.
arXiv Detail & Related papers (2021-12-23T13:20:54Z) - Statistical discrimination in learning agents [64.78141757063142]
Statistical discrimination emerges in agent policies as a function of both the bias in the training population and of agent architecture.
We show that less discrimination emerges with agents that use recurrent neural networks, and when their training environment has less bias.
arXiv Detail & Related papers (2021-10-21T18:28:57Z) - Towards Measuring Bias in Image Classification [61.802949761385]
Convolutional Neural Networks (CNN) have become state-of-the-art for the main computer vision tasks.
However, due to the complex structure their decisions are hard to understand which limits their use in some context of the industrial world.
We present a systematic approach to uncover data bias by means of attribution maps.
arXiv Detail & Related papers (2021-07-01T10:50:39Z) - Leveraging Administrative Data for Bias Audits: Assessing Disparate
Coverage with Mobility Data for COVID-19 Policy [61.60099467888073]
We show how linking administrative data can enable auditing mobility data for bias.
We show that older and non-white voters are less likely to be captured by mobility data.
We show that allocating public health resources based on such mobility data could disproportionately harm high-risk elderly and minority groups.
arXiv Detail & Related papers (2020-11-14T02:04:14Z) - Measuring Social Biases of Crowd Workers using Counterfactual Queries [84.10721065676913]
Social biases based on gender, race, etc. have been shown to pollute machine learning (ML) pipeline predominantly via biased training datasets.
Crowdsourcing, a popular cost-effective measure to gather labeled training datasets, is not immune to the inherent social biases of crowd workers.
We propose a new method based on counterfactual fairness to quantify the degree of inherent social bias in each crowd worker.
arXiv Detail & Related papers (2020-04-04T21:41:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.