Data Bias in Human Mobility is a Universal Phenomenon but is Highly Location-specific
- URL: http://arxiv.org/abs/2508.00149v1
- Date: Thu, 31 Jul 2025 20:19:50 GMT
- Title: Data Bias in Human Mobility is a Universal Phenomenon but is Highly Location-specific
- Authors: Katinka den Nijs, Elisa Omodei, Vedran Sekara
- Abstract summary: We study `data production', quantifying not only whether individuals are represented in big digital datasets, but also how they are represented in terms of how much data they produce. We study GPS mobility data collected from anonymized smartphones for ten major US cities and find that data points can be more unequally distributed between users than wealth. We build models to predict the number of data points we can expect to be produced by the composition of demographic groups living in census tracts, and find strong effects of wealth, ethnicity, and education on data production.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale human mobility datasets play increasingly critical roles in many algorithmic systems, business processes and policy decisions. Unfortunately there has been little focus on understanding bias and other fundamental shortcomings of the datasets and how they impact downstream analyses and prediction tasks. In this work, we study `data production', quantifying not only whether individuals are represented in big digital datasets, but also how they are represented in terms of how much data they produce. We study GPS mobility data collected from anonymized smartphones for ten major US cities and find that data points can be more unequally distributed between users than wealth. We build models to predict the number of data points we can expect to be produced by the composition of demographic groups living in census tracts, and find strong effects of wealth, ethnicity, and education on data production. While we find that bias is a universal phenomenon, occurring in all cities, we further find that each city suffers from its own manifestation of it, and that location-specific models are required to model bias for each city. This work raises serious questions about general approaches to debias human mobility data and urges further research.
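The abstract does not spell out the estimators behind these claims, but the two central quantities it describes, how unequally GPS points are spread across users and how census-tract demographics predict data volume, can be illustrated with standard tools. The sketch below is a minimal illustration on synthetic data: it computes a Gini coefficient over per-user point counts and fits a linear model from hypothetical tract-level features. The feature names, the Pareto-shaped counts, and the regression form are assumptions for illustration, not the authors' actual pipeline.

```python
import numpy as np

def gini(counts):
    """Gini coefficient of a non-negative sample (0 = perfect equality, 1 = maximal inequality)."""
    x = np.sort(np.asarray(counts, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    # Standard rank-based formula for the Gini coefficient.
    return float(2 * np.sum(ranks * x) / (n * x.sum()) - (n + 1) / n)

rng = np.random.default_rng(0)

# Synthetic, heavy-tailed per-user GPS point counts (illustrative only).
points_per_user = rng.pareto(a=1.2, size=10_000) * 50
print(f"Gini of data points per user: {gini(points_per_user):.2f}")
# Compare against a wealth Gini for the same population to reproduce the
# paper's "more unequally distributed than wealth" comparison.

# Sketch of a tract-level model: regress point counts on demographic shares.
# The three features (e.g. income, education, ethnicity shares) are synthetic.
X = rng.random((500, 3))
y = X @ np.array([3.0, 1.5, -2.0]) + rng.normal(scale=0.5, size=500)
design = np.column_stack([np.ones(500), X])
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
print("Fitted tract-level coefficients (intercept first):", np.round(beta_hat, 2))
```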
Related papers
- Enriching Datasets with Demographics through Large Language Models: What's in a Name? [5.871504332441324]
Large Language Models (LLMs) can perform as well as, if not better than, bespoke models trained on specialized data.
We apply these LLMs to a variety of datasets, including a real-life, unlabelled dataset of licensed financial professionals in Hong Kong.
arXiv Detail & Related papers (2024-09-17T18:40:49Z)
- Data Augmentation in Human-Centric Vision [54.97327269866757]
This survey presents a comprehensive analysis of data augmentation techniques in human-centric vision tasks.
It delves into a wide range of research areas including person ReID, human parsing, human pose estimation, and pedestrian detection.
Our work categorizes data augmentation methods into two main types: data generation and data perturbation.
arXiv Detail & Related papers (2024-03-13T16:05:18Z)
- DSAP: Analyzing Bias Through Demographic Comparison of Datasets [4.8741052091630985]
We propose DSAP (Demographic Similarity from Auxiliary Profiles), a two-step methodology for comparing the demographic composition of two datasets.
DSAP can be deployed in three key applications: to detect and characterize demographic blind spots and bias issues across datasets, to measure dataset demographic bias in single datasets, and to measure dataset demographic shift in deployment scenarios.
An essential feature of DSAP is its ability to robustly analyze datasets without explicit demographic labels, offering simplicity and interpretability for a wide range of situations.
arXiv Detail & Related papers (2023-12-22T11:51:20Z)
- Assessing Demographic Bias Transfer from Dataset to Model: A Case Study in Facial Expression Recognition [1.5340540198612824]
Of the proposed metrics, two focus on the representational and stereotypical bias of the dataset, and the third on the residual bias of the trained model.
We demonstrate the usefulness of the metrics by applying them to a FER problem based on the popular Affectnet dataset.
arXiv Detail & Related papers (2022-05-20T09:40:42Z)
- Pseudo-PFLOW: Development of nationwide synthetic open dataset for people movement based on limited travel survey and open statistical data [4.243926243206826]
People flow data are utilized in diverse fields such as urban and commercial planning and disaster management.
This study developed pseudo-people-flow data covering all of Japan by combining public statistical and travel survey data.
arXiv Detail & Related papers (2022-05-02T05:13:53Z)
- StyleGAN-Human: A Data-Centric Odyssey of Human Generation [96.7080874757475]
This work takes a data-centric perspective and investigates multiple critical aspects of "data engineering".
We collect and annotate a large-scale human image dataset with over 230K samples capturing diverse poses and textures.
We rigorously investigate three essential factors in data engineering for StyleGAN-based human generation, namely data size, data distribution, and data alignment.
arXiv Detail & Related papers (2022-04-25T17:55:08Z)
- Representation Bias in Data: A Survey on Identification and Resolution Techniques [26.142021257838564]
Data-driven algorithms are only as good as the data they work with, while data sets, especially social data, often fail to represent minorities adequately.
Representation bias in data can arise for various reasons, ranging from historical discrimination to selection and sampling biases in the data acquisition and preparation methods.
This paper reviews the literature on identifying and resolving representation bias as a feature of a data set, independent of how it is consumed later.
arXiv Detail & Related papers (2022-03-22T16:30:22Z)
- Biases in human mobility data impact epidemic modeling [0.0]
We identify two types of bias caused by unequal access to, and unequal usage of mobile phones.
We find evidence for data generation bias in all examined datasets in that high-wealth individuals are overrepresented.
To mitigate the skew, we present a framework to debias data and show how simple techniques can be used to increase representativeness (a hedged reweighting sketch follows this list).
arXiv Detail & Related papers (2021-12-23T13:20:54Z)
- Towards Measuring Bias in Image Classification [61.802949761385]
Convolutional Neural Networks (CNNs) have become state-of-the-art for the main computer vision tasks.
However, due to their complex structure, their decisions are hard to understand, which limits their use in some industrial contexts.
We present a systematic approach to uncover data bias by means of attribution maps.
arXiv Detail & Related papers (2021-07-01T10:50:39Z)
- Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics [118.75207687144817]
We introduce Data Maps, a model-based tool to characterize and diagnose datasets.
We leverage a largely ignored source of information: the behavior of the model on individual instances during training.
Our results indicate that a shift in focus from quantity to quality of data could lead to robust models and improved out-of-distribution generalization.
arXiv Detail & Related papers (2020-09-22T20:19:41Z)
- Urban Sensing based on Mobile Phone Data: Approaches, Applications and Challenges [67.71975391801257]
Much concern in mobile data analysis is related to human beings and their behaviours.
This work aims to review the methods and techniques that have been implemented to discover knowledge from mobile phone data.
arXiv Detail & Related papers (2020-08-29T15:14:03Z)
- REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets [64.76453161039973]
REVISE (REvealing VIsual biaSEs) is a tool that assists in the investigation of a visual dataset.
It surfaces potential biases along three dimensions: (1) object-based, (2) person-based, and (3) geography-based.
arXiv Detail & Related papers (2020-04-16T23:54:37Z)
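Returning to the "Biases in human mobility data impact epidemic modeling" entry above: one common, simple technique for increasing representativeness is post-stratification reweighting, where sampled users are weighted so that each demographic group's effective share matches its census share. The sketch below is a minimal illustration of that idea under assumed group labels and census shares; it is not necessarily the framework used in that paper.

```python
from collections import Counter

def poststratification_weights(user_groups, census_shares):
    """Weight each sampled user so that weighted group shares match census shares.

    user_groups: iterable of demographic group labels, one per sampled user.
    census_shares: dict mapping group label -> population share (sums to ~1).
    """
    user_groups = list(user_groups)
    n = len(user_groups)
    sample_counts = Counter(user_groups)
    # Weight = census share / observed sample share for the user's group.
    return [census_shares[g] / (sample_counts[g] / n) for g in user_groups]

# Toy example: high-wealth users are overrepresented 3:1 in the sample,
# although the two groups are equally common in the census.
users = ["high_wealth"] * 750 + ["low_wealth"] * 250
census = {"high_wealth": 0.5, "low_wealth": 0.5}
weights = poststratification_weights(users, census)
print(round(weights[0], 2), round(weights[-1], 2))  # 0.67 2.0
```

In a downstream mobility or epidemic model, these weights would scale each user's contribution (for example to origin-destination matrices), so that overrepresented high-wealth users no longer dominate the estimates.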
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.