Biases in human mobility data impact epidemic modeling
- URL: http://arxiv.org/abs/2112.12521v1
- Date: Thu, 23 Dec 2021 13:20:54 GMT
- Title: Biases in human mobility data impact epidemic modeling
- Authors: Frank Schlosser, Vedran Sekara, Dirk Brockmann, Manuel Garcia-Herranz
- Abstract summary: We identify two types of bias caused by unequal access to, and unequal usage of mobile phones.
We find evidence for data generation bias in all examined datasets in that high-wealth individuals are overrepresented.
To mitigate the skew, we present a framework to debias data and show how simple techniques can be used to increase representativeness.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale human mobility data is a key resource in data-driven policy
making and across many scientific fields. Most recently, mobility data was
extensively used during the COVID-19 pandemic to study the effects of
governmental policies and to inform epidemic models. Large-scale mobility is
often measured using digital tools such as mobile phones. However, it remains
an open question how truthfully these digital proxies represent the actual
travel behavior of the general population. Here, we examine mobility datasets
from multiple countries and identify two fundamentally different types of bias
caused by unequal access to, and unequal usage of mobile phones. We introduce
the concept of data generation bias, a previously overlooked type of bias,
which is present when the amount of data that an individual produces influences
their representation in the dataset. We find evidence for data generation bias
in all examined datasets in that high-wealth individuals are overrepresented,
with the richest 20% contributing over 50% of all recorded trips, substantially
skewing the datasets. This inequality is consequential, as we find mobility
patterns of different wealth groups to be structurally different, where the
mobility networks of high-wealth users are denser and contain more long-range
connections. To mitigate the skew, we present a framework to debias data and
show how simple techniques can be used to increase representativeness. Using
our approach we show how biases can severely impact outcomes of dynamic
processes such as epidemic simulations, where biased data incorrectly estimates
the severity and speed of disease transmission. Overall, we show that a failure
to account for biases can have detrimental effects on the results of studies
and urge researchers and practitioners to account for data-fairness in all
future studies of human mobility.
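The "simple techniques" for increasing representativeness that the abstract alludes to suggest post-stratification reweighting: reweight each recorded trip so that every wealth quintile's share of trips matches its 20% population share. The sketch below is a minimal illustration under that assumption; the toy trip table, column names, and quintile labels are hypothetical, and the paper's actual framework may differ in detail.

```python
import pandas as pd

# Hypothetical trip records: one row per recorded trip, tagged with the
# wealth quintile (1 = poorest, 5 = richest) of the user who produced it.
trips = pd.DataFrame({
    "origin":   list("AABBCCCCCC"),
    "dest":     list("BCACAABBBA"),
    "quintile": [1, 2, 3, 4, 5, 5, 5, 5, 5, 4],
})

# Diagnostic for data generation bias: each quintile's share of all trips.
# In this toy table, quintile 5 alone produces half of the recorded trips.
observed_share = trips["quintile"].value_counts(normalize=True).sort_index()

# Each quintile is 20% of the population, so the post-stratification
# weight for a trip is (population share) / (observed trip share).
weights = 0.20 / observed_share
trips["weight"] = trips["quintile"].map(weights)

# Origin-destination matrices: raw counts vs. weighted (debiased) flows.
od_raw = pd.crosstab(trips["origin"], trips["dest"])
od_debiased = pd.crosstab(trips["origin"], trips["dest"],
                          values=trips["weight"], aggfunc="sum").fillna(0.0)
```

To see why the correction matters for dynamic processes like the epidemic simulations the abstract mentions, a toy discrete-time SIR metapopulation model can be run on both the raw and the reweighted origin-destination matrices; the biased flows, concentrated on high-wealth users' denser, longer-range travel, yield a different peak prevalence and timing. The model below and its parameters are assumptions for illustration, not the paper's simulation setup.

```python
import numpy as np

def sir_metapop(od, beta=0.4, gamma=0.1, steps=300, seed_node=0):
    """Toy discrete-time SIR metapopulation model coupled through a
    row-normalized OD matrix. Illustrative only; not the paper's setup."""
    od = np.asarray(od, dtype=float)
    mob = od / od.sum(axis=1, keepdims=True)  # assumes every patch has outgoing trips
    n = od.shape[0]
    s, i = np.full(n, 1.0), np.zeros(n)
    i[seed_node], s[seed_node] = 0.01, 0.99
    peak, peak_t = 0.0, 0
    for t in range(steps):
        lam = beta * (mob @ i)           # force of infection via travel mixing
        new_inf, new_rec = lam * s, gamma * i
        s, i = s - new_inf, i + new_inf - new_rec
        if i.mean() > peak:              # mean prevalence (equal patch sizes assumed)
            peak, peak_t = i.mean(), t
    return peak, peak_t

# Biased vs. debiased mobility inputs yield different epidemic curves:
print("raw OD:     ", sir_metapop(od_raw.to_numpy()))
print("debiased OD:", sir_metapop(od_debiased.to_numpy()))
```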
Related papers
- Will the Inclusion of Generated Data Amplify Bias Across Generations in Future Image Classification Models? [29.71939692883025]
We investigate the effects of generated data on image classification tasks, with a specific focus on bias.
Hundreds of experiments are conducted on Colorized MNIST, CIFAR-20/100, and Hard ImageNet datasets.
Our findings contribute to the ongoing debate on the implications of synthetic data for fairness in real-world applications.
arXiv Detail & Related papers (2024-10-14T05:07:06Z)
- Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios.
Existing debiasing methods suffer from high costs in bias labeling or model re-training.
We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z)
- Learning for Counterfactual Fairness from Observational Data [62.43249746968616]
Fairness-aware machine learning aims to eliminate biases of learning models against subgroups described by protected (sensitive) attributes such as race, gender, and age.
A prerequisite for existing methods to achieve counterfactual fairness is the prior human knowledge of the causal model for the data.
In this work, we address the problem of counterfactually fair prediction from observational data without given causal models by proposing a novel framework CLAIRE.
arXiv Detail & Related papers (2023-07-17T04:08:29Z)
- Analyzing Bias in Diffusion-based Face Generation Models [75.80072686374564]
Diffusion models are increasingly popular in synthetic data generation and image editing applications.
We investigate the presence of bias in diffusion-based face generation models with respect to attributes such as gender, race, and age.
We examine how dataset size affects the attribute composition and perceptual quality of both diffusion and Generative Adversarial Network (GAN) based face generation models.
arXiv Detail & Related papers (2023-05-10T18:22:31Z)
- Non-Invasive Fairness in Learning through the Lens of Data Drift [88.37640805363317]
We show how to improve the fairness of Machine Learning models without altering the data or the learning algorithm.
We use a simple but key insight: the divergence of trends between different populations, and, consequently, between a learned model and minority populations, is analogous to data drift.
We explore two strategies (model-splitting and reweighing) to resolve this drift, aiming to improve the overall conformance of models to the underlying data.
arXiv Detail & Related papers (2023-03-30T17:30:42Z)
- A fairness assessment of mobility-based COVID-19 case prediction models [0.0]
We tested the hypothesis that bias in the mobility data used to train the predictive models might lead to unfairly less accurate predictions for certain demographic groups.
Specifically, the models tend to favor large, highly educated, wealthy, young, urban, and non-black-dominated counties.
arXiv Detail & Related papers (2022-10-08T03:43:51Z)
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
- Data Representativeness in Accessibility Datasets: A Meta-Analysis [7.6597163467929805]
We review datasets sourced by people with disabilities and older adults.
We find that accessibility datasets represent diverse ages, but have gender and race representation gaps.
We hope our effort expands the space of possibility for greater inclusion of marginalized communities in AI-infused systems.
arXiv Detail & Related papers (2022-07-16T23:32:19Z)
- Assessing Demographic Bias Transfer from Dataset to Model: A Case Study in Facial Expression Recognition [1.5340540198612824]
Of the proposed metrics, two focus on the representational and stereotypical bias of the dataset, and the third on the residual bias of the trained model.
We demonstrate the usefulness of the metrics by applying them to a FER problem based on the popular Affectnet dataset.
arXiv Detail & Related papers (2022-05-20T09:40:42Z)
- Generating synthetic mobility data for a realistic population with RNNs to improve utility and privacy [3.3918638314432936]
We present a system for generating synthetic mobility data using a deep recurrent neural network (RNN).
The system takes a population distribution as input and generates mobility traces for a corresponding synthetic population.
We show the generated mobility data retain the characteristics of the real data, while varying from the real data at the individual level.
arXiv Detail & Related papers (2022-01-04T13:58:22Z)
- Leveraging Administrative Data for Bias Audits: Assessing Disparate Coverage with Mobility Data for COVID-19 Policy [61.60099467888073]
We show how linking administrative data can enable auditing mobility data for bias.
We show that older and non-white voters are less likely to be captured by mobility data.
We show that allocating public health resources based on such mobility data could disproportionately harm high-risk elderly and minority groups.
arXiv Detail & Related papers (2020-11-14T02:04:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.