Mind the Gaps: Auditing and Reducing Group Inequity in Large-Scale Mobility Prediction
- URL: http://arxiv.org/abs/2510.26940v1
- Date: Thu, 30 Oct 2025 18:54:33 GMT
- Title: Mind the Gaps: Auditing and Reducing Group Inequity in Large-Scale Mobility Prediction
- Authors: Ashwin Kumar, Hanyu Zhang, David A. Schweidel, William Yeoh,
- Abstract summary: Next location prediction underpins a growing number of mobility, retail, and public-health applications. In this paper, we audit state-of-the-art mobility prediction models trained on a large-scale dataset. We show a systematic disparity, stemming from the underlying dataset, that produces large differences in accuracy across locations and user groups.
- Score: 9.369284351516358
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Next location prediction underpins a growing number of mobility, retail, and public-health applications, yet its societal impacts remain largely unexplored. In this paper, we audit state-of-the-art mobility prediction models trained on a large-scale dataset, highlighting hidden disparities based on user demographics. Drawing from aggregate census data, we compute the difference in predictive performance on racial and ethnic user groups and show a systematic disparity, stemming from the underlying dataset, that produces large differences in accuracy across locations and user groups. To address this, we propose Fairness-Guided Incremental Sampling (FGIS), a group-aware sampling strategy designed for incremental data collection settings. Because individual-level demographic labels are unavailable, we introduce Size-Aware K-Means (SAKM), a clustering method that partitions users in latent mobility space while enforcing census-derived group proportions. This yields proxy racial labels for the four largest groups in the state: Asian, Black, Hispanic, and White. Built on these labels, our sampling algorithm prioritizes users based on expected performance gains and current group representation. This method incrementally constructs training datasets that reduce demographic performance gaps while preserving overall accuracy. Our method reduces total disparity between groups by up to 40% with minimal accuracy trade-offs, as evaluated on a state-of-the-art MetaPath2Vec model and a transformer-encoder model. Improvements are most significant in early sampling stages, highlighting the potential for fairness-aware strategies to deliver meaningful gains even in low-resource settings. Our findings expose structural inequities in mobility prediction pipelines and demonstrate how lightweight, data-centric interventions can improve fairness with little added complexity, especially for low-data applications.
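The abstract names two components, a size-constrained clustering step (SAKM) and a group-aware incremental sampler (FGIS), but gives no exact formulation. The Python below is a minimal, hypothetical sketch of the general ideas, not the authors' implementation. The greedy capacity-respecting assignment, the scoring rule `gain * (1 + alpha * deficit)`, the parameter `alpha`, and the pairwise-gap `total_disparity` metric are all assumptions introduced for illustration. Cluster `c` is forced to hold about `proportions[c] * n` users, so its index can serve directly as a proxy group label.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def size_aware_kmeans(points, proportions, iters=20, seed=0):
    """Hypothetical size-constrained k-means: cluster c gets about proportions[c] * n points.

    Assumption: greedy capacity-respecting assignment, not the authors' exact SAKM.
    """
    n, k = len(points), len(proportions)
    caps = [int(p * n) for p in proportions]
    for i in range(n - sum(caps)):  # hand out rounding leftovers round-robin
        caps[i % k] += 1
    centroids = [list(p) for p in random.Random(seed).sample(points, k)]
    labels = [None] * n
    for _ in range(iters):
        # Assignment: take the cheapest (point, cluster) pairs first, skipping full clusters.
        pairs = sorted((sq_dist(p, centroids[c]), i, c)
                       for i, p in enumerate(points) for c in range(k))
        labels, load = [None] * n, [0] * k
        for _, i, c in pairs:
            if labels[i] is None and load[c] < caps[c]:
                labels[i] = c
                load[c] += 1
        # Update: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [points[i] for i in range(n) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def fairness_guided_select(pool, group_of, gain, counts, target_share, budget, alpha=2.0):
    """Hypothetical greedy sampler in the spirit of FGIS.

    score(u) = gain[u] * (1 + alpha * deficit), where deficit is how far u's group
    currently falls below its target share of the training set.
    """
    counts = Counter(counts)  # copy so the caller's tally is not mutated
    remaining, chosen = sorted(pool), []  # sorted: deterministic tie-breaking
    for _ in range(min(budget, len(remaining))):
        total = sum(counts.values()) or 1

        def score(u):
            g = group_of[u]
            return gain[u] * (1 + alpha * max(0.0, target_share[g] - counts[g] / total))

        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
        counts[group_of[best]] += 1
    return chosen


def total_disparity(acc_by_group):
    """Hypothetical metric: sum of absolute accuracy gaps over all pairs of groups."""
    accs = list(acc_by_group.values())
    return sum(abs(a - b) for i, a in enumerate(accs) for b in accs[i + 1:])


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (9, 9), (9, 10), (10, 9), (10, 10)]
    labels, _ = size_aware_kmeans(pts, [0.25, 0.75])
    print(sorted(Counter(labels).values()))  # [2, 6]: sizes follow the proportions
    pick = fairness_guided_select(
        ["a", "b", "c"], {"a": "G1", "b": "G2", "c": "G2"},
        {"a": 0.5, "b": 0.6, "c": 0.55}, {"G1": 1, "G2": 9},
        {"G1": 0.5, "G2": 0.5}, budget=2)
    print(pick)  # ['a', 'b']: underrepresented G1's 'a' jumps ahead of higher-gain 'b'
```

The greedy assignment is simple but not guaranteed optimal; an exact size-constrained assignment step would instead solve a minimum-cost flow or linear assignment problem over cluster "slots".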
Related papers
- BRIDGE the Gap: Mitigating Bias Amplification in Automated Scoring of English Language Learners via Inter-group Data Augmentation [33.11188827947722]
We propose BRIDGE, a Bias-Reducing Inter-group Data GEneration framework for low-resource assessment settings.
We show that BRIDGE effectively reduces prediction bias for high-scoring ELL students while maintaining overall scoring performance.
Date: 2026-02-27T01:11:05Z
- C2AL: Cohort-Contrastive Auxiliary Learning for Large-scale Recommendation Systems [7.548682352355034]
We show how the attention mechanism can play a key role in factorization machines for shared embedding selection.
We propose to address this challenge by analyzing the substructures in the dataset and exposing those with strong distributional contrast through auxiliary learning.
This approach customizes the learning process of attention layers to preserve mutual information with minority cohorts while improving global performance.
Date: 2025-10-02T17:00:17Z
- Fairness for the People, by the People: Minority Collective Action [50.29077265863936]
Machine learning models often preserve biases present in training data, leading to unfair treatment of certain minority groups.
We propose a framework in which a coordinated minority group strategically relabels its own data to enhance fairness, without altering the firm's training process.
Our findings show that a subgroup of the minority can substantially reduce unfairness with a small impact on the overall prediction error.
Date: 2025-08-21T09:09:39Z
- Optimal Transport-based Domain Alignment as a Preprocessing Step for Federated Learning [0.48342038441006796]
Federated learning (FL) is a subfield of machine learning that avoids sharing local data with a central server.
In FL, fusing locally trained models with unbalanced datasets may degrade the performance of global model aggregation.
We introduce an Optimal Transport-based preprocessing algorithm that aligns the datasets by minimizing the distributional discrepancy of data along the edge devices.
Date: 2025-06-04T15:35:55Z
- Dataset Representativeness and Downstream Task Fairness [24.570493924073524]
We demonstrate that there is a natural tension between dataset representativeness and group-fairness of classifiers trained on that dataset.
We also find that over-sampling underrepresented groups can result in classifiers which exhibit greater bias to those groups.
Date: 2024-06-28T18:11:16Z
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
Date: 2023-08-28T18:48:34Z
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
Date: 2023-04-07T23:51:07Z
- Towards Group Robustness in the presence of Partial Group Labels [61.33713547766866]
Spurious correlations between input samples and the target labels wrongly direct the neural network predictions.
We propose an algorithm that optimizes for the worst-off group assignments from a constraint set.
We show improvements in the minority group's performance while preserving overall aggregate accuracy across groups.
Date: 2022-01-10T22:04:48Z
- Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and impostor sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
Date: 2021-03-16T15:05:49Z
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity-inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
Date: 2020-03-10T03:10:41Z