A Bayesian Spatial Model to Correct Under-Reporting in Urban
Crowdsourcing
- URL: http://arxiv.org/abs/2312.11754v1
- Date: Mon, 18 Dec 2023 23:40:56 GMT
- Authors: Gabriel Agostini, Emma Pierson, Nikhil Garg
- Abstract summary: Decision-makers often observe events only through a reporting process, so events that occur but go unreported cannot be distinguished from events that never occurred.
We show how to overcome this challenge by leveraging the fact that events are spatially correlated.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decision-makers often observe the occurrence of events through a reporting
process. City governments, for example, rely on resident reports to find and
then resolve urban infrastructural problems such as fallen street trees,
flooded basements, or rat infestations. Without additional assumptions, there
is no way to distinguish events that occur but are not reported from events
that truly did not occur--a fundamental problem in settings with
positive-unlabeled data. Because disparities in reporting rates correlate with
resident demographics, addressing incidents only on the basis of reports leads
to systematic neglect in neighborhoods that are less likely to report events.
We show how to overcome this challenge by leveraging the fact that events are
spatially correlated. Our framework uses a Bayesian spatial latent variable
model to infer event occurrence probabilities and applies it to storm-induced
flooding reports in New York City, further pooling results across multiple
storms. We show that a model accounting for under-reporting and spatial
correlation predicts future reports more accurately than other models, and
further induces a more equitable set of inspections: its allocations better
reflect the population and provide equitable service to non-white, less
traditionally educated, and lower-income residents. This finding reflects
heterogeneous reporting behavior learned by the model: reporting rates are
higher in Census tracts with higher populations, proportions of white
residents, and proportions of owner-occupied households. Our work lays the
groundwork for more equitable proactive government services, even with
disparate reporting behavior.
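The identification problem and the role of spatial correlation can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' model: the one-dimensional line of tracts, the two demographic groups, and every probability below are hypothetical values chosen for illustration. Each tract has a latent event; a report can only appear where an event occurred, and one group reports less often than the other.

```python
import random


def simulate_reports(n_tracts=2000, seed=0):
    """Simulate latent events and thinned reports on a line of tracts."""
    rng = random.Random(seed)
    # Hypothetical spatial correlation: each tract copies its left
    # neighbour's event state with probability 0.8, else draws afresh.
    events = [rng.random() < 0.3]
    for _ in range(n_tracts - 1):
        if rng.random() < 0.8:
            events.append(events[-1])
        else:
            events.append(rng.random() < 0.3)
    # Hypothetical demographic group per tract; group 1 reports less often.
    groups = [rng.randrange(2) for _ in range(n_tracts)]
    report_rate = {0: 0.9, 1: 0.4}
    # A report requires an event: positive-unlabeled data.
    reports = [e and rng.random() < report_rate[g]
               for e, g in zip(events, groups)]
    return events, groups, reports


def rate(flags):
    """Fraction of True values, or 0.0 for an empty list."""
    return sum(flags) / len(flags) if flags else 0.0


def summarize(events, groups, reports):
    """Per-group true event rate versus observed report rate."""
    out = {}
    for g in (0, 1):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        out[g] = {
            "event_rate": rate([events[i] for i in idx]),
            "report_rate": rate([reports[i] for i in idx]),
        }
    return out


def hidden_event_rate_by_neighbour(events, reports):
    """Among tracts with no report, compare the true event rate when the
    left neighbour did report versus when it did not."""
    near, far = [], []
    for i in range(1, len(events)):
        if not reports[i]:
            (near if reports[i - 1] else far).append(events[i])
    return rate(near), rate(far)


if __name__ == "__main__":
    events, groups, reports = simulate_reports()
    print(summarize(events, groups, reports))
    print(hidden_event_rate_by_neighbour(events, reports))
```

In this toy setting both groups experience events at a similar rate, yet the low-reporting group shows far fewer reports, so allocating inspections by report counts alone would under-serve it. The second function shows why spatial correlation helps: a silent tract next to a reporting tract is much more likely to hide an event than a silent tract next to another silent one.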
Related papers
- BiasBuster: a Neural Approach for Accurate Estimation of Population
Statistics using Biased Location Data [6.077198822448429]
We show that statistical debiasing, although in some cases useful, often fails to improve accuracy.
We then propose BiasBuster, a neural network approach that utilizes the correlations between population statistics and location characteristics to provide accurate estimates of population statistics.
arXiv Detail & Related papers (2024-02-17T16:16:24Z)
- The Impact of Differential Feature Under-reporting on Algorithmic Fairness [86.275300739926]
We present an analytically tractable model of differential feature under-reporting.
We then use it to characterize the impact of this kind of data bias on algorithmic fairness.
Our results show that, in real-world data settings, under-reporting typically leads to increasing disparities.
arXiv Detail & Related papers (2024-01-16T19:16:22Z)
- An Ordinal Latent Variable Model of Conflict Intensity [59.49424978353101]
The Goldstein scale is a widely-used expert-based measure that scores events on a conflictual-cooperative scale.
This paper takes a latent variable-based approach to measuring conflict intensity.
arXiv Detail & Related papers (2022-10-08T08:59:17Z)
- Examining Data Imbalance in Crowdsourced Reports for Improving Flash
Flood Situational Awareness [0.965964228590342]
We analyzed reported flooding from 3-1-1, Waze reports, and FEMA damage data collected in the aftermaths of Tropical Storm Imelda in 2019 and Hurricane Ida in 2021.
By looking at two geographical aggregations, we found that the larger spatial aggregations, census tracts, show less data imbalance in the results.
We found that 3-1-1 and Waze reports have data imbalance limitations in areas where minority populations reside.
arXiv Detail & Related papers (2022-07-12T19:26:43Z)
- Quantifying Spatial Under-reporting Disparities in Resident
Crowdsourcing [5.701305404173138]
We develop a method to identify reporting delays without using external ground-truth data.
We apply our method to over 100,000 resident reports made in New York City and to over 900,000 reports made in Chicago.
arXiv Detail & Related papers (2022-04-19T02:54:16Z)
- The World of an Octopus: How Reporting Bias Influences a Language
Model's Perception of Color [73.70233477125781]
We show that reporting bias negatively impacts and inherently limits text-only training.
We then demonstrate that multimodal models can leverage their visual training to mitigate these effects.
arXiv Detail & Related papers (2021-10-15T16:28:17Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
- Priority prediction of Asian Hornet sighting report using machine
learning methods [0.0]
The Asian giant hornet (Vespa mandarinia) is devastating not only to native bee colonies, but also to local apiculture.
We propose a method to predict the priority of sighting reports based on machine learning.
arXiv Detail & Related papers (2021-06-28T07:33:53Z)
- Emergency Incident Detection from Crowdsourced Waze Data using Bayesian
Information Fusion [4.039649741925056]
This paper presents a novel method for emergency incident detection using noisy crowdsourced Waze data.
We propose a principled computational framework based on observational theory to model the uncertainty in the reliability of crowd-generated reports.
arXiv Detail & Related papers (2020-11-10T22:45:03Z)
- On Disentangled Representations Learned From Correlated Data [59.41587388303554]
We bridge the gap to real-world scenarios by analyzing the behavior of the most prominent disentanglement approaches on correlated data.
We show that systematically induced correlations in the dataset are being learned and reflected in the latent representations.
We also demonstrate how to resolve these latent correlations, either using weak supervision during training or by post-hoc correcting a pre-trained model with a small number of labels.
arXiv Detail & Related papers (2020-06-14T12:47:34Z)
- Showing Your Work Doesn't Always Work [73.63200097493576]
"Show Your Work: Improved Reporting of Experimental Results" advocates for reporting the expected validation effectiveness of the best-tuned model.
We analytically show that their estimator is biased and uses error-prone assumptions.
We derive an unbiased alternative and bolster our claims with empirical evidence from statistical simulation.
arXiv Detail & Related papers (2020-04-28T17:59:01Z)