Examining Data Imbalance in Crowdsourced Reports for Improving Flash
Flood Situational Awareness
- URL: http://arxiv.org/abs/2207.05797v1
- Date: Tue, 12 Jul 2022 19:26:43 GMT
- Title: Examining Data Imbalance in Crowdsourced Reports for Improving Flash
Flood Situational Awareness
- Authors: Miguel Esparza, Hamed Farahmand, Samuel Brody, Ali Mostafavi
- Abstract summary: We analyzed reported flooding from 3-1-1, Waze reports, and FEMA damage data collected in the aftermaths of Tropical Storm Imelda in 2019 and Hurricane Ida in 2021.
By looking at two geographical aggregations, we found that the larger spatial aggregations, census tracts, show less data imbalance in the results.
We found that 3-1-1 and Waze reports have data imbalance limitations in areas where minority populations reside.
- Score: 0.965964228590342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Crowdsourced data is increasingly used to enhance situational
awareness during disasters. While recent studies have shown
promising results regarding the potential of crowdsourced data for flood
mapping, little attention has been paid to data imbalance issues that could
introduce biases. We examine biases present in crowdsourced reports to identify
data imbalances with the goal of improving disaster situational awareness. We
examine sample bias, spatial bias, and demographic bias by analyzing reported
flooding from 3-1-1 reports, Waze reports, and FEMA damage data collected in the
aftermaths of Tropical Storm Imelda in 2019 and Hurricane Ida in 2021.
Integrating other flooding-related topics from 3-1-1 reports into the Global
Moran's I and Local Indicator of Spatial Association (LISA) test revealed more
communities that were impacted by floods. To examine spatial bias, we perform
the LISA and BI-LISA tests on the three datasets at the census tract and census
block group level. By looking at two geographical aggregations, we found that
the larger spatial aggregations, census tracts, show less data imbalance in the
results. Finally, a one-way analysis of variance (ANOVA) test performed on the
clusters generated from the BI-LISA shows that data imbalance exists in areas
where minority populations reside. Through a regression analysis, we found that
3-1-1 and Waze reports have data imbalance limitations in areas where minority
populations reside. The findings of this study advance understanding of data
imbalances and biases in crowdsourced datasets that are increasingly used for
disaster situational awareness.
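The abstract relies on Global Moran's I and LISA (local Moran's I) to detect spatial clustering of flood reports. The sketch below shows the textbook form of both statistics on a toy example; it is not the authors' implementation. The four-area chain, the binary contiguity weights `W`, and the report counts are hypothetical, and no permutation-based significance test is included.

```python
import numpy as np


def global_morans_i(values, weights):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z the mean-centered values."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    return (x.size / w.sum()) * (z @ w @ z) / (z @ z)


def local_morans_i(values, weights):
    """Local Moran's I per area: z_i * sum_j(w_ij * z_j) / m2, with m2 = (z' z) / n."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    m2 = (z @ z) / x.size
    return z * (w @ z) / m2


# Hypothetical layout: four areas in a row, each adjacent to its neighbors.
W = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)

clustered = [1, 1, 5, 5]    # hypothetical report counts, similar values adjacent
alternating = [1, 5, 1, 5]  # hypothetical report counts, dissimilar values adjacent

print(global_morans_i(clustered, W))    # positive: similar values cluster
print(global_morans_i(alternating, W))  # negative: neighbors differ
print(local_morans_i(clustered, W))     # per-area contributions
```

With these definitions, the local values sum to the global statistic multiplied by the total weight `W.sum()`, which is why LISA can point to the specific areas that drive a global clustering result.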
Related papers
- BiasBuster: a Neural Approach for Accurate Estimation of Population
Statistics using Biased Location Data [6.077198822448429]
We show that statistical debiasing, although in some cases useful, often fails to improve accuracy.
We then propose BiasBuster, a neural network approach that utilizes the correlations between population statistics and location characteristics to provide accurate estimates of population statistics.
arXiv Detail & Related papers (2024-02-17T16:16:24Z)
- A Bayesian Spatial Model to Correct Under-Reporting in Urban
Crowdsourcing [1.850972250657274]
Decision-makers often observe the occurrence of events through a reporting process.
We show how to overcome this challenge by leveraging the fact that events are spatially correlated.
arXiv Detail & Related papers (2023-12-18T23:40:56Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Imbalanced Aircraft Data Anomaly Detection [103.01418862972564]
Anomaly detection in temporal data from sensors under aviation scenarios is a practical but challenging task.
We propose a Graphical Temporal Data Analysis framework.
It consists of three modules: Series-to-Image (S2I), Cluster-based Resampling Approach using Euclidean Distance (CRD), and Variance-Based Loss (VBL).
arXiv Detail & Related papers (2023-05-17T09:37:07Z)
- Imputation of Missing Streamflow Data at Multiple Gauging Stations in
Benin Republic [1.9173188470245428]
This work reconstructs streamflow time series data through bias correction of the GEOGloWS ECMWF streamflow service forecasts.
We show by simulating missingness in a testing period that GESS forecasts have a significant bias that results in low predictive skill over the ten Beninese stations.
The findings of this work provide a basis for integrating global GESS streamflow data into operational early-warning decision-making systems.
arXiv Detail & Related papers (2022-11-17T22:44:13Z)
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling
Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
- Predicting Seriousness of Injury in a Traffic Accident: A New Imbalanced
Dataset and Benchmark [62.997667081978825]
The paper introduces a new dataset to assess the performance of machine learning algorithms in the prediction of the seriousness of injury in a traffic accident.
The dataset is created by aggregating publicly available datasets from the UK Department for Transport.
arXiv Detail & Related papers (2022-05-20T21:15:26Z)
- Jalisco's multiclass land cover analysis and classification using a
novel lightweight convnet with real-world multispectral and relief data [51.715517570634994]
We present our novel lightweight (only 89k parameters) Convolution Neural Network (ConvNet) to perform land cover (LC) classification and analysis.
In this work, we combine three real-world open data sources to obtain 13 channels.
Our embedded analysis anticipates the limited performance in some classes and gives us the opportunity to group the most similar.
arXiv Detail & Related papers (2022-01-26T14:58:51Z)
- Equitable Community Resilience: The Case of Winter Storm Uri in Texas [0.0]
This research investigated aspects of equity related to community resilience in the aftermath of Winter Storm Uri in Texas.
Satellite imagery was used to examine data at a much higher geographical resolution focusing on census tracts in the city of Houston.
Results revealed statistically significant negative associations between counties' percentage of non-Hispanic whites and median household income with the ratio of outages.
arXiv Detail & Related papers (2022-01-17T22:54:07Z)
- Leveraging Administrative Data for Bias Audits: Assessing Disparate
Coverage with Mobility Data for COVID-19 Policy [61.60099467888073]
We show how linking administrative data can enable auditing mobility data for bias.
We show that older and non-white voters are less likely to be captured by mobility data.
We show that allocating public health resources based on such mobility data could disproportionately harm high-risk elderly and minority groups.
arXiv Detail & Related papers (2020-11-14T02:04:14Z)