Impact on Public Health Decision Making by Utilizing Big Data Without
Domain Knowledge
- URL: http://arxiv.org/abs/2402.06059v1
- Date: Thu, 8 Feb 2024 21:03:34 GMT
- Title: Impact on Public Health Decision Making by Utilizing Big Data Without
Domain Knowledge
- Authors: Miao Zhang, Salman Rahman, Vishwali Mhasawade, Rumi Chunara
- Abstract summary: New data sources and artificial intelligence (AI) methods are becoming plentiful and relevant to decision making in many societal applications.
This work illustrates important issues of robustness and model specification for informing effective allocation of interventions using new data sources.
- Score: 17.73578632982445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: New data sources and artificial intelligence (AI) methods to extract
information from them are becoming plentiful and relevant to decision making
in many societal applications. An important example is street view imagery,
available in over 100 countries, and considered for applications such as
assessing built environment aspects in relation to community health outcomes.
Relevant to such uses, important examples of bias in the use of AI are evident
when decision-making based on data fails to account for the robustness of the
data, or predictions are based on spurious correlations. To study this risk, we
use 2.02 million Google Street View (GSV) images along with health, demographic,
and socioeconomic data from New York City. First, we show that built
environment characteristics inferred from GSV labels at the intra-city level
may align poorly with the ground truth. We also find that the
average individual-level behavior of physical inactivity significantly mediates
the impact of built environment features by census tract, as measured through
GSV. Finally, using a causal framework which accounts for these mediators of
environmental impacts on health, we find that altering 10% of samples in the
two lowest tertiles would result in a 4.17 (95% CI 3.84 to 4.55) or 17.2 (95%
CI 14.4 to 21.3) times larger decrease in the prevalence of obesity or
diabetes than the same proportional intervention on the number of crosswalks
by census tract. This work illustrates important issues of robustness and model
specification for informing effective allocation of interventions using new
data sources.
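The mediation claim in the abstract can be illustrated with a minimal sketch. It is not the authors' causal framework: it uses the classic product-of-coefficients decomposition for linear models, run on synthetic data. The variable names (`crosswalks`, `inactivity`, `obesity`), the effect sizes, and the helper functions `ols` and `mediation_effects` are all assumptions made for illustration.

```python
import numpy as np


def ols(y, *columns):
    """Least-squares coefficients for y ~ 1 + columns (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def mediation_effects(x, m, y):
    """Linear product-of-coefficients mediation (hypothetical helper).

    x: exposure, m: mediator, y: outcome (1-D arrays of equal length).
    """
    a = ols(m, x)[1]               # effect of exposure on mediator
    _, direct, b = ols(y, x, m)    # direct effect of x, and effect of m, on y
    total = ols(y, x)[1]           # total effect of x on y
    return {"indirect": a * b, "direct": direct, "total": total}


# Synthetic census-tract-like data; all coefficients are arbitrary assumptions.
rng = np.random.default_rng(0)
n = 2000
crosswalks = rng.normal(size=n)                       # built environment feature
inactivity = -0.5 * crosswalks + rng.normal(size=n)   # mediator
obesity = 0.8 * inactivity - 0.1 * crosswalks + rng.normal(size=n)

effects = mediation_effects(crosswalks, inactivity, obesity)
share_mediated = effects["indirect"] / effects["total"]
print(effects, share_mediated)
```

In this toy setup most of the total effect of the built environment feature flows through the mediator, which is the kind of pattern that makes a model omitting the mediator misleading for allocating interventions.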
Related papers
- Decoding Susceptibility: Modeling Misbelief to Misinformation Through a Computational Approach [61.04606493712002]
Susceptibility to misinformation describes the degree of belief in unverifiable claims, a latent quantity that is not directly observable.
Existing susceptibility studies heavily rely on self-reported beliefs.
We propose a computational approach to model users' latent susceptibility levels.
arXiv Detail & Related papers (2023-11-16T07:22:56Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- The adoption of non-pharmaceutical interventions and the role of digital infrastructure during the COVID-19 Pandemic in Colombia, Ecuador, and El Salvador [0.0]
We study the determinants of adherence to non-pharmaceutical interventions (NPIs) during the first wave of the COVID-19 Pandemic in Colombia, Ecuador, and El Salvador.
We find a significant correlation between mobility drops and digital infrastructure quality.
The link between mobility drops and digital infrastructure quality is stronger at the peak of NPIs stringency.
arXiv Detail & Related papers (2022-02-24T13:15:17Z)
- Label scarcity in biomedicine: Data-rich latent factor discovery enhances phenotype prediction [102.23901690661916]
Low-dimensional embedding spaces can be derived from the UK Biobank population dataset to enhance data-scarce prediction of health indicators, lifestyle and demographic characteristics.
Performance gains from semi-supervised approaches will probably become an important ingredient for various medical data science applications.
arXiv Detail & Related papers (2021-10-12T16:25:50Z)
- The Effects of Air Quality on the Spread of the COVID-19. An Artificial Intelligence Approach [3.997680012976965]
The aim of this work is to investigate any possible relationships between air quality and confirmed cases of COVID-19 in Italian districts.
We report an analysis of the correlation between daily COVID-19 cases and environmental factors, such as temperature, relative humidity, and atmospheric pollutants.
This suggests that machine learning models trained on the environmental parameters to predict the number of future infected cases may be accurate.
arXiv Detail & Related papers (2021-04-09T19:08:59Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
- Exposure Density and Neighborhood Disparities in COVID-19 Infection Risk: Using Large-scale Geolocation Data to Understand Burdens on Vulnerable Communities [1.2526963688768453]
This study develops a new method to quantify neighborhood activity levels at high spatial and temporal resolutions.
We define exposure density as a measure of both the localized volume of activity in a defined area and the proportion of activity occurring in non-residential and outdoor land uses.
arXiv Detail & Related papers (2020-08-04T15:41:24Z)
- COVI White Paper [67.04578448931741]
Contact tracing is an essential tool to change the course of the Covid-19 pandemic.
We present an overview of the rationale, design, ethical considerations and privacy strategy of COVI, a Covid-19 public peer-to-peer contact tracing and risk awareness mobile application developed in Canada.
arXiv Detail & Related papers (2020-05-18T07:40:49Z)
- Inferring the Spatial Distribution of Physical Activity in Children Population from Characteristics of the Environment [5.16880858963093]
We propose a novel analysis approach for modeling the expected population behavior as a function of the local environment.
We experimentally evaluate this approach in predicting the expected physical activity level in small geographic regions.
Specifically, we train models that predict the physical activity level in a region, achieving 81% leave-one-out accuracy.
arXiv Detail & Related papers (2020-05-08T11:07:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.