How Your Location Relates to Health: Variable Importance and Interpretable Machine Learning for Environmental and Sociodemographic Data
- URL: http://arxiv.org/abs/2501.02111v1
- Date: Fri, 03 Jan 2025 21:34:35 GMT
- Title: How Your Location Relates to Health: Variable Importance and Interpretable Machine Learning for Environmental and Sociodemographic Data
- Authors: Ishaan Maitra, Raymond Lin, Eric Chen, Jon Donnelly, Sanja Šćepanović, Cynthia Rudin
- Abstract summary: Health outcomes depend on complex environmental and sociodemographic factors whose effects change over location and time.
We use fine-grained spatial and temporal data to study these effects, namely the MEDSAT dataset of English health, environmental, and sociodemographic information.
We then develop an interpretable machine learning framework based on Generalized Additive Models (GAMs) and Multiscale Geographically Weighted Regression (MGWR).
Our findings identify NO2 as a global predictor for asthma, hypertension, and anxiety, alongside other outcome-specific predictors related to occupation, marriage, and vegetation.
- Score: 15.463748602675695
- Abstract: Health outcomes depend on complex environmental and sociodemographic factors whose effects change over location and time. Only recently has fine-grained spatial and temporal data become available to study these effects, namely the MEDSAT dataset of English health, environmental, and sociodemographic information. Leveraging this new resource, we use a variety of variable importance techniques to robustly identify the most informative predictors across multiple health outcomes. We then develop an interpretable machine learning framework based on Generalized Additive Models (GAMs) and Multiscale Geographically Weighted Regression (MGWR) to analyze both local and global spatial dependencies of each variable on various health outcomes. Our findings identify NO2 as a global predictor for asthma, hypertension, and anxiety, alongside other outcome-specific predictors related to occupation, marriage, and vegetation. Regional analyses reveal local variations with air pollution and solar radiation, with notable shifts during COVID. This comprehensive approach provides actionable insights for addressing health disparities, and advocates for the integration of interpretable machine learning in public health.
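To make the local-versus-global distinction in the abstract concrete, here is a minimal sketch of basic geographically weighted regression with a single Gaussian bandwidth. It is not MGWR (which fits a separate bandwidth per variable) and not the authors' implementation. The function name `gwr_coefficients`, the bandwidth value, and the synthetic data are all assumptions made for illustration, and nothing here comes from MEDSAT.

```python
import numpy as np


def gwr_coefficients(coords, X, y, bandwidth):
    """Fit one weighted least-squares model per location (basic GWR).

    coords: (n, 2) array of locations; X: (n, p) predictors; y: (n,) outcome.
    bandwidth: Gaussian kernel bandwidth, in the same units as coords.
    Returns an (n, p + 1) array: a local intercept and p local slopes per site.
    """
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])  # prepend an intercept column
    betas = np.empty((n, design.shape[1]))
    for i in range(n):
        # Gaussian kernel: observations near site i get more weight.
        dist = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)
        # Weighted least squares, done by scaling rows with sqrt(w).
        sw = np.sqrt(w)
        betas[i] = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)[0]
    return betas


# Synthetic demo (hypothetical data): the true slope grows from west to east.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(400, 2))
x = rng.normal(size=400)
true_slope = 0.5 + 0.2 * coords[:, 0]
y = 1.0 + true_slope * x + rng.normal(scale=0.1, size=400)

# Local fit: one slope per location. Global fit: one slope for the whole map.
local = gwr_coefficients(coords, x[:, None], y, bandwidth=1.5)
global_fit = np.linalg.lstsq(np.column_stack([np.ones(400), x]), y, rcond=None)[0]
```

In this toy setting the global fit returns a single averaged slope, while the local slopes in `local[:, 1]` track the west-to-east gradient. That is the kind of spatial variation a global model cannot express.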
Related papers
- Combining Observational Data and Language for Species Range Estimation [63.65684199946094]
We propose a novel approach combining millions of citizen science species observations with textual descriptions from Wikipedia.
Our framework maps locations, species, and text descriptions into a common space, enabling zero-shot range estimation from textual descriptions.
Our approach also acts as a strong prior when combined with observational data, resulting in more accurate range estimation with less data.
arXiv Detail & Related papers (2024-10-14T17:22:55Z) - Meta-Learners for Partially-Identified Treatment Effects Across Multiple Environments [67.80453452949303]
Estimating the conditional average treatment effect (CATE) from observational data is relevant for many applications such as personalized medicine.
Here, we focus on the widespread setting where the observational data come from multiple environments.
We propose different model-agnostic learners (so-called meta-learners) to estimate the bounds that can be used in combination with arbitrary machine learning models.
arXiv Detail & Related papers (2024-06-04T16:31:43Z) - On the Impact of Data Heterogeneity in Federated Learning Environments with Application to Healthcare Networks [3.9058850780464884]
Federated Learning (FL) allows privacy-sensitive applications to leverage their dataset for a global model construction without any disclosure of the information.
One of those domains is healthcare, where groups of silos collaborate in order to generate a global predictor with improved accuracy and generalization.
This paper presents a comprehensive exploration of the mathematical formalization and taxonomy of heterogeneity within FL environments, focusing on the intricacies of medical data.
arXiv Detail & Related papers (2024-04-29T09:05:01Z) - Using Geographic Location-based Public Health Features in Survival Analysis [12.424517746493553]
This paper proposes a complementary improvement to survival analysis models by incorporating public health statistics in the input features.
We show that including geographic location-based public health information results in a statistically significant improvement in the concordance index evaluated on the Surveillance, Epidemiology, and End Results (SEER) dataset.
Our results indicate the utility of geographic location-based public health features in survival analysis.
arXiv Detail & Related papers (2023-04-16T03:15:00Z) - MIMO: Mutual Integration of Patient Journey and Medical Ontology for Healthcare Representation Learning [49.57261599776167]
We propose an end-to-end robust Transformer-based solution, Mutual Integration of patient journey and Medical Ontology (MIMO) for healthcare representation learning and predictive analytics.
arXiv Detail & Related papers (2021-07-20T07:04:52Z) - The Medkit-Learn(ing) Environment: Medical Decision Modelling through Simulation [81.72197368690031]
We present a new benchmarking suite designed specifically for medical sequential decision making.
The Medkit-Learn(ing) Environment is a publicly available Python package providing simple and easy access to high-fidelity synthetic medical data.
arXiv Detail & Related papers (2021-06-08T10:38:09Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - Health Status Prediction with Local-Global Heterogeneous Behavior Graph [69.99431339130105]
Estimation of health status can be achieved with various kinds of data streams continuously collected from wearable sensors.
We propose to model the behavior-related multi-source data streams with a local-global graph.
We conduct experiments on the StudentLife dataset, and extensive results demonstrate the effectiveness of our proposed model.
arXiv Detail & Related papers (2021-03-23T11:10:04Z) - DigitalExposome: Quantifying the Urban Environment Influence on Wellbeing based on Real-Time Multi-Sensor Fusion and Deep Belief Network [4.340040784481499]
We define the term 'DigitalExposome' as a conceptual framework that takes us closer to understanding the relationship between environment, personal characteristics, behaviour and wellbeing.
We simultaneously collected (for the first time) multisensor data including urban environmental factors.
arXiv Detail & Related papers (2021-01-29T14:55:19Z) - A Deep Learning Pipeline for Patient Diagnosis Prediction Using Electronic Health Records [0.5672132510411464]
We develop and publish a Python package to transform public health datasets into an easy-to-access universal format.
We propose two novel model architectures to predict multiple diagnoses simultaneously.
Both models can predict multiple diagnoses simultaneously with high accuracy.
arXiv Detail & Related papers (2020-06-23T14:58:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.