Variable importance measure for spatial machine learning models with application to air pollution exposure prediction
- URL: http://arxiv.org/abs/2406.01982v1
- Date: Tue, 4 Jun 2024 05:51:36 GMT
- Authors: Si Cheng, Magali N. Blanco, Lianne Sheppard, Ali Shojaie, Adam Szpiro
- Score: 2.633085745593072
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Exposure assessment is fundamental to air pollution cohort studies. The objective is to predict air pollution exposures for study subjects at locations without data in order to optimize our ability to learn about health effects of air pollution. In addition to generating accurate predictions to minimize exposure measurement error, understanding the mechanism captured by the model is another crucial aspect that may not always be straightforward due to the complex nature of machine learning methods, as well as the lack of unifying notions of variable importance. This is further complicated in air pollution modeling by the presence of spatial correlation. We tackle these challenges in two datasets: sulfur (S) from regulatory United States national PM2.5 sub-species data and ultrafine particles (UFP) from a new Seattle-area traffic-related air pollution dataset. Our key contribution is a leave-one-out approach for variable importance that leads to interpretable and comparable measures for a broad class of models with separable mean and covariance components. We illustrate our approach with several spatial machine learning models, and it clearly highlights the difference in model mechanisms, even for those producing similar predictions. We leverage insights from this variable importance measure to assess the relative utilities of two exposure models for S and UFP that have similar out-of-sample prediction accuracies but appear to draw on different types of spatial information to make predictions.
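The abstract describes a leave-one-out variable importance for models with separable mean and covariance components, but does not state the formula. As a hedged illustration only, the sketch below implements a generic leave-one-covariate-out idea: refit the model with one covariate (or with the spatial covariance component) removed and report the increase in cross-validated mean squared error. The two-stage model (OLS mean, then simple kriging of the residuals with an exponential covariance whose sill, range and nugget are fixed and assumed known), the function names `exp_cov`, `fit_predict`, `cv_mse` and `loo_importance`, and the simulated data are all hypothetical choices, not the authors' method, code or data.

```python
import numpy as np

# Hypothetical sketch: leave-one-covariate-out importance for a model with a
# separable mean (linear in X) and covariance (exponential + nugget) component.
# Covariance parameters are fixed and assumed known; this is not the paper's code.


def exp_cov(a, b, sill=1.0, range_=0.3):
    """Exponential covariance sill * exp(-d / range_) between 2-D point sets a and b."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sill * np.exp(-d / range_)


def fit_predict(X_tr, y_tr, s_tr, X_te, s_te, use_spatial=True, nugget=0.1):
    """Two-stage fit: OLS mean on [1, X], then simple kriging of the OLS residuals."""
    A_tr = np.column_stack([np.ones(len(y_tr)), X_tr])
    A_te = np.column_stack([np.ones(len(s_te)), X_te])
    beta, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    pred = A_te @ beta
    if use_spatial:
        resid = y_tr - A_tr @ beta
        K = exp_cov(s_tr, s_tr) + nugget * np.eye(len(y_tr))
        pred = pred + exp_cov(s_te, s_tr) @ np.linalg.solve(K, resid)
    return pred


def cv_mse(X, y, s, use_spatial=True, k=5, seed=0):
    """K-fold cross-validated MSE; a fixed seed gives identical folds on every call."""
    idx = np.random.default_rng(seed).permutation(len(y))
    sq_err = np.empty(len(y))
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        pred = fit_predict(X[train], y[train], s[train], X[test], s[test], use_spatial)
        sq_err[test] = (y[test] - pred) ** 2
    return sq_err.mean()


def loo_importance(X, y, s):
    """Increase in CV MSE when one covariate, or the spatial component, is left out."""
    base = cv_mse(X, y, s)
    vi = {f"x{j}": cv_mse(np.delete(X, j, axis=1), y, s) - base for j in range(X.shape[1])}
    vi["spatial"] = cv_mse(X, y, s, use_spatial=False) - base
    return vi


# Simulated data (hypothetical): x0 matters a lot, x1 a little, x2 not at all,
# plus a smooth spatial field and independent measurement noise.
rng = np.random.default_rng(1)
n = 200
s = rng.uniform(size=(n, 2))
X = rng.normal(size=(n, 3))
field = np.linalg.cholesky(exp_cov(s, s) + 1e-6 * np.eye(n)) @ rng.normal(size=n)
y = 2.0 * X[:, 0] + 0.3 * X[:, 1] + field + rng.normal(scale=0.3, size=n)

vi = loo_importance(X, y, s)
for name, value in vi.items():
    print(f"{name}: {value:+.3f}")
```

Reusing the same fold assignment (a fixed `seed`) for the full and reduced models keeps each comparison paired, so a difference in MSE reflects the removed component rather than a different train/test split. Because the spatial term is scored the same way as a covariate, the resulting numbers sit on one scale, which is the kind of comparability across mean and covariance components the abstract emphasizes.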
Related papers
- Stratospheric aerosol source inversion: Noise, variability, and uncertainty quantification
This article presents a framework for stratospheric aerosol source inversion using a Bayesian approximation error approach.
We leverage specially designed earth system model simulations using the Energy Exascale Earth System Model (E3SM).
A comprehensive framework for data generation, data processing, dimension reduction, operator learning, and Bayesian inversion is presented.
Date: 2024-09-10
- Machine Learning for Methane Detection and Quantification from Space -- A survey
Methane (CH₄) is a potent anthropogenic greenhouse gas, contributing 86 times more to global warming than carbon dioxide (CO₂) over 20 years.
This work expands existing information on operational methane point source detection sensors in the Short-Wave Infrared (SWIR) bands.
It reviews the state of the art in both traditional and machine learning (ML) approaches.
Date: 2024-08-27
- Cluster-Segregate-Perturb (CSP): A Model-agnostic Explainability Pipeline for Spatiotemporal Land Surface Forecasting Models
This paper introduces a pipeline that integrates principles from perturbation-based explainability techniques such as LIME and from global marginal explainability methods such as partial dependence plots (PDP).
The proposed pipeline simplifies running diverse analyses on complex land surface forecasting models, such as marginal sensitivity analysis, marginal correlation analysis, and lag analysis.
Date: 2024-08-12
- Urban Air Pollution Forecasting: a Machine Learning Approach leveraging Satellite Observations and Meteorological Forecasts
Air pollution poses a significant threat to public health and well-being, particularly in urban areas.
This study introduces a series of machine-learning models that integrate data from the Sentinel-5P satellite, meteorological conditions, and topological characteristics to forecast future levels of five major pollutants.
Date: 2024-05-30
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of the constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
Date: 2024-03-02
- Observation-Guided Meteorological Field Downscaling at Station Scale: A Benchmark and a New Method
We extend meteorological downscaling to arbitrary scattered station scales and establish a new benchmark and dataset.
Inspired by data assimilation techniques, we integrate observational data into the downscaling process, providing multi-scale observational priors.
Our proposed method outperforms other specially designed baseline models on multiple surface variables.
Date: 2024-01-22
- A Framework for Scalable Ambient Air Pollution Concentration Estimation
Ambient air pollution remains a critical issue in the United Kingdom, where data on air pollution concentrations form the foundation for interventions aimed at improving air quality.
We introduce a data-driven supervised machine learning model framework designed to address temporal and spatial data gaps by filling missing measurements.
This approach provides a comprehensive dataset for England throughout 2018 at 1 km × 1 km spatial and hourly temporal resolution.
Date: 2024-01-16
- Residual Corrective Diffusion Modeling for Km-scale Atmospheric Downscaling
The state of the art in physical hazard prediction from weather and climate requires expensive km-scale numerical simulations driven by coarser-resolution global inputs.
Here, a generative diffusion architecture is explored for downscaling such global inputs to km-scale, as a cost-effective machine learning alternative.
The model is trained to predict 2 km data from a regional weather model over Taiwan, conditioned on a 25 km global reanalysis.
Date: 2023-09-24
- Robust detection and attribution of climate change under interventions
Fingerprints are key tools in climate change detection and attribution (D&A).
We propose a direct D&A approach based on supervised learning to extract fingerprints that lead to robust predictions.
Our study shows that incorporating robustness constraints against relevant interventions may significantly benefit detection and attribution of climate change.
Date: 2022-12-09
- Spatial machine-learning model diagnostics: a model-agnostic distance-based approach
This contribution proposes spatial prediction error profiles (SPEPs) and spatial variable importance profiles (SVIPs) as novel model-agnostic assessment and interpretation tools.
The SPEPs and SVIPs of geostatistical methods, linear models, random forest, and hybrid algorithms show striking differences and also relevant similarities.
The novel diagnostic tools enrich the toolkit of spatial data science, and may improve ML model interpretation, selection, and design.
Date: 2021-11-13
- Efficient Causal Inference from Combined Observational and Interventional Data through Causal Reductions
Unobserved confounding is one of the main challenges when estimating causal effects.
We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders.
We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
Date: 2021-03-08