Related papers: Variable importance measure for spatial machine learning models with application to air pollution exposure prediction

URL: http://arxiv.org/abs/2406.01982v1
Date: Tue, 4 Jun 2024 05:51:36 GMT
Title: Variable importance measure for spatial machine learning models with application to air pollution exposure prediction
Authors: Si Cheng, Magali N. Blanco, Lianne Sheppard, Ali Shojaie, Adam Szpiro
Abstract summary: The objective is to predict air pollution exposures for study subjects at locations without data in order to optimize our ability to learn about health effects of air pollution. We tackle these challenges in two datasets: sulfur (S) from regulatory United States national PM2.5 sub-species data and ultrafine particles (UFP) from a new Seattle-area traffic-related air pollution dataset. Our key contribution is a leave-one-out approach for variable importance that leads to interpretable and comparable measures for a broad class of models.
Score: 2.633085745593072
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Exposure assessment is fundamental to air pollution cohort studies. The objective is to predict air pollution exposures for study subjects at locations without data in order to optimize our ability to learn about health effects of air pollution. In addition to generating accurate predictions to minimize exposure measurement error, understanding the mechanism captured by the model is another crucial aspect that may not always be straightforward due to the complex nature of machine learning methods, as well as the lack of unifying notions of variable importance. This is further complicated in air pollution modeling by the presence of spatial correlation. We tackle these challenges in two datasets: sulfur (S) from regulatory United States national PM2.5 sub-species data and ultrafine particles (UFP) from a new Seattle-area traffic-related air pollution dataset. Our key contribution is a leave-one-out approach for variable importance that leads to interpretable and comparable measures for a broad class of models with separable mean and covariance components. We illustrate our approach with several spatial machine learning models, and it clearly highlights the difference in model mechanisms, even for those producing similar predictions. We leverage insights from this variable importance measure to assess the relative utilities of two exposure models for S and UFP that have similar out-of-sample prediction accuracies but appear to draw on different types of spatial information to make predictions.

Related papers

Uncertainty Quantification for Surface Ozone Emulators using Deep Learning [31.05745189965697]
As of 2023, 94% of the world's population is exposed to unsafe pollution levels. Traditional physics-based models fall short in their practical use for scales relevant to human-health impacts. We implement an uncertainty-aware U-Net architecture to predict the Multi-mOdel Multi-cOnstituent Chemical data assimilation model's surface ozone residuals.
arXiv Detail & Related papers (2025-08-06T21:22:06Z)
MVAR: MultiVariate AutoRegressive Air Pollutants Forecasting Model [18.785110680719235]
Existing studies predominantly focus on single-pollutant forecasting, neglecting the interactions among different pollutants and their diverse spatial responses. We propose the MultiVariate AutoRegressive air pollutants forecasting model, which reduces the dependency on long-time-window inputs. We construct a comprehensive dataset covering 6 major pollutants across 75 cities in North China from 2018 to 2023, including ERA5 reanalysis data and FuXi-2.0 forecast data.
arXiv Detail & Related papers (2025-07-16T08:30:41Z)
FuXi-Air: Urban Air Quality Forecasting Based on Emission-Meteorology-Pollutant multimodal Machine Learning [22.270124698874934]
An air quality forecasting model, named FuXi-Air, has been constructed based on multimodal data fusion to support high-precision air quality forecasting. The model successfully completes 72-hour forecasts for six major air pollutants at an hourly resolution across multiple monitoring sites within 25-30 seconds.
arXiv Detail & Related papers (2025-06-09T10:27:50Z)
Air Quality Prediction with A Meteorology-Guided Modality-Decoupled Spatio-Temporal Network [47.699409089023696]
Air quality prediction plays a crucial role in public health and environmental protection. Existing works underestimate the critical role of atmospheric conditions in air quality prediction. MDSTNet is an encoder framework that explicitly captures atmosphere-pollution dependencies for prediction. ChinaAirNet is the first dataset combining air quality records with multi-pressure-level meteorological observations.
arXiv Detail & Related papers (2025-04-14T09:18:11Z)
A HEART for the environment: Transformer-Based Spatiotemporal Modeling for Air Quality Prediction [0.0]
llull-environment is a sophisticated and scalable forecasting system for air pollution. It contains an encoder-decoder convolutional neural network to forecast mean pollution levels for four key pollutants. This paper investigates the augmentation of this neural network with an attention mechanism to improve predictive accuracy.
arXiv Detail & Related papers (2025-02-26T10:54:27Z)
Stratospheric aerosol source inversion: Noise, variability, and uncertainty quantification [0.0]
This article presents a framework for stratospheric aerosol source inversion using a Bayesian approximation error approach. We leverage specially designed earth system model simulations using the Energy Exascale Earth System Model (E3SM). A comprehensive framework for data generation, data processing, dimension reduction, operator learning, and Bayesian inversion is presented.
arXiv Detail & Related papers (2024-09-10T20:12:36Z)
Machine Learning for Methane Detection and Quantification from Space -- A survey [49.7996292123687]
Methane (CH_4) is a potent anthropogenic greenhouse gas, contributing 86 times more to global warming than Carbon Dioxide (CO_2) over 20 years. This work expands existing information on operational methane point source detection sensors in the Short-Wave Infrared (SWIR) bands. It reviews the state-of-the-art for traditional as well as Machine Learning (ML) approaches.
arXiv Detail & Related papers (2024-08-27T15:03:20Z)
Cluster-Segregate-Perturb (CSP): A Model-agnostic Explainability Pipeline for Spatiotemporal Land Surface Forecasting Models [5.586191108738564]
This paper introduces a pipeline that integrates principles from both perturbation-based explainability techniques like LIME and global marginal explainability like PDP. The proposed pipeline simplifies the undertaking of diverse investigative analyses, such as marginal sensitivity analysis, marginal correlation analysis, lag analysis, etc., on complex land surface forecasting models.
arXiv Detail & Related papers (2024-08-12T04:29:54Z)
Urban Air Pollution Forecasting: a Machine Learning Approach leveraging Satellite Observations and Meteorological Forecasts [0.11249583407496218]
Air pollution poses a significant threat to public health and well-being, particularly in urban areas. This study introduces a series of machine-learning models that integrate data from the Sentinel-5P satellite, meteorological conditions, and topological characteristics to forecast future levels of five major pollutants.
arXiv Detail & Related papers (2024-05-30T10:02:53Z)
Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues. We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space. A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
Observation-Guided Meteorological Field Downscaling at Station Scale: A Benchmark and a New Method [66.80344502790231]
We extend meteorological downscaling to arbitrary scattered station scales and establish a new benchmark and dataset. Inspired by data assimilation techniques, we integrate observational data into the downscaling process, providing multi-scale observational priors. Our proposed method outperforms other specially designed baseline models on multiple surface variables.
arXiv Detail & Related papers (2024-01-22T14:02:56Z)
A Framework for Scalable Ambient Air Pollution Concentration Estimation [0.0]
Ambient air pollution remains a critical issue in the United Kingdom, where data on air pollution concentrations form the foundation for interventions aimed at improving air quality. We introduce a data-driven supervised machine learning model framework designed to address temporal and spatial data gaps by filling missing measurements. This approach provides a comprehensive dataset for England throughout 2018 at a 1 km x 1 km hourly resolution.
arXiv Detail & Related papers (2024-01-16T18:03:07Z)
Residual Corrective Diffusion Modeling for Km-scale Atmospheric Downscaling [58.456404022536425]
State of the art for physical hazard prediction from weather and climate requires expensive km-scale numerical simulations driven by coarser resolution global inputs. Here, a generative diffusion architecture is explored for downscaling such global inputs to km-scale, as a cost-effective machine learning alternative. The model is trained to predict 2km data from a regional weather model over Taiwan, conditioned on a 25km global reanalysis.
arXiv Detail & Related papers (2023-09-24T19:57:22Z)
Robust detection and attribution of climate change under interventions [4.344839102717429]
Fingerprints are key tools in climate change detection and attribution (D&A). We propose a direct D&A approach based on supervised learning to extract fingerprints that lead to robust predictions. Our study shows that incorporating robustness constraints against relevant interventions may significantly benefit detection and attribution of climate change.
arXiv Detail & Related papers (2022-12-09T15:13:40Z)
Spatial machine-learning model diagnostics: a model-agnostic distance-based approach [91.62936410696409]
This contribution proposes spatial prediction error profiles (SPEPs) and spatial variable importance profiles (SVIPs) as novel model-agnostic assessment and interpretation tools. The SPEPs and SVIPs of geostatistical methods, linear models, random forest, and hybrid algorithms show striking differences and also relevant similarities. The novel diagnostic tools enrich the toolkit of spatial data science, and may improve ML model interpretation, selection, and design.
arXiv Detail & Related papers (2021-11-13T01:50:36Z)
Efficient Causal Inference from Combined Observational and Interventional Data through Causal Reductions [68.6505592770171]
Unobserved confounding is one of the main challenges when estimating causal effects. We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders. We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
arXiv Detail & Related papers (2021-03-08T14:29:07Z)
