Variable importance measure for spatial machine learning models with application to air pollution exposure prediction
- URL: http://arxiv.org/abs/2406.01982v1
- Date: Tue, 4 Jun 2024 05:51:36 GMT
- Title: Variable importance measure for spatial machine learning models with application to air pollution exposure prediction
- Authors: Si Cheng, Magali N. Blanco, Lianne Sheppard, Ali Shojaie, Adam Szpiro
- Abstract summary: The objective is to predict air pollution exposures for study subjects at locations without data in order to optimize our ability to learn about health effects of air pollution.
We tackle these challenges in two datasets: sulfur (S) from regulatory United States national PM2.5 sub-species data and ultrafine particles (UFP) from a new Seattle-area traffic-related air pollution dataset.
Our key contribution is a leave-one-out approach for variable importance that leads to interpretable and comparable measures for a broad class of models.
- Score: 2.633085745593072
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Exposure assessment is fundamental to air pollution cohort studies. The objective is to predict air pollution exposures for study subjects at locations without data in order to optimize our ability to learn about health effects of air pollution. In addition to generating accurate predictions to minimize exposure measurement error, understanding the mechanism captured by the model is another crucial aspect that may not always be straightforward due to the complex nature of machine learning methods, as well as the lack of unifying notions of variable importance. This is further complicated in air pollution modeling by the presence of spatial correlation. We tackle these challenges in two datasets: sulfur (S) from regulatory United States national PM2.5 sub-species data and ultrafine particles (UFP) from a new Seattle-area traffic-related air pollution dataset. Our key contribution is a leave-one-out approach for variable importance that leads to interpretable and comparable measures for a broad class of models with separable mean and covariance components. We illustrate our approach with several spatial machine learning models, and it clearly highlights the difference in model mechanisms, even for those producing similar predictions. We leverage insights from this variable importance measure to assess the relative utilities of two exposure models for S and UFP that have similar out-of-sample prediction accuracies but appear to draw on different types of spatial information to make predictions.
Related papers
- Urban Air Pollution Forecasting: a Machine Learning Approach leveraging Satellite Observations and Meteorological Forecasts [0.11249583407496218]
Air pollution poses a significant threat to public health and well-being, particularly in urban areas.
This study introduces a series of machine-learning models that integrate data from the Sentinel-5P satellite, meteorological conditions, and topological characteristics to forecast future levels of five major pollutants.
arXiv Detail & Related papers (2024-05-30T10:02:53Z) - Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z) - Observation-Guided Meteorological Field Downscaling at Station Scale: A Benchmark and a New Method [66.80344502790231]
We extend meteorological downscaling to arbitrary scattered station scales and establish a new benchmark and dataset.
Inspired by data assimilation techniques, we integrate observational data into the downscaling process, providing multi-scale observational priors.
Our proposed method outperforms other specially designed baseline models on multiple surface variables.
arXiv Detail & Related papers (2024-01-22T14:02:56Z) - A Framework for Scalable Ambient Air Pollution Concentration Estimation [0.0]
Ambient air pollution remains a critical issue in the United Kingdom, where data on air pollution concentrations form the foundation for interventions aimed at improving air quality.
We introduce a data-driven supervised machine learning model framework designed to address temporal and spatial data gaps by filling missing measurements.
This approach provides a comprehensive dataset for England throughout 2018 at a 1 km × 1 km spatial and hourly temporal resolution.
arXiv Detail & Related papers (2024-01-16T18:03:07Z) - Over-the-Air Federated Learning and Optimization [52.5188988624998]
We focus on federated learning (FL) via over-the-air computation (AirComp).
We describe the convergence of AirComp-based FedAvg (AirFedAvg) algorithms under both convex and non-convex settings.
For different types of local updates that can be transmitted by edge devices (i.e., model, gradient, model difference), we reveal that transmitting in AirFedAvg may cause an aggregation error.
In addition, we consider more practical signal processing schemes to improve the communication efficiency and extend the convergence analysis to different forms of model aggregation error caused by these signal processing schemes.
arXiv Detail & Related papers (2023-10-16T05:49:28Z) - Residual Diffusion Modeling for Km-scale Atmospheric Downscaling [51.061954281398116]
A cost-effective downscaling model is trained from a high-resolution 2-km weather model over Taiwan.
CorrDiff exhibits skillful RMSE and CRPS and faithfully recovers spectra and distributions even for extremes.
Downscaling global forecasts successfully retains many of these benefits, foreshadowing the potential of end-to-end, global-to-km-scales machine learning weather predictions.
arXiv Detail & Related papers (2023-09-24T19:57:22Z) - A machine learning and feature engineering approach for the prediction of the uncontrolled re-entry of space objects [1.0205541448656992]
We present the development of a deep learning model for the re-entry prediction of uncontrolled objects in Low Earth Orbit (LEO).
The model is based on a modified version of the Sequence-to-Sequence architecture and is trained on the average altitude profile as derived from a set of Two-Line Element (TLE) data of over 400 bodies.
The novelty of the work consists in introducing in the deep learning model, alongside the average altitude, three new input features: a drag-like coefficient (B*), the average solar index, and the area-to-mass ratio of the object.
arXiv Detail & Related papers (2023-03-17T13:53:59Z) - Robust detection and attribution of climate change under interventions [4.344839102717429]
Fingerprints are key tools in climate change detection and attribution (D&A).
We propose a direct D&A approach based on supervised learning to extract fingerprints that lead to robust predictions.
Our study shows that incorporating robustness constraints against relevant interventions may significantly benefit detection and attribution of climate change.
arXiv Detail & Related papers (2022-12-09T15:13:40Z) - Reduced-order modeling for parameterized large-eddy simulations of atmospheric pollutant dispersion [0.0]
Large-eddy simulations (LES) have the potential to accurately represent pollutant concentration spatial variability.
LES become prohibitively costly to deploy to understand how plume flow and tracer dispersion change with various atmospheric and source parameters.
We propose a non-intrusive reduced-order model combining proper orthogonal decomposition (POD) and Gaussian process regression (GPR) to predict LES field statistics of interest associated with tracer concentrations.
arXiv Detail & Related papers (2022-08-02T15:06:22Z) - Spatial machine-learning model diagnostics: a model-agnostic distance-based approach [91.62936410696409]
This contribution proposes spatial prediction error profiles (SPEPs) and spatial variable importance profiles (SVIPs) as novel model-agnostic assessment and interpretation tools.
The SPEPs and SVIPs of geostatistical methods, linear models, random forest, and hybrid algorithms show striking differences and also relevant similarities.
The novel diagnostic tools enrich the toolkit of spatial data science, and may improve ML model interpretation, selection, and design.
arXiv Detail & Related papers (2021-11-13T01:50:36Z) - Efficient Causal Inference from Combined Observational and Interventional Data through Causal Reductions [68.6505592770171]
Unobserved confounding is one of the main challenges when estimating causal effects.
We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders.
We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
arXiv Detail & Related papers (2021-03-08T14:29:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.