A comparison between geostatistical and machine learning models for spatio-temporal prediction of PM2.5 data
- URL: http://arxiv.org/abs/2509.12051v1
- Date: Mon, 15 Sep 2025 15:32:57 GMT
- Title: A comparison between geostatistical and machine learning models for spatio-temporal prediction of PM2.5 data
- Authors: Zeinab Mohamed, Wenlong Gong,
- Abstract summary: Exposure to high concentrations of PM$_{2.5}$ has been linked to increased respiratory and cardiovascular hospital admissions, more emergency department visits and deaths. Traditional air quality monitoring systems provide limited spatial and temporal data. The advent of low-cost sensors has dramatically improved the granularity of air quality data, enabling real-time, high-resolution monitoring. This study exploits the extensive data from PurpleAir sensors to assess and compare the effectiveness of various statistical and machine learning models in producing accurate hourly PM$_{2.5}$ maps across California.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ambient air pollution poses significant health and environmental challenges. Exposure to high concentrations of PM$_{2.5}$ has been linked to increased respiratory and cardiovascular hospital admissions, more emergency department visits and deaths. Traditional air quality monitoring systems such as EPA-certified stations provide limited spatial and temporal data. The advent of low-cost sensors has dramatically improved the granularity of air quality data, enabling real-time, high-resolution monitoring. This study exploits the extensive data from PurpleAir sensors to assess and compare the effectiveness of various statistical and machine learning models in producing accurate hourly PM$_{2.5}$ maps across California. We evaluate traditional geostatistical methods, including kriging and land use regression, against advanced machine learning approaches such as neural networks, random forests, and support vector machines, as well as an ensemble model. Our findings show that correcting the bias in PurpleAir data with an ensemble model, which incorporates both spatiotemporal dependencies and machine learning models, enhances the predictive accuracy of PM$_{2.5}$ concentrations.
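Kriging is named in the abstract as one of the geostatistical baselines. The block below is a minimal sketch of ordinary kriging at a single target location, assuming an exponential covariance with hand-picked sill and range and no nugget; in practice these parameters would be fitted to an empirical variogram of the PurpleAir data. The function names, parameters, and station values are hypothetical and not taken from the paper's implementation.

```python
import numpy as np


def exponential_cov(h, sill=1.0, range_=1.0):
    # Exponential covariance C(h) = sill * exp(-h / range_); no nugget term (assumption)
    return sill * np.exp(-h / range_)


def ordinary_kriging(coords, values, target, sill=1.0, range_=1.0):
    """Predict at one target location with ordinary kriging.

    coords: (n, 2) station locations; values: (n,) observed PM2.5 (hypothetical);
    target: (2,) location to predict. Returns (prediction, weights).
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(values)
    # Pairwise distances between all stations
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Augmented kriging system: covariance block plus a Lagrange row/column
    # that forces the weights to sum to 1 (unbiasedness constraint)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exponential_cov(d, sill, range_)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = exponential_cov(np.linalg.norm(coords - target, axis=1), sill, range_)
    sol = np.linalg.solve(A, b)
    weights = sol[:n]  # sol[n] is the Lagrange multiplier
    return float(weights @ values), weights


# Hypothetical stations (arbitrary units) and hourly PM2.5 readings (µg/m³)
stations = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
readings = [10.0, 12.0, 14.0]
prediction, w = ordinary_kriging(stations, readings, [0.5, 0.5])
```

Without a nugget, ordinary kriging is an exact interpolator: predicting at a station location returns that station's reading, and the weights always sum to 1.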
Related papers
- Synergistic Neural Forecasting of Air Pollution with Stochastic Sampling [50.3911487821783]
Air pollution remains a leading global health and environmental risk, particularly in regions vulnerable to episodic air pollution spikes due to wildfires, urban haze and dust storms. Here, we present SynCast, a high-resolution neural forecasting model that integrates meteorological and air composition data to improve predictions of both average and extreme pollution levels.
arXiv Detail & Related papers (2025-10-28T01:18:00Z)
- Air Quality PM2.5 Index Prediction Model Based on CNN-LSTM [0.2796197251957245]
We propose an air quality PM2.5 index prediction model based on a hybrid CNN-LSTM architecture. The model effectively combines Convolutional Neural Networks (CNN) for local spatial feature extraction and Long Short-Term Memory (LSTM) networks for modeling temporal dependencies in time series data. Experimental results show that the model achieves a root mean square error (RMSE) of 5.236, outperforming traditional time series models in both accuracy and generalization.
arXiv Detail & Related papers (2025-08-15T04:46:25Z)
- Air Quality Prediction with A Meteorology-Guided Modality-Decoupled Spatio-Temporal Network [47.699409089023696]
Air quality prediction plays a crucial role in public health and environmental protection. Existing works underestimate the critical role of atmospheric conditions in air quality prediction. MDSTNet is an encoder framework that explicitly captures atmosphere-pollution dependencies for prediction. ChinaAirNet is the first dataset combining air quality records with multi-pressure-level meteorological observations.
arXiv Detail & Related papers (2025-04-14T09:18:11Z)
- Enhancing PM2.5 Data Imputation and Prediction in Air Quality Monitoring Networks Using a KNN-SINDy Hybrid Model [0.0]
Air pollution, particularly particulate matter (PM2.5), poses significant risks to public health and the environment.
This study explores the application of Sparse Identification of Nonlinear Dynamics (SINDy) for imputing missing PM2.5 data by prediction, using training data from 2016, and compares its performance with the established Soft Impute (SI) and K-Nearest Neighbors (KNN) methods.
arXiv Detail & Related papers (2024-09-18T02:08:17Z)
- Observation-Guided Meteorological Field Downscaling at Station Scale: A Benchmark and a New Method [66.80344502790231]
We extend meteorological downscaling to arbitrary scattered station scales and establish a new benchmark and dataset.
Inspired by data assimilation techniques, we integrate observational data into the downscaling process, providing multi-scale observational priors.
Our proposed method outperforms other specially designed baseline models on multiple surface variables.
arXiv Detail & Related papers (2024-01-22T14:02:56Z)
- Long-term drought prediction using deep neural networks based on geospatial weather data [75.38539438000072]
High-quality drought forecasting up to a year in advance is critical for agriculture planning and insurance.
We tackle drought data with a systematic end-to-end approach.
A key finding is the exceptional performance of a Transformer model, EarthFormer, in making accurate short-term (up to six months) forecasts.
arXiv Detail & Related papers (2023-09-12T13:28:06Z)
- Unleashing Realistic Air Quality Forecasting: Introducing the Ready-to-Use PurpleAirSF Dataset [4.190243190157989]
This paper introduces PurpleAirSF, a comprehensive and easily accessible dataset from the PurpleAir network.
We present a detailed account of the data collection and processing methods employed to build PurpleAirSF.
We conduct preliminary experiments using both classic and modern spatio-temporal forecasting models, thereby establishing a benchmark for future air quality forecasting tasks.
arXiv Detail & Related papers (2023-06-24T12:10:16Z)
- Detecting Elevated Air Pollution Levels by Monitoring Web Search Queries: Deep Learning-Based Time Series Forecasting [7.978612711536259]
Prior work relied on modeling pollutant concentrations collected from ground-based monitors and meteorological data for long-term forecasting.
This study aims to develop and validate models to nowcast the observed pollution levels using Web search data, which is publicly available in near real-time from major search engines.
We developed novel machine learning-based models using both traditional supervised classification methods and state-of-the-art deep learning methods to detect elevated air pollution levels at the US city level.
arXiv Detail & Related papers (2022-11-09T23:56:35Z)
- Lidar Light Scattering Augmentation (LISA): Physics-based Simulation of Adverse Weather Conditions for 3D Object Detection [60.89616629421904]
Lidar-based object detectors are critical parts of the 3D perception pipeline in autonomous navigation systems such as self-driving cars.
They are sensitive to adverse weather conditions such as rain, snow and fog due to reduced signal-to-noise ratio (SNR) and signal-to-background ratio (SBR).
arXiv Detail & Related papers (2021-07-14T21:10:47Z)
- A Novel Hybrid Framework for Hourly PM2.5 Concentration Forecasting Using CEEMDAN and Deep Temporal Convolutional Neural Network [2.2175470459999636]
This study develops a novel hybrid forecasting model based on complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and a deep temporal convolutional network (DeepTCN).
The proposed CEEMDAN-DeepTCN model achieves the highest forecasting accuracy when compared with a time series model, an artificial neural network, and popular deep learning models.
The new model better captures patterns in PM2.5-related factor data and is a promising tool for forecasting PM2.5 concentrations.
arXiv Detail & Related papers (2020-12-07T15:22:01Z)
- Federated Learning in the Sky: Aerial-Ground Air Quality Sensing Framework with UAV Swarms [53.38353133198842]
Because air quality significantly affects human health, it is increasingly important to predict the Air Quality Index (AQI) accurately and in a timely manner.
This paper proposes a new federated learning-based aerial-ground air quality sensing framework for fine-grained 3D air quality monitoring and forecasting.
For ground sensing systems, we propose a Graph Convolutional neural network-based Long Short-Term Memory (GC-LSTM) model to achieve accurate, real-time and future AQI inference.
arXiv Detail & Related papers (2020-07-23T13:32:47Z)
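The CNN-LSTM entry above reports its accuracy as a root mean square error (RMSE), and the main paper compares models by how accurately they predict PM$_{2.5}$. The block below is a minimal sketch of that kind of comparison, assuming RMSE as the metric and a plain equal-weight average as the ensemble. The ensemble in the main paper also incorporates spatiotemporal dependencies and is not reproduced here; the function names and numbers are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    # Root mean square error: sqrt(mean((y - y_hat)^2))
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def average_ensemble(*predictions):
    # Equal-weight ensemble (assumption): element-wise mean of member predictions
    return np.mean(np.vstack(predictions), axis=0)


# Hypothetical hourly PM2.5 observations (µg/m³) and two model predictions
observed = [10.0, 20.0, 30.0]
kriging_pred = [12.0, 18.0, 31.0]
forest_pred = [9.0, 21.0, 28.0]
ensemble_pred = average_ensemble(kriging_pred, forest_pred)

print(rmse(observed, kriging_pred), rmse(observed, forest_pred), rmse(observed, ensemble_pred))
# ≈ 1.73, 1.41, 0.5
```

The averaged prediction scores best here only because the two members' errors happen to offset on these made-up values; on real data an ensemble is not guaranteed to beat its best member.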