Comparative Analysis of Machine Learning-Based Imputation Techniques for Air Quality Datasets with High Missing Data Rates
- URL: http://arxiv.org/abs/2412.13966v2
- Date: Wed, 25 Dec 2024 13:39:20 GMT
- Title: Comparative Analysis of Machine Learning-Based Imputation Techniques for Air Quality Datasets with High Missing Data Rates
- Authors: Sen Yan, David J. O'Connor, Xiaojun Wang, Noel E. O'Connor, Alan F. Smeaton, Mingming Liu
- Abstract summary: Urban pollution poses serious health risks, particularly in relation to traffic-related air pollution, which remains a major concern in many cities.
This study aims to provide insights for processing datasets with high missing data rates.
Various imputation and prediction approaches were evaluated and compared, including ensemble methods, deep learning models, and diffusion models.
- Score: 11.458531729724191
- Abstract: Urban pollution poses serious health risks, particularly in relation to traffic-related air pollution, which remains a major concern in many cities. Vehicle emissions contribute to respiratory and cardiovascular issues, especially for vulnerable and exposed road users like pedestrians and cyclists. Therefore, accurate air quality monitoring with high spatial resolution is vital for good urban environmental management. This study aims to provide insights for processing spatiotemporal datasets with high missing data rates. In this study, the challenge of high missing data rates is a result of the limited data available and the fine granularity required for precise classification of PM2.5 levels. The data used for analysis and imputation were collected from both mobile sensors and fixed stations by Dynamic Parcel Distribution, the Environmental Protection Agency, and Google in Dublin, Ireland, where the missing data rate was approximately 82.42%, making accurate Particulate Matter 2.5 level predictions particularly difficult. Various imputation and prediction approaches were evaluated and compared, including ensemble methods, deep learning models, and diffusion models. External features such as traffic flow, weather conditions, and data from the nearest stations were incorporated to enhance model performance. The results indicate that diffusion methods with external features achieved the highest F1 score, reaching 0.9486 (Accuracy: 94.26%, Precision: 94.42%, Recall: 94.82%), with ensemble models achieving the highest accuracy of 94.82%, illustrating that good performance can be obtained despite a high missing data rate.
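The abstract reports a missing data rate alongside accuracy, precision, recall, and F1. As a rough illustration of how such quantities are typically computed, here is a minimal sketch in plain Python. The function names `missing_rate` and `macro_scores`, the toy readings, and the class labels are hypothetical, and the use of macro averaging is an assumption: the abstract does not state which averaging scheme the authors used.

```python
from typing import Dict, Optional, Sequence


def missing_rate(values: Sequence[Optional[float]]) -> float:
    """Return the fraction of entries that are missing (represented as None)."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(v is None for v in values) / len(values)


def macro_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption made for this sketch; the source
    abstract does not say how its scores were averaged.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []

    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy data, invented for illustration only (not taken from the paper).
readings = [12.0, None, None, 8.5, None]
y_true = ["low", "low", "high", "high"]
y_pred = ["low", "high", "high", "high"]

rate = missing_rate(readings)
scores = macro_scores(y_true, y_pred)
```

With macro averaging, the reported F1 is the mean of per-class F1 values, so it need not equal the harmonic mean of the averaged precision and recall.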
Related papers
- Enhancing PM2.5 Data Imputation and Prediction in Air Quality Monitoring Networks Using a KNN-SINDy Hybrid Model [0.0]
Air pollution, particularly particulate matter (PM2.5), poses significant risks to public health and the environment.
This study explores the application of Sparse Identification of Nonlinear Dynamics (SINDy) for imputing missing PM2.5 data by predicting, using training data from 2016, and compares its performance with the established Soft Impute (SI) and K-Nearest Neighbors (KNN) methods.
arXiv Detail & Related papers (2024-09-18T02:08:17Z)
- A Framework for Scalable Ambient Air Pollution Concentration Estimation [0.0]
Ambient air pollution remains a critical issue in the United Kingdom, where data on air pollution concentrations form the foundation for interventions aimed at improving air quality.
We introduce a data-driven supervised machine learning model framework designed to address temporal and spatial data gaps by filling missing measurements.
This approach provides a comprehensive dataset for England throughout 2018 at a 1kmx1km hourly resolution.
arXiv Detail & Related papers (2024-01-16T18:03:07Z)
- Residual Corrective Diffusion Modeling for Km-scale Atmospheric Downscaling [58.456404022536425]
State of the art for physical hazard prediction from weather and climate requires expensive km-scale numerical simulations driven by coarser resolution global inputs.
Here, a generative diffusion architecture is explored for downscaling such global inputs to km-scale, as a cost-effective machine learning alternative.
The model is trained to predict 2km data from a regional weather model over Taiwan, conditioned on a 25km global reanalysis.
arXiv Detail & Related papers (2023-09-24T19:57:22Z)
- Multimodal Dataset from Harsh Sub-Terranean Environment with Aerosol Particles for Frontier Exploration [55.41644538483948]
This paper introduces a multimodal dataset from the harsh and unstructured underground environment with aerosol particles.
It contains synchronized raw data measurements from all onboard sensors in Robot Operating System (ROS) format.
The focus of this paper is not only to capture both temporal and spatial data diversities but also to present the impact of harsh conditions on captured data.
arXiv Detail & Related papers (2023-04-27T20:21:18Z)
- Robust Trajectory Prediction against Adversarial Attacks [84.10405251683713]
Trajectory prediction using deep neural networks (DNNs) is an essential component of autonomous driving systems.
These methods are vulnerable to adversarial attacks, leading to serious consequences such as collisions.
In this work, we identify two key ingredients to defend trajectory prediction models against adversarial attacks.
arXiv Detail & Related papers (2022-07-29T22:35:05Z)
- Vision in adverse weather: Augmentation using CycleGANs with various object detectors for robust perception in autonomous racing [70.16043883381677]
In autonomous racing, the weather can change abruptly, causing significant degradation in perception, resulting in ineffective manoeuvres.
In order to improve detection in adverse weather, deep-learning-based models typically require extensive datasets captured in such conditions.
We introduce an approach of using synthesised adverse condition datasets in autonomous racing (generated using CycleGAN) to improve the performance of four out of five state-of-the-art detectors.
arXiv Detail & Related papers (2022-01-10T10:02:40Z)
- DeepAdversaries: Examining the Robustness of Deep Learning Models for Galaxy Morphology Classification [47.38422424155742]
In morphological classification of galaxies, we study the effects of perturbations in imaging data.
We show that training with domain adaptation improves model robustness and mitigates the effects of these perturbations.
arXiv Detail & Related papers (2021-12-28T21:29:02Z)
- Lidar Light Scattering Augmentation (LISA): Physics-based Simulation of Adverse Weather Conditions for 3D Object Detection [60.89616629421904]
Lidar-based object detectors are critical parts of the 3D perception pipeline in autonomous navigation systems such as self-driving cars.
They are sensitive to adverse weather conditions such as rain, snow, and fog due to reduced signal-to-noise ratio (SNR) and signal-to-background ratio (SBR).
arXiv Detail & Related papers (2021-07-14T21:10:47Z)
- Mining atmospheric data [0.0]
The first issue relates to building new public datasets and benchmarks.
The second issue is the investigation of deep learning methodologies for atmospheric data classification.
The targeted application is air quality assessment and prediction.
arXiv Detail & Related papers (2021-06-26T10:04:35Z)
- Improving Maritime Traffic Emission Estimations on Missing Data with CRBMs [1.6311150636417262]
Maritime traffic emissions are a major concern to governments as they heavily impact the Air Quality in coastal cities.
State-of-the-art complex systems, like CALIOPE at the Barcelona Supercomputing Center, are used to model Air Quality.
We propose a methodology for treating ship data using Conditional Restricted Boltzmann Machines (CRBMs) plus machine learning methods.
arXiv Detail & Related papers (2020-09-07T10:32:43Z)
- Analytical Equations based Prediction Approach for PM2.5 using Artificial Neural Network [0.0]
Particulate Matter (PM2.5) is one of the important particulate pollutants used to measure the Air Quality Index (AQI).
The conventional instruments used by air quality monitoring stations to monitor PM2.5 are costly, bulky, time-consuming, and power-hungry.
This article presents an analytical-equations-based prediction approach for PM2.5 using an Artificial Neural Network (ANN).
arXiv Detail & Related papers (2020-02-26T11:39:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.