Using remotely sensed data for air pollution assessment
- URL: http://arxiv.org/abs/2402.06653v1
- Date: Sun, 4 Feb 2024 14:27:28 GMT
- Title: Using remotely sensed data for air pollution assessment
- Authors: Teresa Bernardino, Maria Alexandra Oliveira, João Nuno Silva
- Abstract summary: The main objective of this work is to create models capable of inferring pollutant concentrations in locations where no observation data exists.
A machine learning model was developed for predicting concentrations in the Iberian Peninsula in 2019 for five selected pollutants.
All models presented acceptable cross-validation RMSE, except the $O_3$ and $PM10$ models where the mean value was a little higher.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Air pollution constitutes a global problem of paramount importance that
affects not only human health, but also the environment. The existence of
spatial and temporal data regarding the concentrations of pollutants is crucial
for performing air pollution studies and monitoring emissions. However, although
observation data presents great temporal coverage, the number of stations is
very limited and they are usually built in more populated areas.
The main objective of this work is to create models capable of inferring
pollutant concentrations in locations where no observation data exists. A
machine learning model, more specifically the random forest model, was
developed for predicting concentrations in the Iberian Peninsula in 2019 for
five selected pollutants: $NO_2$, $O_3$, $SO_2$, $PM10$, and $PM2.5$. Model
features include satellite measurements, meteorological variables, land use
classification, temporal variables (month, day of year), and spatial variables
(latitude, longitude, altitude).
The models were evaluated using various methods, including station 10-fold
cross-validation, in which in each fold observations from 10\% of the stations
are used as testing data and the rest as training data. The $R^2$, RMSE and
mean bias were determined for each model. The $NO_2$ and $O_3$ models presented
good values of $R^2$, 0.5524 and 0.7462, respectively. However, the $SO_2$,
$PM10$, and $PM2.5$ models performed very poorly in this regard, with $R^2$
values of -0.0231, 0.3722, and 0.3303, respectively. All models slightly
overestimated the ground concentrations, except the $O_3$ model. All models
presented acceptable cross-validation RMSE, except the $O_3$ and $PM10$ models
where the mean value was a little higher (12.5934 $\mu g/m^3$ and 10.4737
$\mu g/m^3$, respectively).
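The station-wise 10-fold cross-validation and the three reported metrics can be illustrated with a minimal sketch. This is not the authors' code: the function names (station_folds, split_by_fold, r2_score, rmse, mean_bias) and the toy station ids are hypothetical, and the sign convention for mean bias (prediction minus observation, so a positive value means overestimation) is an assumption. The sketch shows that folds are formed over stations rather than over individual observations, and that $R^2$ becomes negative when a model predicts worse than the mean of the observations, as with the reported $SO_2$ value.

```python
import math
import random


def station_folds(station_ids, k=10, seed=0):
    """Assign each distinct station to one of k folds (assumed helper)."""
    stations = sorted(set(station_ids))
    random.Random(seed).shuffle(stations)
    # Round-robin assignment keeps fold sizes within one station of each other.
    return {station: i % k for i, station in enumerate(stations)}


def split_by_fold(station_ids, fold_of, fold):
    """Return (train, test) row indices; all rows of a station stay together."""
    train = [i for i, s in enumerate(station_ids) if fold_of[s] != fold]
    test = [i for i, s in enumerate(station_ids) if fold_of[s] == fold]
    return train, test


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the observations."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mean_bias(y_true, y_pred):
    """Mean of (prediction - observation); positive means overestimation
    under the sign convention assumed here."""
    return sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: 20 hypothetical stations with 3 observations each.
ids = [f"S{i:02d}" for i in range(20) for _ in range(3)]
fold_of = station_folds(ids, k=10)
train_idx, test_idx = split_by_fold(ids, fold_of, fold=0)

# No station contributes rows to both sides of the split.
assert not {ids[i] for i in train_idx} & {ids[i] for i in test_idx}
```

Splitting by station rather than by row matters because observations from the same station are strongly correlated; a row-wise split would let the model see the test station during training and would overstate how well it generalises to unmonitored locations.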
Related papers
- Language models scale reliably with over-training and on downstream tasks [121.69867718185125]
Scaling laws are useful guides for derisking expensive training runs.
However, there remain gaps between current studies and how language models are trained.
In contrast, scaling laws mostly predict loss on inference, but models are usually compared on downstream task performance.
arXiv Detail & Related papers (2024-03-13T13:54:00Z) - A Data-Driven Supervised Machine Learning Approach to Estimating Global
Ambient Air Pollution Concentrations With Associated Prediction Intervals [0.0]
We have developed a scalable, data-driven, supervised machine learning framework to impute missing temporal and spatial measurements.
This model is designed to impute missing temporal and spatial measurements, thereby generating a comprehensive dataset for pollutants including NO$_2$, O$_3$, PM$_{10}$, PM$_{2.5}$, and SO$_2$.
The model's performance across various geographical locations is examined, providing insights and recommendations for strategic placement of future monitoring stations.
arXiv Detail & Related papers (2024-02-15T11:09:22Z) - COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically
for Model-Based RL [50.385005413810084]
Dyna-style model-based reinforcement learning contains two phases: model rollouts to generate sample for policy learning and real environment exploration.
$\texttt{COPlanner}$ is a planning-driven framework for model-based methods to address the inaccurately learned dynamics model problem.
arXiv Detail & Related papers (2023-10-11T06:10:07Z) - Residual Corrective Diffusion Modeling for Km-scale Atmospheric Downscaling [58.456404022536425]
State of the art for physical hazard prediction from weather and climate requires expensive km-scale numerical simulations driven by coarser resolution global inputs.
Here, a generative diffusion architecture is explored for downscaling such global inputs to km-scale, as a cost-effective machine learning alternative.
The model is trained to predict 2km data from a regional weather model over Taiwan, conditioned on a 25km global reanalysis.
arXiv Detail & Related papers (2023-09-24T19:57:22Z) - A comparative study of statistical and machine learning models on
near-real-time daily emissions prediction [0.0]
The rapid ascent in carbon dioxide emissions is a major cause of global warming and climate change.
This paper aims to select a suitable model to predict the near-real-time daily emissions from January 1st, 2020 to September 30th, 2022 of all sectors in China.
arXiv Detail & Related papers (2023-02-02T15:14:27Z) - Predicting air quality via multimodal AI and satellite imagery [0.2492060267829796]
This paper seeks to create a multi-modal machine learning model for predicting air-quality metrics where monitoring stations do not exist.
A new dataset of European pollution monitoring station measurements is created with features including $\textit{altitude, population, etc.}$ from the ESA Copernicus project.
These predictions are then aggregated to create an "air-quality index" that could be used to compare air quality over different regions.
arXiv Detail & Related papers (2022-11-01T22:56:15Z) - A Relational Intervention Approach for Unsupervised Dynamics
Generalization in Model-Based Reinforcement Learning [113.75991721607174]
We introduce an interventional prediction module to estimate the probability of two estimated $\hat{z}_i, \hat{z}_j$ belonging to the same environment.
We empirically show that $\hat{Z}$ estimated by our method enjoys less redundant information than previous methods.
arXiv Detail & Related papers (2022-06-09T15:01:36Z) - High-resolution landscape-scale biomass mapping using a spatiotemporal
patchwork of LiDAR coverages [0.0]
Estimating forest aboveground biomass at fine scales has become increasingly important for greenhouse gas estimation.
Here we address common obstacles including selection of training data, the investigation of regional or coverage specific bias and error, and map patterns at multiple scales.
Our model was overall accurate (% RMSE 13-33%), had very low bias (MBE $\leq$ $\pm$5 Mg ha$^{-1}$), and explained most field-observed variation.
arXiv Detail & Related papers (2022-05-17T17:53:50Z) - Datamodels: Predicting Predictions from Training Data [86.66720175866415]
We present a conceptual framework, datamodeling, for analyzing the behavior of a model class in terms of the training data.
We show that even simple linear datamodels can successfully predict model outputs.
arXiv Detail & Related papers (2022-02-01T18:15:24Z) - Prediction of daily maximum ozone levels using Lasso sparse modeling
method [0.0]
This paper applies modern statistical methods in the prediction of the next-day maximum ozone concentration.
The model uses a large number of candidate features, including the present day's hourly concentration level of various pollutants, as well as the meteorological variables.
The model trained by 3-years data demonstrates relatively good prediction accuracy, with RMSE= 5.63 ppb, MAE= 4.42 ppb, and RMSE= 5.68 ppb, MAE= 4.52 ppb.
arXiv Detail & Related papers (2020-10-18T02:58:53Z) - TraDE: Transformers for Density Estimation [101.20137732920718]
TraDE is a self-attention-based architecture for auto-regressive density estimation.
We present a suite of tasks such as regression using generated samples, out-of-distribution detection, and robustness to noise in the training data.
arXiv Detail & Related papers (2020-04-06T07:32:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.