Predictive Modelling of Air Quality Index (AQI) Across Diverse Cities and States of India using Machine Learning: Investigating the Influence of Punjab's Stubble Burning on AQI Variability
- URL: http://arxiv.org/abs/2404.08702v1
- Date: Thu, 11 Apr 2024 05:03:40 GMT
- Title: Predictive Modelling of Air Quality Index (AQI) Across Diverse Cities and States of India using Machine Learning: Investigating the Influence of Punjab's Stubble Burning on AQI Variability
- Authors: Kamaljeet Kaur Sidhu, Habeeb Balogun, Kazeem Oluwakemi Oseni
- Abstract summary: This research has predicted the AQI based on different air pollutant concentrations in the atmosphere.
The dataset has the air pollutant concentration from 22 different monitoring stations in different cities of Delhi, Haryana, and Punjab.
Different ML models like CatBoost, XGBoost, Random Forest, SVM regressor, time series model SARIMAX, and deep learning model LSTM have been used to predict AQI.
- Score: 0.5266869303483376
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Air pollution is a common and serious problem that cannot be ignored because of its harmful impact on human health. To address it proactively, people should be aware of the environment they live in. With this motive, this research predicts the AQI from the concentrations of different air pollutants in the atmosphere. The dataset was taken from the official website of CPCB and contains air pollutant concentrations from 22 monitoring stations in different cities of Delhi, Haryana, and Punjab. The data is checked for null values and outliers; the most important point is to understand and impute such values correctly rather than ignore them or impute them wrongly. The data is a time series and is tested for stationarity using the Dickey-Fuller test. Different ML models (CatBoost, XGBoost, Random Forest, and an SVM regressor), the time series model SARIMAX, and the deep learning model LSTM are then used to predict AQI. The models are evaluated using MSE, RMSE, MAE, and R2. Random Forest is observed to perform better than the other models.
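The four evaluation metrics named in the abstract can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function name `regression_metrics` and the sample AQI values are hypothetical, and the formulas are the standard textbook definitions of MSE, RMSE, MAE, and R2.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R2 for two equal-length sequences.

    Hypothetical helper for illustration; uses the standard definitions.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean squared error and its square root.
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    # Mean absolute error.
    mae = sum(abs(e) for e in errors) / n

    # R2 = 1 - (residual sum of squares / total sum of squares).
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}


if __name__ == "__main__":
    # Made-up AQI values, used only to exercise the function.
    observed = [100, 150, 200, 250]
    predicted = [110, 140, 210, 240]
    print(regression_metrics(observed, predicted))
```

Lower MSE, RMSE, and MAE and an R2 closer to 1 indicate a better fit, which is the sense in which the abstract reports Random Forest as performing best.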
Related papers
- LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content [62.816876067499415]
We propose LiveXiv: a scalable evolving live benchmark based on scientific ArXiv papers.
LiveXiv accesses domain-specific manuscripts at any given timestamp and proposes to automatically generate visual question-answer pairs.
We benchmark multiple open and proprietary Large Multi-modal Models (LMMs) on the first version of our benchmark, showing its challenging nature and exposing the models' true abilities.
arXiv Detail & Related papers (2024-10-14T17:51:23Z)
- Smoke and Mirrors in Causal Downstream Tasks [59.90654397037007]
This paper looks at the causal inference task of treatment effect estimation, where the outcome of interest is recorded in high-dimensional observations.
We compare 6,480 models fine-tuned from state-of-the-art visual backbones, and find that the sampling and modeling choices significantly affect the accuracy of the causal estimate.
Our results suggest that future benchmarks should carefully consider real downstream scientific questions, especially causal ones.
arXiv Detail & Related papers (2024-05-27T13:26:34Z)
- Rethinking Benchmark and Contamination for Language Models with Rephrased Samples [49.18977581962162]
Large language models are increasingly trained on all the data ever produced by humans.
Many have raised concerns about the trustworthiness of public benchmarks due to potential contamination in pre-training or fine-tuning datasets.
arXiv Detail & Related papers (2023-11-08T17:35:20Z)
- Imbalanced Aircraft Data Anomaly Detection [103.01418862972564]
Anomaly detection in temporal data from sensors under aviation scenarios is a practical but challenging task.
We propose a Graphical Temporal Data Analysis framework.
It consists of three modules: Series-to-Image (S2I), Cluster-based Resampling Approach using Euclidean Distance (CRD), and Variance-Based Loss (VBL).
arXiv Detail & Related papers (2023-05-17T09:37:07Z)
- Managing Large Dataset Gaps in Urban Air Quality Prediction: DCU-Insight-AQ at MediaEval 2022 [2.0796717061432006]
We focus on gap filling in air quality data where the task is to predict the AQI at 1, 5 and 7 days into the future.
The scenario is one in which one or more air, weather, or traffic sensors are offline, and the work explores the resulting prediction accuracy.
arXiv Detail & Related papers (2022-12-19T16:53:16Z)
- GreenEyes: An Air Quality Evaluating Model based on WaveNet [11.513011576336744]
We propose a deep neural network model, which consists of a WaveNet-based backbone block for learning representations of sequences and an LSTM with a Temporal Attention module.
We show our model can effectively predict the air quality level of the next timestamp given any segment of the air quality data from the data set.
arXiv Detail & Related papers (2022-12-08T10:28:57Z)
- Data-driven Real-time Short-term Prediction of Air Quality: Comparison of ES, ARIMA, and LSTM [0.0]
We use a data-driven approach to predict air quality based on historical data.
Considering prediction accuracy and time complexity, our experiments reveal that for short-term air pollution prediction ES performs better than ARIMA and LSTM.
arXiv Detail & Related papers (2022-11-16T09:37:08Z)
- Predicting air quality via multimodal AI and satellite imagery [0.2492060267829796]
This paper seeks to create a multi-modal machine learning model for predicting air-quality metrics where monitoring stations do not exist.
A new dataset of European pollution monitoring station measurements is created with features including altitude, population, etc. from the ESA Copernicus project.
These predictions are then aggregated to create an "air-quality index" that could be used to compare air quality over different regions.
arXiv Detail & Related papers (2022-11-01T22:56:15Z)
- Evaluation of Time Series Forecasting Models for Estimation of PM2.5 Levels in Air [0.0]
The study adopts ARIMA, FBProphet, and deep learning models such as LSTM, 1D CNN, to estimate the concentration of PM2.5 in the environment.
Our results indicate that all adopted methods give comparable outcomes in terms of average root mean squared error.
arXiv Detail & Related papers (2021-04-07T16:24:39Z)
- Federated Learning in the Sky: Aerial-Ground Air Quality Sensing Framework with UAV Swarms [53.38353133198842]
Because air quality significantly affects human health, it is increasingly important to predict the Air Quality Index (AQI) accurately and in a timely manner.
This paper proposes a new federated learning-based aerial-ground air quality sensing framework for fine-grained 3D air quality monitoring and forecasting.
For ground sensing systems, we propose a Graph Convolutional neural network-based Long Short-Term Memory (GC-LSTM) model to achieve accurate, real-time and future AQI inference.
arXiv Detail & Related papers (2020-07-23T13:32:47Z)
- Density of States Estimation for Out-of-Distribution Detection [69.90130863160384]
DoSE is the density of states estimator.
We demonstrate DoSE's state-of-the-art performance against other unsupervised OOD detectors.
arXiv Detail & Related papers (2020-06-16T16:06:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.