Coastal water quality prediction based on machine learning with feature interpretation and spatio-temporal analysis
- URL: http://arxiv.org/abs/2107.03230v2
- Date: Fri, 9 Jul 2021 07:09:03 GMT
- Title: Coastal water quality prediction based on machine learning with feature interpretation and spatio-temporal analysis
- Authors: Luka Grbčić, Siniša Družeta, Goran Mauša, Tomislav
Lipić, Darija Vukić Lušić, Marta Alvir, Ivana Lučin, Ante
Sikirica, Davor Davidović, Vanja Travaš, Daniela Kalafatović,
Kristina Pikelj, Hana Fajković, Toni Holjević and Lado Kranjčević
- Abstract summary: Poor coastal water quality can harbor pathogens that are dangerous to human health.
Routine monitoring data of $Escherichia\ coli$ and enterococci across 15 public beaches in Rijeka, Croatia, were used to build machine learning models.
The Catboost algorithm performed best, with R$^2$ values of 0.71 and 0.68 for predicting $E.\ coli$ and enterococci, respectively.
- Score: 1.1124907412872893
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Coastal water quality management is a public health concern, as poor coastal
water quality can harbor pathogens that are dangerous to human health.
Tourism-oriented countries need to actively monitor the condition of coastal
water at popular tourist sites during the summer season. In this study, routine
monitoring data of $Escherichia\ coli$ and enterococci across 15 public beaches
in the city of Rijeka, Croatia, were used to build machine learning models for
predicting their levels based on environmental parameters as well as to
investigate their relationships with environmental stressors. Gradient Boosting
(Catboost, Xgboost), Random Forests, Support Vector Regression and Artificial
Neural Networks were trained with measurements from all sampling sites and used
to predict $E.\ coli$ and enterococci values based on environmental features.
An evaluation of the stability and generalizability of the machine learning
models with 10-fold cross-validation showed that the Catboost algorithm
performed best, with R$^2$ values of 0.71 and 0.68 for predicting $E.\ coli$ and
enterococci, respectively, compared to other evaluated ML algorithms including
Xgboost, Random Forests, Support Vector Regression and Artificial Neural
Networks. We also used the SHapley Additive exPlanations (SHAP) technique to
identify and interpret which features have the most predictive power. The
results show that the salinity measured at each site is the most important
feature for forecasting both $E.\ coli$ and enterococci levels. Finally, the
spatial and temporal accuracy of both ML models was examined at the sites with
the lowest coastal water quality.
The spatial $E.\ coli$ and enterococci models achieved strong R$^2$ values of
0.85 and 0.83, while the temporal models achieved R$^2$ values of 0.74 and
0.67. The temporal model also achieved moderate R$^2$ values of 0.44 and 0.46
at a site with high coastal water quality.
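The abstract scores its models by R$^2$ under 10-fold cross-validation. The sketch below shows what that evaluation loop computes, assuming pure Python with no ML libraries: a hypothetical single-feature least-squares model (`fit_ols`) stands in for Catboost, the data are synthetic, folds are contiguous and unshuffled, and per-fold scores are averaged. The abstract does not say how the authors shuffled or aggregated folds, so treat these as illustrative choices, not the paper's setup; all function names are hypothetical.

```python
"""Sketch: k-fold cross-validation scored with R^2 (pure Python).

Assumptions, not taken from the paper: a single-feature ordinary least
squares model stands in for Catboost, folds are contiguous and unshuffled,
and per-fold R^2 scores are averaged. All names are hypothetical.
"""
from typing import List, Sequence, Tuple


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def kfold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds; each fold is the test set once."""
    splits: List[Tuple[List[int], List[int]]] = []
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)  # spread the remainder
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        splits.append((train, test))
        start += size
    return splits


def fit_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Closed-form least squares for y ~ a + b * x (hypothetical stand-in model)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def cross_validate(x: Sequence[float], y: Sequence[float], k: int = 10) -> List[float]:
    """Train on k-1 folds, score R^2 on the held-out fold, repeat k times."""
    scores = []
    for train, test in kfold_indices(len(x), k):
        a, b = fit_ols([x[i] for i in train], [y[i] for i in train])
        preds = [a + b * x[i] for i in test]
        scores.append(r2_score([y[i] for i in test], preds))
    return scores


if __name__ == "__main__":
    # Synthetic, noise-free data: an exact linear relation, so every fold
    # should score R^2 close to 1. Real monitoring data would not.
    xs = [float(i) for i in range(40)]
    ys = [2.0 + 3.0 * v for v in xs]
    fold_scores = cross_validate(xs, ys, k=10)
    print(f"mean R^2 over {len(fold_scores)} folds: {sum(fold_scores) / len(fold_scores):.3f}")
```

Keeping `kfold_indices` and `r2_score` separate from the model means a gradient-boosted regressor could replace `fit_ols` without touching the evaluation code. Whether to shuffle before splitting matters for spatio-temporal monitoring data, since contiguous folds keep neighbouring samples together; the abstract does not specify which the authors used.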
Related papers
- Analyzing Spatio-Temporal Dynamics of Dissolved Oxygen for the River Thames using Superstatistical Methods and Machine Learning [0.0]
We use superstatistical methods and machine learning to predict dissolved oxygen levels in the River Thames.
For long-term forecasting, the Informer model consistently delivers superior performance.
(2025-01-10T16:54:52Z)
- LLMs & XAI for Water Sustainability: Seasonal Water Quality Prediction with LIME Explainable AI and a RAG-based Chatbot for Insights [0.0]
This paper introduces a hybrid deep learning model to predict Nepal's seasonal water quality using a small dataset with multiple water quality parameters.
CatBoost, XGBoost, Extra Trees, and LightGBM, along with a neural network combining CNN and RNN layers, are used to capture temporal and spatial patterns in the data.
The model demonstrated notable accuracy improvements, aiding proactive water quality control.
(2024-09-17T05:26:59Z)
- Phikon-v2, A large and public feature extractor for biomarker prediction [42.52549987351643]
We train a vision transformer using DINOv2 and publicly release one iteration of this model for further experimentation, coined Phikon-v2.
While trained on publicly available histology slides, Phikon-v2 surpasses our previously released model (Phikon) and performs on par with other histopathology foundation models (FM) trained on proprietary data.
(2024-09-13T20:12:29Z)
- Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data [44.99833362998488]
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options.
The present investigation, carried out within the iDPP@CLEF 2024 challenge, focuses on using sensor-derived data obtained through an app.
(2024-07-10T19:17:23Z)
- Residual Corrective Diffusion Modeling for Km-scale Atmospheric Downscaling [58.456404022536425]
State of the art for physical hazard prediction from weather and climate requires expensive km-scale numerical simulations driven by coarser resolution global inputs.
Here, a generative diffusion architecture is explored for downscaling such global inputs to km-scale, as a cost-effective machine learning alternative.
The model is trained to predict 2km data from a regional weather model over Taiwan, conditioned on a 25km global reanalysis.
(2023-09-24T19:57:22Z)
- Short-term prediction of stream turbidity using surrogate data and a meta-model approach [0.0]
We build and compare the ability of dynamic regression (ARIMA), long short-term memory neural nets (LSTM), and generalized additive models (GAM) to forecast stream turbidity.
We construct a meta-model, trained on time-series features of turbidity, to take advantage of the strengths of each model over different time points.
Our findings indicate that temperature and light-associated variables, for example underwater illuminance, may hold promise as cost-effective surrogates of turbidity.
(2022-10-11T23:05:32Z)
- Generalizing electrocardiogram delineation: training convolutional neural networks with synthetic data augmentation [63.51064808536065]
Existing databases for ECG delineation are small and represent only a limited array of pathological conditions.
This article has two main contributions. First, a pseudo-synthetic data generation algorithm was developed, based on probabilistically composing ECG traces from "pools" of fundamental segments, cropped from the original databases, and a set of rules for arranging them into coherent synthetic traces.
Second, two novel segmentation-based loss functions were developed, which aim to enforce the prediction of an exact number of independent structures and to produce closer segmentation boundaries by focusing on a reduced number of samples.
(2021-11-25T10:11:41Z)
- Artificial Intelligence Hybrid Deep Learning Model for Groundwater Level Prediction Using MLP-ADAM [0.0]
In this paper, a multi-layer perceptron is applied to simulate groundwater level.
The adaptive moment estimation (Adam) algorithm is also used for this purpose.
Results indicate that deep learning algorithms can achieve highly accurate predictions.
(2021-07-29T10:11:45Z)
- Instance Segmentation of Microscopic Foraminifera [0.0629976670819788]
We present a deep learning-based instance segmentation model for classifying, detecting, and segmenting microscopic foraminifera.
Our model is based on the Mask R-CNN architecture, using model weight parameters that were learned on the COCO detection dataset.
(2021-05-15T10:46:22Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines, improving on the best baseline by up to 19%.
(2020-10-22T02:28:11Z)
- Automatic sleep stage classification with deep residual networks in a mixed-cohort setting [63.52264764099532]
We developed a novel deep neural network model to assess generalizability across several large-scale cohorts.
Overall classification accuracy improved with increasing fractions of training data.
(2020-08-21T10:48:35Z)