Harmful algal bloom forecasting. A comparison between stream and batch
learning
- URL: http://arxiv.org/abs/2402.13304v1
- Date: Tue, 20 Feb 2024 15:01:11 GMT
- Title: Harmful algal bloom forecasting. A comparison between stream and batch
learning
- Authors: Andres Molares-Ulloa, Elisabet Rocruz, Daniel Rivero, Xosé A. Padin,
Rita Nolasco, Jesús Dubert and Enrique Fernandez-Blanco
- Abstract summary: Harmful Algal Blooms (HABs) pose risks to public health and the shellfish industry.
This study develops a machine learning workflow for predicting the number of cells of a toxic dinoflagellate.
The model DoME emerged as the most effective and interpretable predictor, outperforming the other algorithms.
- Score: 0.7067443325368975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diarrhetic Shellfish Poisoning (DSP) is a global health threat arising from
shellfish contaminated with toxins produced by dinoflagellates. The condition,
with its widespread incidence, high morbidity rate, and persistent shellfish
toxicity, poses risks to public health and the shellfish industry. High-biomass
events of toxin-producing algae, such as those that cause DSP, are known as
Harmful Algal Blooms (HABs). Monitoring and forecasting systems are crucial for
mitigating the impact of HABs. Predicting harmful algal blooms is a time-series
problem with a strong historical seasonal component; however, recent anomalies
due to changes in meteorological and oceanographic events have been observed.
Stream Learning
stands out as one of the most promising approaches for addressing
time-series-based problems with concept drifts. However, its efficacy in
predicting HABs remains unproven and needs to be tested in comparison with
Batch Learning. Historical data availability is a critical point in developing
predictive systems. In oceanography, the available data collections can have
constraints and limitations, which has led to the exploration of new tools to obtain
more exhaustive time series. In this study, a machine learning workflow for
predicting the number of cells of a toxic dinoflagellate, Dinophysis acuminata,
was developed with several key advancements. Seven machine learning algorithms
were compared within two learning paradigms. Notably, the output data from
CROCO, the ocean hydrodynamic model, was employed as the primary dataset,
alleviating the limited availability of time-continuous historical data. This study
highlights the value of model interpretability, a fair model comparison
methodology, and the incorporation of Stream Learning models. The DoME model,
with an average R² of 0.77 in the 3-day-ahead prediction, emerged as the most
effective and interpretable predictor, outperforming the other algorithms.
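The contrast between the two learning paradigms can be illustrated with a minimal sketch. It is not the study's workflow: it uses neither DoME nor CROCO output, the data are synthetic, and every name in it (`make_series`, `batch_forecast`, `stream_forecast`, `compare`) is hypothetical. It assumes a one-feature linear signal whose slope flips partway through the series (a simple concept drift), fits a batch model once on the historical part, and lets a stream model predict each point before learning from it (test-then-train). Both are scored with R² = 1 - SS_res / SS_tot on the same evaluation window.

```python
import random


def make_series(n=1000, drift_at=500, seed=0):
    """Synthetic (x, y) pairs whose slope flips at `drift_at` (concept drift)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for t in range(n):
        x = rng.uniform(-1.0, 1.0)
        slope = 2.0 if t < drift_at else -1.0
        xs.append(x)
        ys.append(slope * x + rng.gauss(0.0, 0.05))
    return xs, ys


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


def batch_forecast(xs, ys, train_end):
    """Fit a least-squares slope once on the first `train_end` points, then freeze it."""
    num = sum(x * y for x, y in zip(xs[:train_end], ys[:train_end]))
    den = sum(x * x for x in xs[:train_end])
    w = num / den
    return [w * x for x in xs[train_end:]]


def stream_forecast(xs, ys, train_end, lr=0.3):
    """Test-then-train: predict each point first, then update the slope by one SGD step."""
    w = 0.0
    preds = []
    for t, (x, y) in enumerate(zip(xs, ys)):
        pred = w * x
        if t >= train_end:
            preds.append(pred)  # score only on the shared evaluation window
        w += lr * (y - pred) * x  # learn from the point after predicting it
    return preds


def compare(train_end=300):
    """Return (batch R², stream R²) on the points after `train_end`."""
    xs, ys = make_series()
    y_eval = ys[train_end:]
    batch_r2 = r2_score(y_eval, batch_forecast(xs, ys, train_end))
    stream_r2 = r2_score(y_eval, stream_forecast(xs, ys, train_end))
    return batch_r2, stream_r2


if __name__ == "__main__":
    batch_r2, stream_r2 = compare()
    print(f"batch R2:  {batch_r2:.3f}")
    print(f"stream R2: {stream_r2:.3f}")
```

In this toy setting the frozen batch model keeps predicting with the pre-drift slope, so its R² on the evaluation window falls below zero, while the stream model recovers after the drift. That outcome is a property of the synthetic data chosen here and says nothing about which paradigm performs better on real HAB series.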
Related papers
- Development and Comparative Analysis of Machine Learning Models for Hypoxemia Severity Triage in CBRNE Emergency Scenarios Using Physiological and Demographic Data from Medical-Grade Devices [0.0]
Gradient Boosting Models (GBMs) outperformed sequential models in terms of training speed, interpretability, and reliability.
A 5-minute prediction window was chosen for timely intervention, with minute-levels standardizing the data.
This study highlights ML's potential to improve triage and reduce alarm fatigue.
arXiv Detail & Related papers (2024-10-30T23:24:28Z)
- Explainable machine learning for predicting shellfish toxicity in the Adriatic Sea using long-term monitoring data of HABs [0.0]
We train and evaluate machine learning models to accurately predict diarrhetic shellfish poisoning events.
The random forest model provided the best prediction of positive toxicity results based on the F1 score.
Key species (Dinophysis fortii and D. caudata) and environmental factors (salinity, river discharge and precipitation) were the best predictors of DSP outbreaks.
arXiv Detail & Related papers (2024-05-07T14:55:42Z)
- Hybrid Machine Learning techniques in the management of harmful algal blooms impact [0.7864304771129751]
Mollusc farming can be affected by harmful algal blooms (HABs).
HABs are episodes of high concentrations of algae that are potentially toxic for human consumption.
To avoid the risk to human consumption, harvesting is prohibited when toxicity is detected.
arXiv Detail & Related papers (2024-02-14T15:59:22Z)
- An Extreme-Adaptive Time Series Prediction Model Based on Probability-Enhanced LSTM Neural Networks [6.5700527395783315]
We propose a novel probability-enhanced neural network model, called NEC+, which concurrently learns extreme and normal prediction functions.
We evaluate the proposed model on the difficult 3-day ahead hourly water level prediction task applied to 9 reservoirs in California.
arXiv Detail & Related papers (2022-11-29T03:01:59Z)
- Back2Future: Leveraging Backfill Dynamics for Improving Real-time Predictions in Future [73.03458424369657]
In real-time forecasting in public health, data collection is a non-trivial and demanding task.
The 'backfill' phenomenon and its effect on model performance have barely been studied in the prior literature.
We formulate a novel problem and neural framework Back2Future that aims to refine a given model's predictions in real-time.
arXiv Detail & Related papers (2021-06-08T14:48:20Z)
- When in Doubt: Neural Non-Parametric Uncertainty Quantification for Epidemic Forecasting [70.54920804222031]
Most existing forecasting models disregard uncertainty quantification, resulting in mis-calibrated predictions.
Recent works in deep neural models for uncertainty-aware time-series forecasting also have several limitations.
We model the forecasting task as a probabilistic generative process and propose a functional neural process model called EPIFNP.
arXiv Detail & Related papers (2021-06-07T18:31:47Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Gaussian Process Nowcasting: Application to COVID-19 Mortality Reporting [2.8712862578745018]
Updating observations of a signal due to the delays in the measurement process is a common problem in signal processing.
We present a flexible approach using a latent Gaussian process that is capable of describing the changing auto-correlation structure present in the reporting time-delay surface.
This approach also yields robust estimates of uncertainty for the estimated nowcasted numbers of deaths.
arXiv Detail & Related papers (2021-02-22T18:32:44Z)
- STELAR: Spatio-temporal Tensor Factorization with Latent Epidemiological Regularization [76.57716281104938]
We develop a tensor method to predict the evolution of epidemic trends for many regions simultaneously.
STELAR enables long-term prediction by incorporating latent temporal regularization through a system of discrete-time difference equations.
We conduct experiments using both county- and state-level COVID-19 data and show that our model can identify interesting latent patterns of the epidemic.
arXiv Detail & Related papers (2020-12-08T21:21:47Z)
- DeepRite: Deep Recurrent Inverse TreatmEnt Weighting for Adjusting Time-varying Confounding in Modern Longitudinal Observational Data [68.29870617697532]
We propose Deep Recurrent Inverse TreatmEnt weighting (DeepRite) for time-varying confounding in longitudinal data.
DeepRite is shown to recover the ground truth from synthetic data, and estimate unbiased treatment effects from real data.
arXiv Detail & Related papers (2020-10-28T15:05:08Z)
- A General Framework for Survival Analysis and Multi-State Modelling [70.31153478610229]
We use neural ordinary differential equations as a flexible and general method for estimating multi-state survival models.
We show that our model exhibits state-of-the-art performance on popular survival data sets and demonstrate its efficacy in a multi-state setting.
arXiv Detail & Related papers (2020-06-08T19:24:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.