Machine learning in wastewater treatment: insights from modelling a pilot denitrification reactor
- URL: http://arxiv.org/abs/2412.14030v1
- Date: Wed, 18 Dec 2024 16:49:23 GMT
- Title: Machine learning in wastewater treatment: insights from modelling a pilot denitrification reactor
- Authors: Eivind Bøhn, Sølve Eidnes, Kjell Rune Jonassen,
- Abstract summary: We use data from a pilot reactor at the Veas treatment facility in Norway to explore how machine learning can be used to optimize biological nitrate.
Rather than focusing solely on predictive accuracy, our approach prioritizes understanding the foundational requirements for effective data-driven modelling.
- Score: 0.0
- License:
- Abstract: Wastewater treatment plants are increasingly recognized as promising candidates for machine learning applications, due to their societal importance and high availability of data. However, their varied designs, operational conditions, and influent characteristics hinder straightforward automation. In this study, we use data from a pilot reactor at the Veas treatment facility in Norway to explore how machine learning can be used to optimize biological nitrate ($\mathrm{NO_3^-}$) reduction to molecular nitrogen ($\mathrm{N_2}$) in the biogeochemical process known as \textit{denitrification}. Rather than focusing solely on predictive accuracy, our approach prioritizes understanding the foundational requirements for effective data-driven modelling of wastewater treatment. Specifically, we aim to identify which process parameters are most critical, the necessary data quantity and quality, how to structure data effectively, and what properties are required by the models. We find that nonlinear models perform best on the training and validation data sets, indicating nonlinear relationships to be learned, but linear models transfer better to the unseen test data, which comes later in time. The variable measuring the water temperature has a particularly detrimental effect on the models, owing to a significant change in distributions between training and test data. We therefore conclude that multiple years of data is necessary to learn robust machine learning models. By addressing foundational elements, particularly in the context of the climatic variability faced by northern regions, this work lays the groundwork for a more structured and tailored approach to machine learning for wastewater treatment. We share publicly both the data and code used to produce the results in the paper.
Related papers
- Machine learning surrogates for efficient hydrologic modeling: Insights from stochastic simulations of managed aquifer recharge [0.0]
We show that machine learning surrogate models can achieve under 10% mean absolute percentage error.
We apply this workflow to simulations of variably saturated groundwater flow at a prospective managed aquifer recharge site.
ML surrogate models can achieve under 10% mean absolute percentage error and yield order-of-magnitude runtime savings.
arXiv Detail & Related papers (2024-07-30T15:24:27Z) - A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data [9.57464542357693]
This paper demonstrates that model-centric evaluations are biased, as real-world modeling pipelines often require dataset-specific preprocessing and feature engineering.
We select 10 relevant datasets from Kaggle competitions and implement expert-level preprocessing pipelines for each dataset.
After dataset-specific feature engineering, model rankings change considerably, performance differences decrease, and the importance of model selection reduces.
arXiv Detail & Related papers (2024-07-02T09:54:39Z) - Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
Development of deep learning models is enabled by the availability of large-scale datasets.
dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z) - A Hybrid Approach of Transfer Learning and Physics-Informed Modeling:
Improving Dissolved Oxygen Concentration Prediction in an Industrial
Wastewater Treatment Plant [0.0]
The objective is to increase the prediction performance of an industrial wastewater treatment plant by transferring the knowledge of (i) an open-source simulation model that captures the underlying physics of the process, albeit with dissimilarities to the target plant, and (ii) another industrial plant characterized by noisy and limited data but located in the same refinery, and (iii) the model in (ii)
The results have shown that test and validation performance are improved up to 27% and 59%, respectively.
arXiv Detail & Related papers (2024-01-20T11:53:08Z) - On the choice of training data for machine learning of geostrophic
mesoscale turbulence [0.34376560669160383]
'Data' plays a central role in data-driven methods, but is not often the subject of focus in investigations of machine learning algorithms.
We consider the case of eddy-mean interaction in rotating stratified turbulence in the presence of lateral boundaries.
arXiv Detail & Related papers (2023-07-03T03:43:21Z) - Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We find the most contributing samples based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z) - Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z) - SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z) - Size doesn't matter: predicting physico- or biochemical properties based
on dozens of molecules [0.0]
The paper shows a significant improvement in the performance of models for target properties with a lack of data.
The effects of the dataset composition on model quality and the applicability domain of the resulting models are also considered.
arXiv Detail & Related papers (2021-07-22T18:57:24Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using Bayesian neural network (BNN)
arXiv Detail & Related papers (2021-04-11T12:14:04Z) - Unassisted Noise Reduction of Chemical Reaction Data Sets [59.127921057012564]
We propose a machine learning-based, unassisted approach to remove chemically wrong entries from data sets.
Our results show an improved prediction quality for models trained on the cleaned and balanced data sets.
arXiv Detail & Related papers (2021-02-02T09:34:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.