Using Multivariate Linear Regression for Biochemical Oxygen Demand
Prediction in Waste Water
- URL: http://arxiv.org/abs/2209.14297v1
- Date: Thu, 8 Sep 2022 14:41:02 GMT
- Authors: Isaiah K. Mutai, Kristof Van Laerhoven, Nancy W. Karuri, Robert K. Tewo
- Abstract summary: The goal of this work is to examine the capability of MLR to predict Biochemical Oxygen Demand (BOD) in waste water from four input variables.
Of the seven parameters examined, these four had the strongest correlation with BOD.
Increasing the training set beyond 80% of the dataset improved only the accuracy of the model and had no significant impact on its prediction capacity.
- Score: 1.9843222704723806
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multivariate Linear Regression (MLR) offers opportunities for
predicting Biochemical Oxygen Demand (BOD) in waste water, using diverse water
quality parameters as input variables. The goal of this work is to examine the
capability of MLR to predict BOD in waste water from four input variables:
Dissolved Oxygen (DO), Nitrogen, Fecal Coliform and Total Coliform. Of the
seven parameters examined, these four had the strongest correlation with BOD.
The model was trained with 80% and 90% of the data as the training set, with
the remaining 20% and 10%, respectively, as the test set. MLR performance was
evaluated through the coefficient of correlation (r), Root Mean Square Error
(RMSE) and the percentage accuracy in prediction of BOD. With these four input
variables, the performance indices for BOD prediction were RMSE = 6.77 mg/L,
r = 0.60 and accuracy of 70.3% for a training set of 80% of the dataset, and
RMSE = 6.74 mg/L, r = 0.60 and accuracy of 87.5% for a training set of 90%.
Increasing the training set beyond 80% of the dataset improved only the
accuracy of the model and had no significant impact on its prediction
capacity. The results showed that an MLR model could be successfully employed
to estimate BOD in waste water using appropriately selected input parameters.
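The evaluation procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic placeholders (the paper's dataset is not reproduced here), the function names fit_mlr, predict_mlr and evaluate_split are hypothetical, and the shuffled split with a fixed seed is an assumption. The percentage accuracy metric is left out because the abstract does not define it.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares fit; returns [intercept, coef_1, ..., coef_k]."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict_mlr(beta, X):
    """Apply fitted coefficients to new inputs."""
    return beta[0] + X @ beta[1:]


def evaluate_split(X, y, train_fraction, seed=0):
    """Shuffle, split, fit on the training part, score on the test part.

    Returns (RMSE, Pearson r) computed on the held-out test set.
    The shuffling and seed are assumptions made for this sketch.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_train = int(round(train_fraction * len(X)))
    train, test = order[:n_train], order[n_train:]
    beta = fit_mlr(X[train], y[train])
    pred = predict_mlr(beta, X[test])
    rmse = float(np.sqrt(np.mean((pred - y[test]) ** 2)))
    r = float(np.corrcoef(pred, y[test])[0, 1])
    return rmse, r


# Synthetic stand-in for four inputs (e.g. DO, Nitrogen, Fecal Coliform,
# Total Coliform) and a BOD-like target. These values are NOT the paper's data.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 4))
assumed_beta = np.array([10.0, -2.0, 1.5, 0.8, 0.5])  # arbitrary illustration
y = assumed_beta[0] + X @ assumed_beta[1:] + data_rng.normal(scale=3.0, size=200)

for fraction in (0.8, 0.9):
    rmse, r = evaluate_split(X, y, fraction)
    print(f"train={fraction:.0%}  RMSE={rmse:.2f}  r={r:.2f}")
```

Comparing the two printed rows mirrors the 80/20 versus 90/10 comparison in the abstract; with real measurements, X would hold the four selected water quality parameters and y the measured BOD.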
Related papers
- DataDecide: How to Predict Best Pretraining Data with Small Experiments [67.95896457895404]
We release models, data, and evaluations in DataDecide -- the most extensive open suite of models over differences in data and scale.
We conduct controlled pretraining experiments across 25 corpora with differing sources, deduplication, and filtering up to 100B tokens, model sizes up to 1B parameters, and 3 random seeds.
arXiv Detail & Related papers (2025-04-15T17:02:15Z)
- Analyzing Spatio-Temporal Dynamics of Dissolved Oxygen for the River Thames using Superstatistical Methods and Machine Learning [0.0]
We use superstatistical methods and machine learning to predict dissolved oxygen levels in the River Thames.
For long-term forecasting, the Informer model consistently delivers superior performance.
arXiv Detail & Related papers (2025-01-10T16:54:52Z)
- Calibrating Language Models with Adaptive Temperature Scaling [58.056023173579625]
We introduce Adaptive Temperature Scaling (ATS), a post-hoc calibration method that predicts a temperature scaling parameter for each token prediction.
ATS improves calibration by over 10-50% across three downstream natural language evaluation benchmarks compared to prior calibration methods.
arXiv Detail & Related papers (2024-09-29T22:54:31Z)
- LLMs & XAI for Water Sustainability: Seasonal Water Quality Prediction with LIME Explainable AI and a RAG-based Chatbot for Insights [0.0]
This paper introduces a hybrid deep learning model to predict Nepal's seasonal water quality using a small dataset with multiple water quality parameters.
CatBoost, XGBoost, Extra Trees, and LightGBM, along with a neural network combining CNN and RNN layers, are used to capture temporal and spatial patterns in the data.
The model demonstrated notable accuracy improvements, aiding proactive water quality control.
arXiv Detail & Related papers (2024-09-17T05:26:59Z)
- Impact of Comprehensive Data Preprocessing on Predictive Modelling of COVID-19 Mortality [0.0]
This study evaluates the impact of a custom data preprocessing pipeline on ten machine learning models predicting COVID-19 mortality.
Our pipeline differs from a standard preprocessing pipeline through four key steps.
arXiv Detail & Related papers (2024-08-15T13:23:59Z)
- Improving Bias Correction Standards by Quantifying its Effects on Treatment Outcomes [54.18828236350544]
Propensity score matching (PSM) addresses selection biases by selecting comparable populations for analysis.
Different matching methods can produce significantly different Average Treatment Effects (ATE) for the same task, even when meeting all validation criteria.
To address this issue, we introduce a novel metric, A2A, to reduce the number of valid matches.
arXiv Detail & Related papers (2024-07-20T12:42:24Z)
- Optimizing PM2.5 Forecasting Accuracy with Hybrid Meta-Heuristic and Machine Learning Models [0.0]
This study focuses on forecasting hourly PM2.5 concentrations using Support Vector Regression (SVR).
Meta-heuristic algorithms, Grey Wolf Optimization (GWO) and Particle Swarm Optimization (PSO), are used to enhance prediction accuracy.
Results show significant improvements with PSO-SVR (R2: 0.9401, RMSE: 0.2390, MAE: 0.1368) and GWO-SVR (R2: 0.9408, RMSE: 0.2376, MAE: 0.1373).
arXiv Detail & Related papers (2024-07-01T05:24:19Z)
- Estimating oil and gas recovery factors via machine learning: Database-dependent accuracy and reliability [0.0]
A key reservoir property is hydrocarbon recovery factor (RF) whose accurate estimation would provide decisive insights to drilling and production strategies.
This study aims to estimate the hydrocarbon RF for exploration from various reservoir characteristics, such as porosity, permeability, pressure, and water saturation via the machine learning (ML) approach.
arXiv Detail & Related papers (2022-10-22T16:25:49Z)
- Photoelectric Factor Prediction Using Automated Learning and Uncertainty Quantification [0.0]
The photoelectric factor (PEF) is an important well logging tool to distinguish between different types of reservoir rocks.
The ratio of rock minerals could be determined by combining PEF log with other well logs.
However, PEF log could be missing in some cases such as in old well logs and wells drilled with barium-based mud.
arXiv Detail & Related papers (2022-06-17T18:03:38Z)
- Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems [63.713297451300086]
We present results from a large-scale experiment on pretraining encoders with non-embedding parameter counts ranging from 700M to 9.3B, their subsequent distillation into smaller models ranging from 17M to 170M parameters, and their application to the Natural Language Understanding (NLU) component of a virtual assistant system.
arXiv Detail & Related papers (2022-06-15T20:44:23Z)
- Unassisted Noise Reduction of Chemical Reaction Data Sets [59.127921057012564]
We propose a machine learning-based, unassisted approach to remove chemically wrong entries from data sets.
Our results show an improved prediction quality for models trained on the cleaned and balanced data sets.
arXiv Detail & Related papers (2021-02-02T09:34:34Z)
- High correlated variables creator machine: Prediction of the compressive strength of concrete [0.0]
We introduce a novel hybrid model for predicting the compressive strength of concrete using ultrasonic pulse velocity (UPV) and rebound number (RN).
High correlated variables creator machine (HVCM) is used to create the new variables that have a better correlation with the output and improve the prediction models.
The results show that HCVCM-ANFIS can predict the compressive strength of concrete better than all other models.
arXiv Detail & Related papers (2020-09-11T15:06:05Z)
- Assessing Graph-based Deep Learning Models for Predicting Flash Point [52.931492216239995]
Graph-based deep learning (GBDL) models were implemented in predicting flash point for the first time.
Average R2 and Mean Absolute Error (MAE) scores of MPNN are, respectively, 2.3% lower and 2.0 K higher than previous comparable studies.
arXiv Detail & Related papers (2020-02-26T06:10:12Z)
- Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond [69.83813153444115]
We consider an efficient estimating equation for the (local) quantile treatment effect ((L)QTE) in causal inference.
Debiased machine learning (DML) is a data-splitting approach to estimating high-dimensional nuisances.
We propose localized debiased machine learning (LDML), which avoids this burdensome step.
arXiv Detail & Related papers (2019-12-30T14:42:52Z)