Using Multivariate Linear Regression for Biochemical Oxygen Demand
Prediction in Waste Water
- URL: http://arxiv.org/abs/2209.14297v1
- Date: Thu, 8 Sep 2022 14:41:02 GMT
- Title: Using Multivariate Linear Regression for Biochemical Oxygen Demand
Prediction in Waste Water
- Authors: Isaiah K. Mutai, Kristof Van Laerhoven, Nancy W. Karuri, Robert K.
Tewo
- Abstract summary: The goal of this work is to examine the capability of MLR in prediction of Biochemical Oxygen Demand (BOD) in waste water through four input variables.
These four variables were the most strongly correlated with BOD among the seven parameters examined.
It was found that increasing the training set above 80% of the dataset improved only the accuracy of the model and did not significantly affect its prediction capacity.
- Score: 1.9843222704723806
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There exist opportunities for Multivariate Linear Regression (MLR) in the
prediction of Biochemical Oxygen Demand (BOD) in waste water, using diverse
water quality parameters as the input variables. The goal of this work is to
examine the capability of MLR to predict BOD in waste water from four input
variables: Dissolved Oxygen (DO), Nitrogen, Fecal Coliform and Total Coliform.
These four variables were the most strongly correlated with BOD among the seven
parameters examined. The model was trained with both 80% and 90% of the data as
the training set, with the remaining 20% and 10% respectively as the test set.
MLR performance was evaluated through the correlation coefficient (r), Root
Mean Square Error (RMSE) and the percentage accuracy in prediction of BOD. With
these four inputs, the performance indices were RMSE = 6.77 mg/L, r = 0.60 and
accuracy of 70.3% for the 80% training set, and RMSE = 6.74 mg/L, r = 0.60 and
accuracy of 87.5% for the 90% training set. Increasing the training set above
80% of the dataset improved only the accuracy of the model and did not have a
significant impact on its prediction capacity. The results showed that an MLR
model could be successfully employed to estimate BOD in waste water using
appropriately selected input parameters.
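The workflow the abstract describes (rank candidate parameters by correlation with BOD, keep four, fit MLR, then score on 20% and 10% test sets with RMSE and r) can be sketched as follows. This is a minimal illustration, not the authors' code: the data are synthetic, the function names are hypothetical, and the random shuffle before splitting is an assumption, since the abstract does not say how the split was drawn. The abstract does not define its "percentage accuracy" metric, so it is not computed here.

```python
import numpy as np


def top_k_by_correlation(X, y, k):
    """Indices of the k columns of X with the largest |Pearson r| against y."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.argsort(-np.abs(r))[:k]


def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]


def evaluate(y_true, y_pred):
    """RMSE and Pearson r between observed and predicted values."""
    rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    r = float(np.corrcoef(y_true, y_pred)[0, 1])
    return rmse, r


def run_split(X, y, train_fraction, seed=0):
    """Shuffle, split, fit on the training part, score on the held-out part."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    n_train = int(round(train_fraction * len(y)))
    train, test = order[:n_train], order[n_train:]
    coef = fit_mlr(X[train], y[train])
    return evaluate(y[test], predict_mlr(coef, X[test]))


# Synthetic stand-in for seven water quality parameters (NOT the paper's data).
rng = np.random.default_rng(42)
n = 200
X_all = rng.normal(size=(n, 7))
true_w = np.array([-2.0, 1.5, 1.0, 0.8, 0.0, 0.0, 0.0])
y = 20.0 + X_all @ true_w + rng.normal(scale=1.0, size=n)

# Simplification: features are ranked on the full dataset before splitting.
selected = top_k_by_correlation(X_all, y, k=4)
X = X_all[:, selected]

results = {f: run_split(X, y, f) for f in (0.8, 0.9)}
for f, (rmse, r) in results.items():
    print(f"train={f:.0%}  RMSE={rmse:.2f}  r={r:.2f}")
```

With a 90% training set the test set holds only half as many samples as with 80%, so its scores are noisier; a single split of each size is a weak basis for comparing the two settings.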
Related papers
- Handcrafted vs. Deep Radiomics vs. Fusion vs. Deep Learning: A Comprehensive Review of Machine Learning -Based Cancer Outcome Prediction in PET and SPECT Imaging [0.7573820776203027]
This systematic review analyzed 226 studies published from 2020 to 2025 that applied machine learning to PET or SPECT imaging for outcome prediction. PET-based studies generally outperformed those using SPECT, likely due to higher spatial resolution and sensitivity. Common limitations included inadequate handling of class imbalance, missing data, and low population diversity.
arXiv Detail & Related papers (2025-07-21T21:03:12Z) - A Generative Framework for Causal Estimation via Importance-Weighted Diffusion Distillation [55.53426007439564]
Estimating individualized treatment effects from observational data is a central challenge in causal inference. Inverse probability weighting (IPW) is a well-established solution to this problem, but its integration into modern deep learning frameworks remains limited. We propose Importance-Weighted Diffusion Distillation (IWDD), a novel generative framework that combines the pretraining of diffusion models with importance-weighted score distillation.
arXiv Detail & Related papers (2025-05-16T17:00:52Z) - DataDecide: How to Predict Best Pretraining Data with Small Experiments [67.95896457895404]
We release models, data, and evaluations in DataDecide -- the most extensive open suite of models over differences in data and scale.
We conduct controlled pretraining experiments across 25 corpora with differing sources, deduplication, and filtering up to 100B tokens, model sizes up to 1B parameters, and 3 random seeds.
arXiv Detail & Related papers (2025-04-15T17:02:15Z) - Analyzing Spatio-Temporal Dynamics of Dissolved Oxygen for the River Thames using Superstatistical Methods and Machine Learning [0.0]
We use superstatistical methods and machine learning to predict dissolved oxygen levels in the River Thames.
For long-term forecasting, the Informer model consistently delivers superior performance.
arXiv Detail & Related papers (2025-01-10T16:54:52Z) - Calibrating Language Models with Adaptive Temperature Scaling [58.056023173579625]
We introduce Adaptive Temperature Scaling (ATS), a post-hoc calibration method that predicts a temperature scaling parameter for each token prediction.
ATS improves calibration by over 10-50% across three downstream natural language evaluation benchmarks compared to prior calibration methods.
arXiv Detail & Related papers (2024-09-29T22:54:31Z) - LLMs & XAI for Water Sustainability: Seasonal Water Quality Prediction with LIME Explainable AI and a RAG-based Chatbot for Insights [0.0]
This paper introduces a hybrid deep learning model to predict Nepal's seasonal water quality using a small dataset with multiple water quality parameters.
CatBoost, XGBoost, Extra Trees, and LightGBM, along with a neural network combining CNN and RNN layers, are used to capture temporal and spatial patterns in the data.
The model demonstrated notable accuracy improvements, aiding proactive water quality control.
arXiv Detail & Related papers (2024-09-17T05:26:59Z) - Impact of Comprehensive Data Preprocessing on Predictive Modelling of COVID-19 Mortality [0.0]
This study evaluates the impact of a custom data preprocessing pipeline on ten machine learning models predicting COVID-19 mortality.
Our pipeline differs from a standard preprocessing pipeline through four key steps.
arXiv Detail & Related papers (2024-08-15T13:23:59Z) - Improving Bias Correction Standards by Quantifying its Effects on Treatment Outcomes [54.18828236350544]
Propensity score matching (PSM) addresses selection biases by selecting comparable populations for analysis.
Different matching methods can produce significantly different Average Treatment Effects (ATE) for the same task, even when meeting all validation criteria.
To address this issue, we introduce a novel metric, A2A, to reduce the number of valid matches.
arXiv Detail & Related papers (2024-07-20T12:42:24Z) - Optimizing PM2.5 Forecasting Accuracy with Hybrid Meta-Heuristic and Machine Learning Models [0.0]
This study focuses on forecasting hourly PM2.5 concentrations using Support Vector Regression (SVR).
Two meta-heuristic algorithms, Grey Wolf Optimization (GWO) and Particle Swarm Optimization (PSO), are used to enhance prediction accuracy.
Results show significant improvements with PSO-SVR (R2: 0.9401, RMSE: 0.2390, MAE: 0.1368) and GWO-SVR (R2: 0.9408, RMSE: 0.2376, MAE: 0.1373).
arXiv Detail & Related papers (2024-07-01T05:24:19Z) - DiffPuter: Empowering Diffusion Models for Missing Data Imputation [56.48119008663155]
This paper introduces DiffPuter, a tailored diffusion model combined with the Expectation-Maximization (EM) algorithm for missing data imputation. Our theoretical analysis shows that DiffPuter's training step corresponds to the maximum likelihood estimation of data density. Our experiments show that DiffPuter achieves an average improvement of 6.94% in MAE and 4.78% in RMSE compared to the most competitive existing method.
arXiv Detail & Related papers (2024-05-31T08:35:56Z) - Quantifying predictive uncertainty of aphasia severity in stroke patients with sparse heteroscedastic Bayesian high-dimensional regression [47.1405366895538]
Sparse linear regression methods for high-dimensional data commonly assume that residuals have constant variance, which can be violated in practice.
This paper proposes estimating high-dimensional heteroscedastic linear regression models using a heteroscedastic partitioned empirical Bayes Expectation Conditional Maximization algorithm.
arXiv Detail & Related papers (2023-09-15T22:06:29Z) - Estimating oil and gas recovery factors via machine learning:
Database-dependent accuracy and reliability [0.0]
A key reservoir property is hydrocarbon recovery factor (RF) whose accurate estimation would provide decisive insights to drilling and production strategies.
This study aims to estimate the hydrocarbon RF for exploration from various reservoir characteristics, such as porosity, permeability, pressure, and water saturation via the machine learning (ML) approach.
arXiv Detail & Related papers (2022-10-22T16:25:49Z) - Photoelectric Factor Prediction Using Automated Learning and Uncertainty
Quantification [0.0]
The photoelectric factor (PEF) is an important well logging tool to distinguish between different types of reservoir rocks.
The ratio of rock minerals could be determined by combining PEF log with other well logs.
However, PEF log could be missing in some cases such as in old well logs and wells drilled with barium-based mud.
arXiv Detail & Related papers (2022-06-17T18:03:38Z) - Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter
Encoders for Natural Language Understanding Systems [63.713297451300086]
We present results from a large-scale experiment on pretraining encoders with non-embedding parameter counts ranging from 700M to 9.3B, their subsequent distillation into smaller models ranging from 17M to 170M parameters, and their application to the Natural Language Understanding (NLU) component of a virtual assistant system.
arXiv Detail & Related papers (2022-06-15T20:44:23Z) - Unassisted Noise Reduction of Chemical Reaction Data Sets [59.127921057012564]
We propose a machine learning-based, unassisted approach to remove chemically wrong entries from data sets.
Our results show an improved prediction quality for models trained on the cleaned and balanced data sets.
arXiv Detail & Related papers (2021-02-02T09:34:34Z) - High correlated variables creator machine: Prediction of the compressive
strength of concrete [0.0]
We introduce a novel hybrid model for predicting the compressive strength of concrete using ultrasonic pulse velocity (UPV) and rebound number (RN).
The high correlated variables creator machine (HCVCM) is used to create new variables that correlate better with the output and improve the prediction models.
The results show that HCVCM-ANFIS can predict the compressive strength of concrete better than all other models.
arXiv Detail & Related papers (2020-09-11T15:06:05Z) - Assessing Graph-based Deep Learning Models for Predicting Flash Point [52.931492216239995]
Graph-based deep learning (GBDL) models were implemented in predicting flash point for the first time.
Average R2 and Mean Absolute Error (MAE) scores of MPNN are, respectively, 2.3% lower and 2.0 K higher than previous comparable studies.
arXiv Detail & Related papers (2020-02-26T06:10:12Z) - Localized Debiased Machine Learning: Efficient Inference on Quantile
Treatment Effects and Beyond [69.83813153444115]
We consider an efficient estimating equation for the (local) quantile treatment effect ((L)QTE) in causal inference.
Debiased machine learning (DML) is a data-splitting approach to estimating high-dimensional nuisances.
We propose localized debiased machine learning (LDML), which avoids this burdensome step.
arXiv Detail & Related papers (2019-12-30T14:42:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.