A Feature Importance Analysis for Soft-Sensing-Based Predictions in a
Chemical Sulphonation Process
- URL: http://arxiv.org/abs/2009.12133v1
- Date: Fri, 25 Sep 2020 11:20:06 GMT
- Title: A Feature Importance Analysis for Soft-Sensing-Based Predictions in a
Chemical Sulphonation Process
- Authors: Enrique Garcia-Ceja, Åsmund Hugo, Brice Morin, Per-Olav Hansen,
Espen Martinsen, An Ngoc Lam, Øystein Haugen
- Abstract summary: We use a soft-sensing approach, that is, predicting a variable of interest based on other process variables, instead of directly sensing the variable of interest.
The aim of this study was to explore and detect which variables are the most relevant for predicting product quality, and to what degree of precision.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we present the results of a feature importance analysis of a
chemical sulphonation process. The task consists of predicting the
neutralization number (NT), which is a metric that characterizes the product
quality of active detergents. The prediction is based on a dataset of
environmental measurements, sampled from an industrial chemical process. We
used a soft-sensing approach, that is, predicting a variable of interest based
on other process variables, instead of directly sensing the variable of
interest. Reasons for doing so range from expensive sensory hardware to harsh
environments, e.g., inside a chemical reactor. The aim of this study was to
explore and detect which variables are the most relevant for predicting product
quality, and to what degree of precision. We trained regression models based on
linear regression, regression tree and random forest. A random forest model was
used to rank the predictor variables by importance. Then, we trained the models
in a forward-selection style by adding one feature at a time, starting with the
most important one. Our results show that it is sufficient to use the top 3
important variables, out of the 8 variables, to achieve satisfactory prediction
results. On the other hand, the random forest model obtained the best result
when trained with all variables.
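The workflow described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the data is synthetic, standing in for the 8 process measurements and the NT target, and scikit-learn's impurity-based importances stand in for whatever ranking the authors used.

```python
# Hypothetical sketch: rank features with a random forest, then train in a
# forward-selection style, adding one feature at a time (most important
# first) and tracking test error. All data here is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 500, 8
X = rng.normal(size=(n, p))
# Make the target depend strongly on 3 of the 8 variables, mirroring the
# study's finding that the top 3 features suffice.
y = 2.0 * X[:, 0] + 1.5 * X[:, 3] + 1.0 * X[:, 5] + 0.1 * rng.normal(size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1: rank predictors by random-forest importance.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
ranking = np.argsort(rf.feature_importances_)[::-1]

# Step 2: forward selection -- retrain with the top-k features, k = 1..p.
errors = []
for k in range(1, p + 1):
    cols = ranking[:k]
    model = LinearRegression().fit(X_tr[:, cols], y_tr)
    errors.append(mean_absolute_error(y_te, model.predict(X_te[:, cols])))

print("ranking:", ranking, "MAE by k:", [round(e, 3) for e in errors])
```

On this synthetic data the error curve flattens once the three informative variables are included, which is the pattern the paper reports for the real process data.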
Related papers
- Enhancing Variable Importance in Random Forests: A Novel Application of Global Sensitivity Analysis [0.9954382983583578]
The present work provides an application of Global Sensitivity Analysis to supervised machine learning methods such as Random Forests.
Global Sensitivity Analysis is primarily used in mathematical modelling to investigate the effect of the uncertainties of the input variables on the output.
A simulation study shows that our proposal can be used to explore what advances can be achieved either in terms of efficiency, explanatory ability, or simply by way of confirming existing results.
arXiv Detail & Related papers (2024-07-19T10:45:36Z)
- Applying ranking techniques for estimating influence of Earth variables on temperature forecast error [0.6144680854063939]
This paper describes how to analyze the influence of Earth system variables on the errors when providing temperature forecasts.
Main contribution is the framework that shows how to convert correlations into rankings and combine them into an aggregate ranking.
We have carried out experiments on five chosen locations to analyze the behavior of this ranking-based methodology.
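The correlations-to-aggregate-ranking idea mentioned above can be illustrated with a minimal sketch. The variable names and scores below are invented for illustration, and mean-rank aggregation is one simple choice among several; the paper's actual framework may differ.

```python
# Illustrative sketch: turn per-method correlation scores into rankings,
# then combine them into one aggregate ranking via mean rank.
# Variable names and score values are hypothetical.
import numpy as np

variables = ["soil_moisture", "pressure", "humidity", "wind_speed"]
# |correlation| of each Earth variable with forecast error, per method.
scores = {
    "pearson":  np.array([0.62, 0.10, 0.45, 0.30]),
    "spearman": np.array([0.55, 0.05, 0.50, 0.20]),
}

# Rank within each method (rank 0 = most correlated) ...
ranks = np.array([np.argsort(np.argsort(-s)) for s in scores.values()])
# ... then average the ranks across methods to get the aggregate ranking.
aggregate = ranks.mean(axis=0)
order = [variables[i] for i in np.argsort(aggregate)]
print(order)
```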
arXiv Detail & Related papers (2024-03-12T12:59:00Z)
- A Notion of Feature Importance by Decorrelation and Detection of Trends by Random Forest Regression [1.675857332621569]
We introduce a novel notion of feature importance based on the well-studied Gram-Schmidt decorrelation method.
We propose two estimators for identifying trends in the data using random forest regression.
arXiv Detail & Related papers (2023-03-02T11:01:49Z)
- MetaRF: Differentiable Random Forest for Reaction Yield Prediction with a Few Trails [58.47364143304643]
In this paper, we focus on the reaction yield prediction problem.
We first put forth MetaRF, an attention-based differentiable random forest model specially designed for the few-shot yield prediction.
To improve the few-shot learning performance, we further introduce a dimension-reduction based sampling method.
arXiv Detail & Related papers (2022-08-22T06:40:13Z)
- Variance Minimization in the Wasserstein Space for Invariant Causal Prediction [72.13445677280792]
In this work, we show that the approach taken in ICP may be reformulated as a series of nonparametric tests that scales linearly in the number of predictors.
Each of these tests relies on the minimization of a novel loss function that is derived from tools in optimal transport theory.
We prove under mild assumptions that our method is able to recover the set of identifiable direct causes, and we demonstrate in our experiments that it is competitive with other benchmark causal discovery algorithms.
arXiv Detail & Related papers (2021-10-13T22:30:47Z)
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To take the power of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
- Variable selection with missing data in both covariates and outcomes: Imputation and machine learning [1.0333430439241666]
The missing data issue is ubiquitous in health studies.
Machine learning methods weaken parametric assumptions.
XGBoost and BART have the overall best performance across various settings.
arXiv Detail & Related papers (2021-04-06T20:18:29Z)
- Flexible Model Aggregation for Quantile Regression [92.63075261170302]
Quantile regression is a fundamental problem in statistical learning motivated by a need to quantify uncertainty in predictions.
We investigate methods for aggregating any number of conditional quantile models.
All of the models we consider in this paper can be fit using modern deep learning toolkits.
arXiv Detail & Related papers (2021-02-26T23:21:16Z)
- Unassisted Noise Reduction of Chemical Reaction Data Sets [59.127921057012564]
We propose a machine learning-based, unassisted approach to remove chemically wrong entries from data sets.
Our results show an improved prediction quality for models trained on the cleaned and balanced data sets.
arXiv Detail & Related papers (2021-02-02T09:34:34Z)
- Achieving Reliable Causal Inference with Data-Mined Variables: A Random Forest Approach to the Measurement Error Problem [1.5749416770494704]
A common empirical strategy involves the application of predictive modeling techniques to 'mine' variables of interest from available data.
Recent work highlights that, because the predictions from machine learning models are inevitably imperfect, econometric analyses based on the predicted variables are likely to suffer from bias due to measurement error.
We propose a novel approach to mitigate these biases, leveraging the ensemble learning technique known as the random forest.
arXiv Detail & Related papers (2020-12-19T21:48:23Z)
- Learning a Unified Sample Weighting Network for Object Detection [113.98404690619982]
Region sampling or weighting is significantly important to the success of modern region-based object detectors.
We argue that sample weighting should be data-dependent and task-dependent.
We propose a unified sample weighting network to predict a sample's task weights.
arXiv Detail & Related papers (2020-06-11T16:19:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.