A Data-driven feature selection and machine-learning model benchmark for
the prediction of longitudinal dispersion coefficient
- URL: http://arxiv.org/abs/2107.12970v1
- Date: Fri, 16 Jul 2021 09:50:38 GMT
- Title: A Data-driven feature selection and machine-learning model benchmark for
the prediction of longitudinal dispersion coefficient
- Authors: Yifeng Zhao, Pei Zhang, S.A. Galindo-Torres, Stan Z. Li
- Abstract summary: An accurate prediction of the longitudinal dispersion (LD) coefficient can produce a performance leap in related simulations.
In this study, a global optimal feature set was proposed through numerical comparison of the distilled local optimums in performance with representative ML models.
Results show that the support vector machine has significantly better performance than other models.
- Score: 29.58577229101903
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Longitudinal dispersion (LD) is the dominant process of scalar transport in
natural streams. An accurate prediction of the LD coefficient (Dl) can produce a
performance leap in related simulations. Emerging machine learning (ML)
techniques provide a self-adaptive tool for this problem. However, most
existing studies use an unproven four-parameter feature set obtained through
simple theoretical deduction. Few studies have examined its reliability
and rationality. Moreover, due to the lack of comparative studies, the proper
choice of ML models in different scenarios remains unknown. In this
study, the Feature Gradient selector was first adopted to distill the local
optimal feature sets directly from multivariable data. Then, a global optimal
feature set (the channel width, the flow velocity, the channel slope and the
cross-sectional area) was proposed through numerical comparison of the
distilled local optimums in performance with representative ML models. The
channel slope is identified as the key parameter for the prediction of the LD coefficient.
Further, we designed a weighted evaluation metric which enables comprehensive
model comparison. With the simple linear model as the baseline, a benchmark of
single and ensemble learning models was provided. Advantages and disadvantages
of the methods involved were also discussed. Results show that the support
vector machine has significantly better performance than other models. The decision
tree is not suitable for this problem due to its poor generalization ability.
Notably, simple models show superiority over complicated models on this
low-dimensional problem, owing to their better balance between regression and
generalization.
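The abstract mentions a weighted evaluation metric that balances regression fit against generalization, but does not give its formula. The following is only a minimal sketch of the general idea, assuming a weighted average of training and test R^2. The function names, the `w_test` weight, and all numbers are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: a weighted score that rewards both fit (training data)
# and generalization (held-out data). The use of R^2 and the weight value
# are assumptions, not the metric defined in the paper.


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def weighted_score(r2_train, r2_test, w_test=0.7):
    """Weighted average of train and test R^2; w_test is an assumed weight."""
    return (1.0 - w_test) * r2_train + w_test * r2_test


# Toy values, made up for illustration only.
y_train = [1.0, 2.0, 3.0, 4.0]
y_test = [1.5, 2.5, 3.5, 4.5]

# A model that memorizes the training data but fails on held-out data.
overfit = weighted_score(
    r2_score(y_train, [1.0, 2.0, 3.0, 4.0]),
    r2_score(y_test, [3.0, 1.0, 4.5, 2.5]),
)
# A simpler model with small errors on both splits.
simple = weighted_score(
    r2_score(y_train, [1.2, 1.9, 3.1, 3.8]),
    r2_score(y_test, [1.6, 2.4, 3.6, 4.4]),
)
print(f"overfit model: {overfit:.3f}, simple model: {simple:.3f}")
```

Under these assumed weights, a model that fits the training data perfectly but generalizes poorly scores below a simpler model with small errors on both splits, which mirrors the abstract's remark about decision trees and simple models.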
Related papers
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- Variational Bayesian surrogate modelling with application to robust design optimisation [0.9626666671366836]
Surrogate models provide a quick-to-evaluate approximation to complex computational models.
We consider Bayesian inference for constructing statistical surrogates with input uncertainties and dimensionality reduction.
We demonstrate intrinsic and robust structural optimisation problems where cost functions depend on a weighted sum of the mean and standard deviation of model outputs.
arXiv Detail & Related papers (2024-04-23T09:22:35Z)
- Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z)
- Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z)
- Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC is capable of performing efficiently, entirely on-the-fly, both parameter estimation and particle proposal adaptation.
arXiv Detail & Related papers (2023-12-19T21:45:38Z)
- Learning Residual Model of Model Predictive Control via Random Forests for Autonomous Driving [13.865293598486492]
One major issue in model predictive control (MPC) for autonomous driving is the contradiction between the system model's prediction accuracy and computation efficiency.
This paper reformulates the MPC tracking problem as a quadratic program (QP) that can be solved effectively.
arXiv Detail & Related papers (2023-04-10T03:32:09Z)
- On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery has proposed to factorize the data generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z)
- Variational Inference with NoFAS: Normalizing Flow with Adaptive Surrogate for Computationally Expensive Models [7.217783736464403]
Use of sampling-based approaches such as Markov chain Monte Carlo may become intractable when each likelihood evaluation is computationally expensive.
New approaches combining variational inference with normalizing flow are characterized by a computational cost that grows only linearly with the dimensionality of the latent variable space.
We propose Normalizing Flow with Adaptive Surrogate (NoFAS), an optimization strategy that alternately updates the normalizing flow parameters and the weights of a neural network surrogate model.
arXiv Detail & Related papers (2021-08-28T14:31:45Z)
- An interpretable prediction model for longitudinal dispersion coefficient in natural streams based on evolutionary symbolic regression network [30.99493442296212]
Various methods have been proposed for predicting the longitudinal dispersion coefficient (LDC).
In this paper, we first present an in-depth analysis of those methods and identify their defects.
We then design a novel symbolic regression method called evolutionary symbolic regression network (ESRN).
arXiv Detail & Related papers (2021-06-17T07:06:05Z)
- Combining data assimilation and machine learning to infer unresolved scale parametrisation [0.0]
In recent years, machine learning has been proposed to devise data-driven parametrisations of unresolved processes in dynamical numerical models.
Our goal is to go beyond the use of high-resolution simulations and train ML-based parametrisation using direct data.
We show that in both cases the hybrid model yields forecasts with better skill than the truncated model.
arXiv Detail & Related papers (2020-09-09T14:12:11Z)
- Non-parametric Models for Non-negative Functions [48.7576911714538]
We provide the first model for non-negative functions that benefits from the same good properties as linear models.
We prove that it admits a representer theorem and provide an efficient dual formulation for convex problems.
arXiv Detail & Related papers (2020-07-08T07:17:28Z)