Towards a data-scale independent regulariser for robust sparse identification of non-linear dynamics
- URL: http://arxiv.org/abs/2603.05201v1
- Date: Thu, 05 Mar 2026 14:11:22 GMT
- Title: Towards a data-scale independent regulariser for robust sparse identification of non-linear dynamics
- Authors: Jay Raut, Daniel N. Wilke, Stephan Schmidt,
- Abstract summary: Data normalisation can severely distort the discovery of governing equations by magnitudebased sparse regression methods.<n>We introduce the Sequential Thresholding of Coefficient of Variation (STCV), a novel, computationally efficient sparse regression algorithm.<n>We show that STCV-based methods can successfully identify the correct, sparse physical laws even when other methods fail.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data normalisation, a common and often necessary preprocessing step in engineering and scientific applications, can severely distort the discovery of governing equations by magnitudebased sparse regression methods. This issue is particularly acute for the Sparse Identification of Nonlinear Dynamics (SINDy) framework, where the core assumption of sparsity is undermined by the interaction between data scaling and measurement noise. The resulting discovered models can be dense, uninterpretable, and physically incorrect. To address this critical vulnerability, we introduce the Sequential Thresholding of Coefficient of Variation (STCV), a novel, computationally efficient sparse regression algorithm that is inherently robust to data scaling. STCV replaces conventional magnitude-based thresholding with a dimensionless statistical metric, the Coefficient Presence (CP), which assesses the statistical validity and consistency of candidate terms in the model library. This shift from magnitude to statistical significance makes the discovery process invariant to arbitrary data scaling. Through comprehensive benchmarking on canonical dynamical systems and practical engineering problems, including a physical mass-spring-damper experiment, we demonstrate that STCV consistently and significantly outperforms standard Sequential Thresholding Least Squares (STLSQ) and Ensemble-SINDy (E-SINDy) on normalised, noisy datasets. The results show that STCV-based methods can successfully identify the correct, sparse physical laws even when other methods fail. By mitigating the distorting effects of normalisation, STCV makes sparse system identification a more reliable and automated tool for real-world applications, thereby enhancing model interpretability and trustworthiness.
Related papers
- Learning Complex Physical Regimes via Coverage-oriented Uncertainty Quantification: An application to the Critical Heat Flux [0.0]
Uncertainty quantification (UQ) should not be viewed as a safety assessment, but as a support to the learning task itself.<n>We focus on the Critical Heat Flux benchmark and dataset presented by the OECD/NEA Expert Group on Reactor Systems Multi-Physics.<n>We show that while post-hoc methods ensure statistical calibration, coverage-oriented learning effectively reshapes the model's representation to match the complex physical regimes.
arXiv Detail & Related papers (2026-02-25T09:04:15Z) - Equivariant Evidential Deep Learning for Interatomic Potentials [55.6997213490859]
Uncertainty quantification is critical for assessing the reliability of machine learning interatomic potentials in molecular dynamics simulations.<n>Existing UQ approaches for MLIPs are often limited by high computational cost or suboptimal performance.<n>We propose textitEquivariant Evidential Deep Learning for Interatomic Potentials ($texte2$IP), a backbone-agnostic framework that models atomic forces and their uncertainty jointly.
arXiv Detail & Related papers (2026-02-11T02:00:25Z) - Discovering Governing Equations in the Presence of Uncertainty [11.752763800308276]
In this work, we theorize that accounting for system variability together with measurement noise is the key to consistently discover the governing equations underlying dynamical systems.<n>We show that SIP consistently identifies the correct equations by an average of 82% relative to the Sparse Identification Dynamics (SINDy) approach and its variant.
arXiv Detail & Related papers (2025-07-13T18:31:25Z) - Uncertainty-Aware Trajectory Prediction via Rule-Regularized Heteroscedastic Deep Classification [3.126303871979975]
SHIFT (Spectral Heteroscedastic Informed Forecasting for Trajectories) is a novel framework that combines well-calibrated uncertainty modeling with informative priors.<n>Our model excels in complex scenarios, such as intersections, where uncertainty is inherently higher.
arXiv Detail & Related papers (2025-04-17T17:24:50Z) - Efficient Transformed Gaussian Process State-Space Models for Non-Stationary High-Dimensional Dynamical Systems [49.819436680336786]
We propose an efficient transformed Gaussian process state-space model (ETGPSSM) for scalable and flexible modeling of high-dimensional, non-stationary dynamical systems.<n>Specifically, our ETGPSSM integrates a single shared GP with input-dependent normalizing flows, yielding an expressive implicit process prior that captures complex, non-stationary transition dynamics.<n>Our ETGPSSM outperforms existing GPSSMs and neural network-based SSMs in terms of computational efficiency and accuracy.
arXiv Detail & Related papers (2025-03-24T03:19:45Z) - LAPD: Langevin-Assisted Bayesian Active Learning for Physical Discovery [5.373182035720355]
Langevin-Assisted Active Physical Discovery (LAPD) is a framework that integrates replica-exchange gradient Langevin Monte Carlo.<n>LAPD achieves reliable, uncertainty-aware identification with noisy data.<n>We evaluate LAPD on diverse nonlinear systems such as the Lotka-Volterra, Lorenz, Burgers, and Convection-Diffusion equations.
arXiv Detail & Related papers (2025-03-04T20:17:24Z) - Learning Controlled Stochastic Differential Equations [61.82896036131116]
This work proposes a novel method for estimating both drift and diffusion coefficients of continuous, multidimensional, nonlinear controlled differential equations with non-uniform diffusion.
We provide strong theoretical guarantees, including finite-sample bounds for (L2), (Linfty), and risk metrics, with learning rates adaptive to coefficients' regularity.
Our method is available as an open-source Python library.
arXiv Detail & Related papers (2024-11-04T11:09:58Z) - Kalman Filter for Online Classification of Non-Stationary Data [101.26838049872651]
In Online Continual Learning (OCL) a learning system receives a stream of data and sequentially performs prediction and training steps.
We introduce a probabilistic Bayesian online learning model by using a neural representation and a state space model over the linear predictor weights.
In experiments in multi-class classification we demonstrate the predictive ability of the model and its flexibility to capture non-stationarity.
arXiv Detail & Related papers (2023-06-14T11:41:42Z) - A Priori Denoising Strategies for Sparse Identification of Nonlinear
Dynamical Systems: A Comparative Study [68.8204255655161]
We investigate and compare the performance of several local and global smoothing techniques to a priori denoise the state measurements.
We show that, in general, global methods, which use the entire measurement data set, outperform local methods, which employ a neighboring data subset around a local point.
arXiv Detail & Related papers (2022-01-29T23:31:25Z) - TELESTO: A Graph Neural Network Model for Anomaly Classification in
Cloud Services [77.454688257702]
Machine learning (ML) and artificial intelligence (AI) are applied on IT system operation and maintenance.
One direction aims at the recognition of re-occurring anomaly types to enable remediation automation.
We propose a method that is invariant to dimensionality changes of given data.
arXiv Detail & Related papers (2021-02-25T14:24:49Z) - Statistical control for spatio-temporal MEG/EEG source imaging with
desparsified multi-task Lasso [102.84915019938413]
Non-invasive techniques like magnetoencephalography (MEG) or electroencephalography (EEG) offer promise of non-invasive techniques.
The problem of source localization, or source imaging, poses however a high-dimensional statistical inference challenge.
We propose an ensemble of desparsified multi-task Lasso (ecd-MTLasso) to deal with this problem.
arXiv Detail & Related papers (2020-09-29T21:17:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.