Parametrising the Inhomogeneity Inducing Capacity of a Training Set, and its Impact on Supervised Learning
- URL: http://arxiv.org/abs/2510.18332v1
- Date: Tue, 21 Oct 2025 06:34:22 GMT
- Title: Parametrising the Inhomogeneity Inducing Capacity of a Training Set, and its Impact on Supervised Learning
- Authors: Gargi Roy, Dalia Chakrabarty
- Abstract summary: We refer to a parametrisation of this property of a given training set as its "inhomogeneity parameter". We prove that a training set with a non-zero inhomogeneity parameter renders it imperative that the process invoked to model the sought function be non-stationary.
- Score: 0.042970700836450486
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a parametrisation of the property of an available training dataset that necessitates an inhomogeneous correlation structure for the function learnt as a model of the relationship between the pair of variables whose observations comprise the training data. We refer to this parametrisation as the "inhomogeneity parameter" of the training set. This parameter is easy to compute for small-to-large datasets, and we demonstrate its computation on multiple publicly-available datasets, while also demonstrating that conventional "non-stationarity" of data does not imply a non-zero inhomogeneity parameter. We prove that, within the probabilistic Gaussian Process-based learning approach, a training set with a non-zero inhomogeneity parameter renders it imperative that the process invoked to model the sought function be non-stationary. Following the learning of a real-world multivariate function with such a Process, the quality and reliability of predictions at test inputs are demonstrated to be affected by the inhomogeneity parameter of the training data.
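The abstract's central claim is that a dataset with a non-zero inhomogeneity parameter must be modelled by a non-stationary process. The paper's own construction is not given here, but the contrast it turns on can be illustrated with a minimal Gaussian Process regression sketch using the Gibbs kernel, whose lengthscale varies with the input. The lengthscale profile `ell` and the toy target function below are hypothetical choices for illustration, not the authors' construction.

```python
import numpy as np

def gibbs_kernel(x1, x2, lengthscale_fn):
    """Non-stationary Gibbs kernel: the lengthscale varies with the input."""
    l1 = lengthscale_fn(x1)[:, None]          # shape (n, 1)
    l2 = lengthscale_fn(x2)[None, :]          # shape (1, m)
    sq_sum = l1**2 + l2**2
    prefac = np.sqrt(2.0 * l1 * l2 / sq_sum)  # normalisation keeping the kernel valid
    d2 = (x1[:, None] - x2[None, :])**2
    return prefac * np.exp(-d2 / sq_sum)

def gp_posterior_mean(x_train, y_train, x_test, kernel, noise_var=1e-4):
    """Posterior mean of a zero-mean GP under the given covariance kernel."""
    K = kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_star = kernel(x_test, x_train)
    return K_star @ np.linalg.solve(K, y_train)

# Hypothetical lengthscale profile: short scales near x = 0, where the
# toy target below varies fastest (an assumption, not the paper's choice).
ell = lambda x: 0.1 + 0.5 * np.abs(x)

x_train = np.linspace(-1.0, 1.0, 40)
y_train = np.sin(8.0 * x_train / (np.abs(x_train) + 0.2))
x_test = np.linspace(-1.0, 1.0, 9)

mu = gp_posterior_mean(x_train, y_train, x_test,
                       lambda a, b: gibbs_kernel(a, b, ell))
```

With a constant `lengthscale_fn` the same code reduces to the stationary squared-exponential kernel, which is the comparison the paper's non-stationarity argument rests on.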
Related papers
- Active learning for data-driven reduced models of parametric differential systems with Bayesian operator inference [0.0]
This work develops an active learning framework to intelligently enrich data-driven reduced-order models (ROMs) of parametric dynamical systems. Data-driven ROMs are explainable, computationally efficient scientific machine learning models.
arXiv Detail & Related papers (2025-12-30T19:34:26Z) - Cross-Learning from Scarce Data via Multi-Task Constrained Optimization [70.90607489166648]
This paper introduces a multi-task cross-learning framework to overcome data scarcity. We formulate this joint estimation as a constrained optimization problem. We show the efficiency of our cross-learning method in applications with real data, including image classification and propagation of infectious diseases.
arXiv Detail & Related papers (2025-11-17T18:35:59Z) - Learning to Weight Parameters for Training Data Attribution [62.830878652285406]
We propose a method to explicitly learn parameter importance weights directly from data, without requiring annotated labels. Our approach improves attribution accuracy across diverse tasks, including image classification, language modeling, and diffusion, and enables fine-grained attribution for concepts like subject and style.
arXiv Detail & Related papers (2025-06-06T00:32:04Z) - Capturing the Temporal Dependence of Training Data Influence [100.91355498124527]
We formalize the concept of trajectory-specific leave-one-out (LOO) influence, which quantifies the impact of removing a data point during training. We propose data value embedding, a novel technique enabling efficient approximation of trajectory-specific LOO. As data value embedding captures training data ordering, it offers valuable insights into model training dynamics.
arXiv Detail & Related papers (2024-12-12T18:28:55Z) - MARS: Meta-Learning as Score Matching in the Function Space [79.73213540203389]
We present a novel approach to extracting inductive biases from a set of related datasets.
We use functional Bayesian neural network inference, which views the prior as a process and performs inference in the function space.
Our approach can seamlessly acquire and represent complex prior knowledge by metalearning the score function of the data-generating process.
arXiv Detail & Related papers (2022-10-24T15:14:26Z) - A Causality-Based Learning Approach for Discovering the Underlying Dynamics of Complex Systems from Partial Observations with Stochastic Parameterization [1.2882319878552302]
This paper develops a new iterative learning algorithm for complex turbulent systems with partial observations.
It alternates between identifying model structures, recovering unobserved variables, and estimating parameters.
Numerical experiments show that the new algorithm succeeds in identifying the model structure and providing suitable parameterizations for many complex nonlinear systems.
arXiv Detail & Related papers (2022-08-19T00:35:03Z) - TACTiS: Transformer-Attentional Copulas for Time Series [76.71406465526454]
Estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance.
We propose a versatile method that estimates joint distributions using an attention-based decoder.
We show that our model produces state-of-the-art predictions on several real-world datasets.
arXiv Detail & Related papers (2022-02-07T21:37:29Z) - On the Parameter Combinations That Matter and on Those That do Not [0.0]
We present a data-driven approach to characterizing nonidentifiability of a model's parameters.
By employing Diffusion Maps and their extensions, we discover the minimal combinations of parameters required to characterize the dynamic output behavior.
arXiv Detail & Related papers (2021-10-13T13:46:23Z) - MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood Inference from Sampled Trajectories [61.3299263929289]
Simulation-based inference enables learning the parameters of a model even when its likelihood cannot be computed in practice.
One class of methods uses data simulated with different parameters to infer an amortized estimator for the likelihood-to-evidence ratio.
We show that this approach can be formulated in terms of mutual information between model parameters and simulated data.
arXiv Detail & Related papers (2021-06-03T12:59:16Z) - Generative Learning of Heterogeneous Tail Dependence [13.60514494665717]
Our model features heterogeneous and asymmetric tail dependence between all pairs of individual dimensions.
We devise a novel moment learning algorithm to learn the parameters.
Results show that this framework gives better finite-sample performance compared to the copula-based benchmarks.
arXiv Detail & Related papers (2020-11-26T05:34:31Z) - Learning Stable Nonparametric Dynamical Systems with Gaussian Process Regression [9.126353101382607]
We learn a nonparametric Lyapunov function based on Gaussian process regression from data.
We prove that stabilization of the nominal model based on the nonparametric control Lyapunov function does not modify the behavior of the nominal model at training samples.
arXiv Detail & Related papers (2020-06-14T11:17:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.