A Data-Driven Method for Automated Data Superposition with Applications in Soft Matter Science
- URL: http://arxiv.org/abs/2204.09521v1
- Date: Wed, 20 Apr 2022 14:58:04 GMT
- Title: A Data-Driven Method for Automated Data Superposition with Applications in Soft Matter Science
- Authors: Kyle R. Lennon, Gareth H. McKinley, James W. Swan
- Abstract summary: We develop a data-driven, non-parametric method for superposing experimental data with arbitrary coordinate transformations.
Our method produces interpretable data-driven models that may inform applications such as materials classification, design, and discovery.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The superposition of data sets with internal parametric self-similarity is a
longstanding and widespread technique for the analysis of many types of
experimental data across the physical sciences. Typically, this superposition
is performed manually or, more recently, by one of a few automated algorithms.
However, these methods are often heuristic in nature, are prone to user bias
via manual data shifting or parameterization, and lack a native framework for
handling uncertainty in both the data and the resulting model of the superposed
data. In this work, we develop a data-driven, non-parametric method for
superposing experimental data with arbitrary coordinate transformations, which
employs Gaussian process regression to learn statistical models that describe
the data, and then uses maximum a posteriori estimation to optimally superpose
the data sets. This statistical framework is robust to experimental noise, and
automatically produces uncertainty estimates for the learned coordinate
transformations. Moreover, it is distinguished from black-box machine learning
in its interpretability -- specifically, it produces a model that may itself be
interrogated to gain insight into the system under study. We demonstrate these
salient features of our method through its application to four representative
data sets characterizing the mechanics of soft materials. In every case, our
method replicates results obtained using other approaches, but with reduced
bias and the addition of uncertainty estimates. This method enables a
standardized, statistical treatment of self-similar data across many fields,
producing interpretable data-driven models that may inform applications such as
materials classification, design, and discovery.
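The abstract describes two steps: learning a Gaussian process (GP) model of the data, then choosing the coordinate transformation by maximum a posteriori (MAP) estimation, with uncertainty estimates for the learned transformation. Below is a minimal NumPy sketch of that idea for the simplest case, a single horizontal shift between two data sets in log-coordinates (as in time-temperature superposition, where a shift of log a corresponds to a multiplicative shift factor a). Everything here is an illustrative assumption, not the authors' implementation: the squared-exponential kernel with fixed hyperparameters, the weak Gaussian prior on the shift, scoring each shifted point independently under the GP predictive marginals, the grid-search optimizer, and the finite-difference Laplace approximation used for the uncertainty. The helper names (`gp_fit`, `map_shift`, and so on) are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_fit(x, y, noise, length_scale=1.0, variance=1.0):
    """Condition a zero-mean GP on (x, y); `noise` is the observation-noise variance."""
    K = rbf_kernel(x, x, length_scale, variance) + noise * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return {"x": x, "L": L, "alpha": alpha,
            "ls": length_scale, "var": variance, "noise": noise}


def gp_predict(model, x_new):
    """Posterior predictive mean and variance (latent variance + noise) at x_new."""
    Ks = rbf_kernel(x_new, model["x"], model["ls"], model["var"])
    mean = Ks @ model["alpha"]
    v = np.linalg.solve(model["L"], Ks.T)
    latent_var = np.maximum(model["var"] - np.sum(v**2, axis=0), 0.0)
    return mean, latent_var + model["noise"]


def neg_log_posterior(shift, model, x_i, y_i, prior_sd=10.0):
    """-log p(shift | data) up to a constant, for a horizontal log-coordinate shift.

    Simplification (assumed): each shifted point is scored independently under
    the GP predictive marginal, ignoring correlations between predictions.
    """
    mean, var = gp_predict(model, x_i + shift)
    nll = 0.5 * np.sum((y_i - mean) ** 2 / var + np.log(2.0 * np.pi * var))
    return nll + 0.5 * (shift / prior_sd) ** 2  # weak Gaussian prior on the shift


def map_shift(model, x_i, y_i, grid, h=1e-3):
    """MAP shift by grid search, plus a Laplace (curvature-based) standard deviation."""
    nlp = np.array([neg_log_posterior(s, model, x_i, y_i) for s in grid])
    s_hat = grid[int(np.argmin(nlp))]
    # Central finite difference for the curvature of -log posterior at the optimum
    f0 = neg_log_posterior(s_hat, model, x_i, y_i)
    fp = neg_log_posterior(s_hat + h, model, x_i, y_i)
    fm = neg_log_posterior(s_hat - h, model, x_i, y_i)
    curvature = (fp - 2.0 * f0 + fm) / h**2
    sd = 1.0 / np.sqrt(curvature) if curvature > 0 else np.inf
    return s_hat, sd


# Synthetic example: data set 2 is the reference curve shifted by log a = 1.5.
rng = np.random.default_rng(0)
noise_sd = 0.02
x_ref = np.linspace(-3.0, 1.0, 40)
y_ref = np.tanh(x_ref) + noise_sd * rng.standard_normal(x_ref.size)
x_2 = np.linspace(-3.0, -0.5, 30)
y_2 = np.tanh(x_2 + 1.5) + noise_sd * rng.standard_normal(x_2.size)

model = gp_fit(x_ref, y_ref, noise=noise_sd**2)
s_hat, s_sd = map_shift(model, x_2, y_2, grid=np.linspace(-3.0, 3.0, 6001))
print(f"estimated log shift = {s_hat:.3f} +/- {s_sd:.3f}")
```

On this synthetic data the recovered shift should land close to the true value of 1.5 with a small standard deviation. A fuller treatment would also learn the kernel hyperparameters (for example by maximizing the GP marginal likelihood) and handle more than two data sets or additional transformations such as vertical shifts; this sketch fixes all of those for brevity.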