A Data-Driven Method for Automated Data Superposition with Applications
in Soft Matter Science
- URL: http://arxiv.org/abs/2204.09521v1
- Date: Wed, 20 Apr 2022 14:58:04 GMT
- Title: A Data-Driven Method for Automated Data Superposition with Applications
in Soft Matter Science
- Authors: Kyle R. Lennon, Gareth H. McKinley, James W. Swan
- Abstract summary: We develop a data-driven, non-parametric method for superposing experimental data with arbitrary coordinate transformations.
Our method produces interpretable data-driven models that may inform applications such as materials classification, design, and discovery.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The superposition of data sets with internal parametric self-similarity is a
longstanding and widespread technique for the analysis of many types of
experimental data across the physical sciences. Typically, this superposition
is performed manually, or recently by one of a few automated algorithms.
However, these methods are often heuristic in nature, are prone to user bias
via manual data shifting or parameterization, and lack a native framework for
handling uncertainty in both the data and the resulting model of the superposed
data. In this work, we develop a data-driven, non-parametric method for
superposing experimental data with arbitrary coordinate transformations, which
employs Gaussian process regression to learn statistical models that describe
the data, and then uses maximum a posteriori estimation to optimally superpose
the data sets. This statistical framework is robust to experimental noise, and
automatically produces uncertainty estimates for the learned coordinate
transformations. Moreover, it is distinguished from black-box machine learning
in its interpretability -- specifically, it produces a model that may itself be
interrogated to gain insight into the system under study. We demonstrate these
salient features of our method through its application to four representative
data sets characterizing the mechanics of soft materials. In every case, our
method replicates results obtained using other approaches, but with reduced
bias and the addition of uncertainty estimates. This method enables a
standardized, statistical treatment of self-similar data across many fields,
producing interpretable data-driven models that may inform applications such as
materials classification, design, and discovery.
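As a rough illustration of the workflow summarized above (a Gaussian process model is learned for a reference data set, and the coordinate transformation that superposes a second data set is then found by maximum a posteriori estimation), the sketch below superposes two synthetic curves related by a single horizontal shift in log-coordinates. The synthetic master curve, the RBF-plus-noise kernel, the flat prior on the shift, and the restriction to one horizontal shift are illustrative assumptions; this is not the authors' implementation.

```python
# Minimal sketch of GP-based data superposition (illustrative; not the authors' code).
# Two synthetic data sets differ by a horizontal shift in log-coordinates. A Gaussian
# process is fit to the reference set, and the shift of the second set is estimated by
# maximizing the posterior (here, with a flat prior, this reduces to minimizing the
# negative log-likelihood of the shifted data under the reference GP).
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

def master_curve(logx):
    # Assumed underlying master curve, chosen only for illustration.
    return np.tanh(logx)

# Reference data set, and a second set whose coordinates are offset by a true log-shift.
logx_ref = np.linspace(-2.0, 2.0, 25)
y_ref = master_curve(logx_ref) + 0.02 * rng.standard_normal(logx_ref.size)

true_shift = 1.5
logx_2 = np.linspace(-3.5, 0.5, 25)
y_2 = master_curve(logx_2 + true_shift) + 0.02 * rng.standard_normal(logx_2.size)

# Statistical model of the reference data: GP with an RBF kernel plus a noise term.
gp = GaussianProcessRegressor(kernel=1.0 * RBF(1.0) + WhiteKernel(1e-3),
                              normalize_y=True)
gp.fit(logx_ref[:, None], y_ref)

def negative_log_likelihood(shift):
    # Gaussian negative log-likelihood of the shifted second data set under the GP.
    mu, sd = gp.predict((logx_2 + shift)[:, None], return_std=True)
    return np.sum(0.5 * ((y_2 - mu) / sd) ** 2 + np.log(sd))

# MAP estimate of the horizontal shift factor (flat prior assumed).
result = minimize_scalar(negative_log_likelihood, bounds=(-3.0, 3.0), method="bounded")
print(f"estimated log-shift: {result.x:.2f} (true value: {true_shift})")
```

In the full method described in the abstract, vertical shifts, multiple data sets, and experimental noise are handled within the same statistical framework, and the posterior over the shift (for example, its local curvature around the MAP estimate) is one natural route to the uncertainty estimates on the learned coordinate transformations.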
Related papers
- Inference for Large Scale Regression Models with Dependent Errors [3.3160726548489015]
This work defines and proves the statistical properties of the Generalized Method of Wavelet Moments with Exogenous variables (GMWMX).
It is a highly scalable, stable, and statistically valid method for estimating and delivering inference for linear models in the presence of data complexities such as latent dependence structures and missing data.
arXiv Detail & Related papers (2024-09-08T17:01:05Z)
- Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z)
- Online Performance Estimation with Unlabeled Data: A Bayesian Application of the Hui-Walter Paradigm [0.0]
We adapt the Hui-Walter paradigm, a method traditionally applied in epidemiology and medicine, to the field of machine learning.
We estimate key performance metrics such as false positive rate, false negative rate, and priors in scenarios where no ground truth is available.
We extend this paradigm for handling online data, opening up new possibilities for dynamic data environments.
arXiv Detail & Related papers (2024-01-17T17:46:10Z)
- Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
arXiv Detail & Related papers (2023-04-08T07:55:36Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures, such as symmetries or invariances to transformations, are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
arXiv Detail & Related papers (2022-09-29T18:11:01Z)
- Two ways towards combining Sequential Neural Network and Statistical Methods to Improve the Prediction of Time Series [0.34265828682659694]
We propose two different directions for integrating the two: a decomposition-based method and a method exploiting the statistical extraction of data features.
We evaluate the proposal using time series data with varying degrees of stability.
Performance results show that both methods can outperform existing schemes that use statistical models and machine learning separately.
arXiv Detail & Related papers (2021-09-30T20:34:58Z)
- Multimodal Data Fusion in High-Dimensional Heterogeneous Datasets via Generative Models [16.436293069942312]
We are interested in learning probabilistic generative models from high-dimensional heterogeneous data in an unsupervised fashion.
We propose a general framework that combines disparate data types through the exponential family of distributions.
The proposed algorithm is presented in detail for the commonly encountered heterogeneous datasets with real-valued (Gaussian) and categorical (multinomial) features.
arXiv Detail & Related papers (2021-08-27T18:10:31Z)
- Efficient Multidimensional Functional Data Analysis Using Marginal Product Basis Systems [2.4554686192257424]
We propose a framework for learning continuous representations from a sample of multidimensional functional data.
We show that the resulting estimation problem can be solved efficiently via tensor decomposition.
We conclude with a real data application in neuroimaging.
arXiv Detail & Related papers (2021-07-30T16:02:15Z)
- Using Data Assimilation to Train a Hybrid Forecast System that Combines Machine-Learning and Knowledge-Based Components [52.77024349608834]
We consider the problem of data-assisted forecasting of chaotic dynamical systems when the available data consist of noisy partial measurements.
We show that by using partial measurements of the state of the dynamical system, we can train a machine learning model to improve predictions made by an imperfect knowledge-based model.
arXiv Detail & Related papers (2021-02-15T19:56:48Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
The proposed algorithms offer robustness with little computational overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
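To make the distributionally robust training idea in the preceding entry concrete, the following generic sketch trains a logistic-regression model on adversarially perturbed inputs, with each input pushed up the loss inside a small L2 ball. This is a common surrogate for a Wasserstein-ball distributionally robust objective, not the cited paper's algorithm, and the synthetic data, model, radius eps, and step sizes are illustrative assumptions.

```python
# Generic sketch of distributionally robust training via adversarial input perturbations
# (a common surrogate for a Wasserstein-ball DRO objective). Illustrative only; this is
# not the algorithm proposed in the paper listed above.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = (X @ w_true + 0.1 * rng.standard_normal(n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad_w(w, X, y):
    # Gradient of the mean logistic loss with respect to the weights.
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def perturb(w, X, y, eps=0.1, steps=5, lr=0.05):
    # Inner maximization: move each input up its loss gradient, then project the
    # perturbation back into an L2 ball of radius eps around the original point.
    Xp = X.copy()
    for _ in range(steps):
        grad_x = np.outer(sigmoid(Xp @ w) - y, w)   # d(loss_i)/d(x_i)
        Xp = Xp + lr * grad_x
        delta = Xp - X
        norms = np.maximum(np.linalg.norm(delta, axis=1, keepdims=True), 1e-12)
        Xp = X + delta * np.minimum(1.0, eps / norms)
    return Xp

# Outer minimization: gradient descent on the loss of the perturbed batch.
w = np.zeros(d)
for _ in range(300):
    Xp = perturb(w, X, y)
    w -= 0.5 * loss_grad_w(w, Xp, y)

print("accuracy on clean data:", np.mean((sigmoid(X @ w) > 0.5) == y.astype(bool)))
```

Because the inner maximization uses only a handful of gradient steps per pass, the cost over standard training remains modest, which is consistent with the entry's claim of robustness at little extra overhead.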