Interpretable Multi-Source Data Fusion Through Latent Variable Gaussian Process
- URL: http://arxiv.org/abs/2402.04146v3
- Date: Mon, 15 Jul 2024 20:03:07 GMT
- Title: Interpretable Multi-Source Data Fusion Through Latent Variable Gaussian Process
- Authors: Sandipp Krishnan Ravi, Yigitcan Comlek, Wei Chen, Arjun Pathak, Vipul Gupta, Rajnikant Umretiya, Andrew Hoffman, Ghanshyam Pilania, Piyush Pandita, Sayan Ghosh, Nathaniel Mckeever, Liping Wang,
- Abstract summary: The proposed approach is demonstrated on and analyzed through two mathematical (representative parabola problem, 2D Ackley function) and two materials science (design of FeCrAl and SmCoFe alloys) case studies.
It is observed that compared to using single-source and source unaware ML models, the proposed multi-source data fusion framework can provide better predictions for sparse-data problems, interpretability regarding the sources, and enhanced modeling capabilities by taking advantage of the correlations and relationships among different sources.
- Score: 8.207427766052044
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the advent of artificial intelligence (AI) and machine learning (ML), various domains of science and engineering communites has leveraged data-driven surrogates to model complex systems from numerous sources of information (data). The proliferation has led to significant reduction in cost and time involved in development of superior systems designed to perform specific functionalities. A high proposition of such surrogates are built extensively fusing multiple sources of data, may it be published papers, patents, open repositories, or other resources. However, not much attention has been paid to the differences in quality and comprehensiveness of the known and unknown underlying physical parameters of the information sources that could have downstream implications during system optimization. Towards resolving this issue, a multi-source data fusion framework based on Latent Variable Gaussian Process (LVGP) is proposed. The individual data sources are tagged as a characteristic categorical variable that are mapped into a physically interpretable latent space, allowing the development of source-aware data fusion modeling. Additionally, a dissimilarity metric based on the latent variables of LVGP is introduced to study and understand the differences in the sources of data. The proposed approach is demonstrated on and analyzed through two mathematical (representative parabola problem, 2D Ackley function) and two materials science (design of FeCrAl and SmCoFe alloys) case studies. From the case studies, it is observed that compared to using single-source and source unaware ML models, the proposed multi-source data fusion framework can provide better predictions for sparse-data problems, interpretability regarding the sources, and enhanced modeling capabilities by taking advantage of the correlations and relationships among different sources.
Related papers
- Heterogenous Multi-Source Data Fusion Through Input Mapping and Latent Variable Gaussian Process [8.32027826756131]
The proposed framework is demonstrated and analyzed on three engineering case studies.
It provides improved predictive accuracy over a single source model and transformed but source unaware model.
arXiv Detail & Related papers (2024-07-15T22:27:04Z) - Distribution-Aware Data Expansion with Diffusion Models [55.979857976023695]
We propose DistDiff, a training-free data expansion framework based on the distribution-aware diffusion model.
DistDiff consistently enhances accuracy across a diverse range of datasets compared to models trained solely on original data.
arXiv Detail & Related papers (2024-03-11T14:07:53Z) - UnitedHuman: Harnessing Multi-Source Data for High-Resolution Human
Generation [59.77275587857252]
A holistic human dataset inevitably has insufficient and low-resolution information on local parts.
We propose to use multi-source datasets with various resolution images to jointly learn a high-resolution human generative model.
arXiv Detail & Related papers (2023-09-25T17:58:46Z) - Source-Free Collaborative Domain Adaptation via Multi-Perspective
Feature Enrichment for Functional MRI Analysis [55.03872260158717]
Resting-state MRI functional (rs-fMRI) is increasingly employed in multi-site research to aid neurological disorder analysis.
Many methods have been proposed to reduce fMRI heterogeneity between source and target domains.
But acquiring source data is challenging due to concerns and/or data storage burdens in multi-site studies.
We design a source-free collaborative domain adaptation framework for fMRI analysis, where only a pretrained source model and unlabeled target data are accessible.
arXiv Detail & Related papers (2023-08-24T01:30:18Z) - Probabilistic Neural Data Fusion for Learning from an Arbitrary Number
of Multi-fidelity Data Sets [0.0]
In this paper, we employ neural networks (NNs) for data fusion in scenarios where data is very scarce.
We introduce a unique NN architecture that converts MF modeling into a nonlinear manifold learning problem.
Our approach provides a high predictive power while quantifying various sources uncertainties.
arXiv Detail & Related papers (2023-01-30T20:27:55Z) - Robust Direct Learning for Causal Data Fusion [14.462235940634969]
We provide a framework for integrating multi-source data that separates the treatment effect from other nuisance functions.
We also propose a causal information-aware weighting function motivated by theoretical insights from the semiparametric efficiency theory.
arXiv Detail & Related papers (2022-11-01T03:33:22Z) - Data Fusion with Latent Map Gaussian Processes [0.0]
Multi-fidelity modeling and calibration are data fusion tasks that ubiquitously arise in engineering design.
We introduce a novel approach based on latent-map Gaussian processes (LMGPs) that enables efficient and accurate data fusion.
arXiv Detail & Related papers (2021-12-04T00:54:19Z) - Deep Transfer Learning for Multi-source Entity Linkage via Domain
Adaptation [63.24594955429465]
Multi-source entity linkage is critical in high-impact applications such as data cleaning and user stitching.
AdaMEL is a deep transfer learning framework that learns generic high-level knowledge to perform multi-source entity linkage.
Our framework achieves state-of-the-art results with 8.21% improvement on average over methods based on supervised learning.
arXiv Detail & Related papers (2021-10-27T15:20:41Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal
Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
Model takes two bimodal pairs as input due to known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z) - Information Theory Measures via Multidimensional Gaussianization [7.788961560607993]
Information theory is an outstanding framework to measure uncertainty, dependence and relevance in data and systems.
It has several desirable properties for real world applications.
However, obtaining information from multidimensional data is a challenging problem due to the curse of dimensionality.
arXiv Detail & Related papers (2020-10-08T07:22:16Z) - Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.