Interpretable Multi-Source Data Fusion Through Latent Variable Gaussian Process
- URL: http://arxiv.org/abs/2402.04146v4
- Date: Sun, 17 Nov 2024 03:59:54 GMT
- Title: Interpretable Multi-Source Data Fusion Through Latent Variable Gaussian Process
- Authors: Sandipp Krishnan Ravi, Yigitcan Comlek, Arjun Pathak, Vipul Gupta, Rajnikant Umretiya, Andrew Hoffman, Ghanshyam Pilania, Piyush Pandita, Sayan Ghosh, Nathaniel Mckeever, Wei Chen, Liping Wang,
- Abstract summary: The proposed approach is demonstrated on and analyzed through two mathematical and two materials science case studies.
It is observed that compared to using single-source and source unaware machine learning models, the proposed multi-source data fusion framework can provide better predictions for sparse-data problems.
- Score: 8.207427766052044
- License:
- Abstract: With the advent of artificial intelligence and machine learning, various domains of science and engineering communities have leveraged data-driven surrogates to model complex systems through fusing numerous sources of information (data) from published papers, patents, open repositories, or other resources. However, not much attention has been paid to the differences in quality and comprehensiveness of the known and unknown underlying physical parameters of the information sources, which could have downstream implications during system optimization. Additionally, existing methods cannot fuse multi-source data into a single predictive model. Towards resolving this issue, a multi-source data fusion framework based on Latent Variable Gaussian Process (LVGP) is proposed. The individual data sources are tagged as a characteristic categorical variable that are mapped into a physically interpretable latent space, allowing the development of source-aware data fusion modeling. Additionally, a dissimilarity metric based on the latent variables of LVGP is introduced to study and understand the differences in the sources of data. The proposed approach is demonstrated on and analyzed through two mathematical and two materials science case studies. From the case studies, it is observed that compared to using single-source and source unaware machine learning models, the proposed multi-source data fusion framework can provide better predictions for sparse-data problems.
Related papers
- Heterogenous Multi-Source Data Fusion Through Input Mapping and Latent Variable Gaussian Process [8.32027826756131]
The proposed framework is demonstrated and analyzed on three engineering case studies.
It provides improved predictive accuracy over a single source model and transformed but source unaware model.
arXiv Detail & Related papers (2024-07-15T22:27:04Z) - Distribution-Aware Data Expansion with Diffusion Models [55.979857976023695]
We propose DistDiff, a training-free data expansion framework based on the distribution-aware diffusion model.
DistDiff consistently enhances accuracy across a diverse range of datasets compared to models trained solely on original data.
arXiv Detail & Related papers (2024-03-11T14:07:53Z) - UnitedHuman: Harnessing Multi-Source Data for High-Resolution Human
Generation [59.77275587857252]
A holistic human dataset inevitably has insufficient and low-resolution information on local parts.
We propose to use multi-source datasets with various resolution images to jointly learn a high-resolution human generative model.
arXiv Detail & Related papers (2023-09-25T17:58:46Z) - Source-Free Collaborative Domain Adaptation via Multi-Perspective
Feature Enrichment for Functional MRI Analysis [55.03872260158717]
Resting-state MRI functional (rs-fMRI) is increasingly employed in multi-site research to aid neurological disorder analysis.
Many methods have been proposed to reduce fMRI heterogeneity between source and target domains.
But acquiring source data is challenging due to concerns and/or data storage burdens in multi-site studies.
We design a source-free collaborative domain adaptation framework for fMRI analysis, where only a pretrained source model and unlabeled target data are accessible.
arXiv Detail & Related papers (2023-08-24T01:30:18Z) - An Adaptive Kernel Approach to Federated Learning of Heterogeneous
Causal Effects [10.248235276871256]
We propose a new causal inference framework to learn causal effects from multiple, decentralized data sources.
We introduce an adaptive transfer algorithm that learns the similarities among the data sources.
The proposed method is empirically shown to outperform the baselines on decentralized data sources with dissimilar distributions.
arXiv Detail & Related papers (2023-01-01T04:57:48Z) - Robust Direct Learning for Causal Data Fusion [14.462235940634969]
We provide a framework for integrating multi-source data that separates the treatment effect from other nuisance functions.
We also propose a causal information-aware weighting function motivated by theoretical insights from the semiparametric efficiency theory.
arXiv Detail & Related papers (2022-11-01T03:33:22Z) - Data Fusion with Latent Map Gaussian Processes [0.0]
Multi-fidelity modeling and calibration are data fusion tasks that ubiquitously arise in engineering design.
We introduce a novel approach based on latent-map Gaussian processes (LMGPs) that enables efficient and accurate data fusion.
arXiv Detail & Related papers (2021-12-04T00:54:19Z) - Deep Transfer Learning for Multi-source Entity Linkage via Domain
Adaptation [63.24594955429465]
Multi-source entity linkage is critical in high-impact applications such as data cleaning and user stitching.
AdaMEL is a deep transfer learning framework that learns generic high-level knowledge to perform multi-source entity linkage.
Our framework achieves state-of-the-art results with 8.21% improvement on average over methods based on supervised learning.
arXiv Detail & Related papers (2021-10-27T15:20:41Z) - Information Theory Measures via Multidimensional Gaussianization [7.788961560607993]
Information theory is an outstanding framework to measure uncertainty, dependence and relevance in data and systems.
It has several desirable properties for real world applications.
However, obtaining information from multidimensional data is a challenging problem due to the curse of dimensionality.
arXiv Detail & Related papers (2020-10-08T07:22:16Z) - Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z) - Bayesian Sparse Factor Analysis with Kernelized Observations [67.60224656603823]
Multi-view problems can be faced with latent variable models.
High-dimensionality and non-linear issues are traditionally handled by kernel methods.
We propose merging both approaches into single model.
arXiv Detail & Related papers (2020-06-01T14:25:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.