Related papers: Interpretable Multi-Source Data Fusion Through Latent Variable Gaussian Process

Interpretable Multi-Source Data Fusion Through Latent Variable Gaussian Process

URL: http://arxiv.org/abs/2402.04146v3
Date: Mon, 15 Jul 2024 20:03:07 GMT
Title: Interpretable Multi-Source Data Fusion Through Latent Variable Gaussian Process
Authors: Sandipp Krishnan Ravi, Yigitcan Comlek, Wei Chen, Arjun Pathak, Vipul Gupta, Rajnikant Umretiya, Andrew Hoffman, Ghanshyam Pilania, Piyush Pandita, Sayan Ghosh, Nathaniel Mckeever, Liping Wang,
Abstract summary: The proposed approach is demonstrated on and analyzed through two mathematical (representative parabola problem, 2D Ackley function) and two materials science (design of FeCrAl and SmCoFe alloys) case studies. It is observed that compared to using single-source and source unaware ML models, the proposed multi-source data fusion framework can provide better predictions for sparse-data problems, interpretability regarding the sources, and enhanced modeling capabilities by taking advantage of the correlations and relationships among different sources.
Score: 8.207427766052044
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the advent of artificial intelligence (AI) and machine learning (ML), various domains of science and engineering communites has leveraged data-driven surrogates to model complex systems from numerous sources of information (data). The proliferation has led to significant reduction in cost and time involved in development of superior systems designed to perform specific functionalities. A high proposition of such surrogates are built extensively fusing multiple sources of data, may it be published papers, patents, open repositories, or other resources. However, not much attention has been paid to the differences in quality and comprehensiveness of the known and unknown underlying physical parameters of the information sources that could have downstream implications during system optimization. Towards resolving this issue, a multi-source data fusion framework based on Latent Variable Gaussian Process (LVGP) is proposed. The individual data sources are tagged as a characteristic categorical variable that are mapped into a physically interpretable latent space, allowing the development of source-aware data fusion modeling. Additionally, a dissimilarity metric based on the latent variables of LVGP is introduced to study and understand the differences in the sources of data. The proposed approach is demonstrated on and analyzed through two mathematical (representative parabola problem, 2D Ackley function) and two materials science (design of FeCrAl and SmCoFe alloys) case studies. From the case studies, it is observed that compared to using single-source and source unaware ML models, the proposed multi-source data fusion framework can provide better predictions for sparse-data problems, interpretability regarding the sources, and enhanced modeling capabilities by taking advantage of the correlations and relationships among different sources.

Related papers

Heterogenous Multi-Source Data Fusion Through Input Mapping and Latent Variable Gaussian Process [8.32027826756131]
The proposed framework is demonstrated and analyzed on three engineering case studies. It provides improved predictive accuracy over a single source model and transformed but source unaware model.
arXiv Detail & Related papers (2024-07-15T22:27:04Z)
Distribution-Aware Data Expansion with Diffusion Models [55.979857976023695]
We propose DistDiff, a training-free data expansion framework based on the distribution-aware diffusion model. DistDiff consistently enhances accuracy across a diverse range of datasets compared to models trained solely on original data.
arXiv Detail & Related papers (2024-03-11T14:07:53Z)
UnitedHuman: Harnessing Multi-Source Data for High-Resolution Human Generation [59.77275587857252]
A holistic human dataset inevitably has insufficient and low-resolution information on local parts. We propose to use multi-source datasets with various resolution images to jointly learn a high-resolution human generative model.
arXiv Detail & Related papers (2023-09-25T17:58:46Z)
Source-Free Collaborative Domain Adaptation via Multi-Perspective Feature Enrichment for Functional MRI Analysis [55.03872260158717]
Resting-state MRI functional (rs-fMRI) is increasingly employed in multi-site research to aid neurological disorder analysis. Many methods have been proposed to reduce fMRI heterogeneity between source and target domains. But acquiring source data is challenging due to concerns and/or data storage burdens in multi-site studies. We design a source-free collaborative domain adaptation framework for fMRI analysis, where only a pretrained source model and unlabeled target data are accessible.
arXiv Detail & Related papers (2023-08-24T01:30:18Z)
An Adaptive Kernel Approach to Federated Learning of Heterogeneous Causal Effects [10.248235276871256]
We propose a new causal inference framework to learn causal effects from multiple, decentralized data sources. We introduce an adaptive transfer algorithm that learns the similarities among the data sources. The proposed method is empirically shown to outperform the baselines on decentralized data sources with dissimilar distributions.
arXiv Detail & Related papers (2023-01-01T04:57:48Z)
Robust Direct Learning for Causal Data Fusion [14.462235940634969]
We provide a framework for integrating multi-source data that separates the treatment effect from other nuisance functions. We also propose a causal information-aware weighting function motivated by theoretical insights from the semiparametric efficiency theory.
arXiv Detail & Related papers (2022-11-01T03:33:22Z)
Data Fusion with Latent Map Gaussian Processes [0.0]
Multi-fidelity modeling and calibration are data fusion tasks that ubiquitously arise in engineering design. We introduce a novel approach based on latent-map Gaussian processes (LMGPs) that enables efficient and accurate data fusion.
arXiv Detail & Related papers (2021-12-04T00:54:19Z)
Deep Transfer Learning for Multi-source Entity Linkage via Domain Adaptation [63.24594955429465]
Multi-source entity linkage is critical in high-impact applications such as data cleaning and user stitching. AdaMEL is a deep transfer learning framework that learns generic high-level knowledge to perform multi-source entity linkage. Our framework achieves state-of-the-art results with 8.21% improvement on average over methods based on supervised learning.
arXiv Detail & Related papers (2021-10-27T15:20:41Z)
Information Theory Measures via Multidimensional Gaussianization [7.788961560607993]
Information theory is an outstanding framework to measure uncertainty, dependence and relevance in data and systems. It has several desirable properties for real world applications. However, obtaining information from multidimensional data is a challenging problem due to the curse of dimensionality.
arXiv Detail & Related papers (2020-10-08T07:22:16Z)
Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization. We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise. We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z)
Bayesian Sparse Factor Analysis with Kernelized Observations [67.60224656603823]
Multi-view problems can be faced with latent variable models. High-dimensionality and non-linear issues are traditionally handled by kernel methods. We propose merging both approaches into single model.
arXiv Detail & Related papers (2020-06-01T14:25:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.