Lightweight Data Fusion with Conjugate Mappings
- URL: http://arxiv.org/abs/2011.10607v1
- Date: Fri, 20 Nov 2020 19:47:13 GMT
- Title: Lightweight Data Fusion with Conjugate Mappings
- Authors: Christopher L. Dean, Stephen J. Lee, Jason Pacheco, John W. Fisher III
- Abstract summary: We present an approach to data fusion that combines the interpretability of structured probabilistic graphical models with the flexibility of neural networks.
The proposed method, lightweight data fusion (LDF), emphasizes posterior analysis over latent variables using two types of information.
- Score: 11.760099863897835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an approach to data fusion that combines the interpretability of
structured probabilistic graphical models with the flexibility of neural
networks. The proposed method, lightweight data fusion (LDF), emphasizes
posterior analysis over latent variables using two types of information:
primary data, which are well-characterized but with limited availability, and
auxiliary data, readily available but lacking a well-characterized statistical
relationship to the latent quantity of interest. The lack of a forward model
for the auxiliary data precludes the use of standard data fusion approaches,
while the inability to acquire latent variable observations severely limits
direct application of most supervised learning methods. LDF addresses these
issues by utilizing neural networks as conjugate mappings of the auxiliary
data: nonlinear transformations into sufficient statistics with respect to the
latent variables. This facilitates efficient inference by preserving the
conjugacy properties of the primary data and leads to compact representations
of the latent variable posterior distributions. We demonstrate the LDF
methodology on two challenging inference problems: (1) learning electrification
rates in Rwanda from satellite imagery, high-level grid infrastructure, and
other sources; and (2) inferring county-level homicide rates in the USA by
integrating socio-economic data using a mixture model of multiple conjugate
mappings.
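As a concrete illustration of the conjugate-mapping idea, here is a minimal sketch assuming a Gamma-Poisson model: primary count data update a Gamma prior over a latent rate in closed form, while a small neural network maps auxiliary features into nonnegative pseudo sufficient statistics that enter the same closed-form update. The architecture, names, and model choice below are hypothetical, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): a conjugate mapping for a
# Gamma-Poisson model. The network maps auxiliary data to pseudo sufficient
# statistics so the posterior over the latent rate stays Gamma.
import torch
import torch.nn as nn

class ConjugateMapping(nn.Module):
    """Hypothetical mapping from auxiliary features to nonnegative
    pseudo sufficient statistics (pseudo count, pseudo exposure)."""
    def __init__(self, aux_dim: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(aux_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2), nn.Softplus(),  # keep statistics >= 0
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def gamma_posterior(alpha0, beta0, counts, exposure, pseudo):
    """Closed-form Gamma posterior over a Poisson rate. Primary data and
    mapped auxiliary statistics enter identically, preserving conjugacy."""
    alpha = alpha0 + counts + pseudo[..., 0]
    beta = beta0 + exposure + pseudo[..., 1]
    return alpha, beta  # posterior is Gamma(alpha, beta)

# Usage: sparse primary data plus rich auxiliary features for one region.
mapping = ConjugateMapping(aux_dim=8)
x_aux = torch.randn(8)  # e.g. satellite-derived features (hypothetical)
alpha, beta = gamma_posterior(
    alpha0=torch.tensor(1.0), beta0=torch.tensor(1.0),
    counts=torch.tensor(3.0), exposure=torch.tensor(10.0),
    pseudo=mapping(x_aux),
)
posterior_mean = alpha / beta  # E[rate | primary + auxiliary data]
```

One plausible way to train the mapping in this sketch is to maximize the marginal likelihood of the primary data with respect to the network parameters, which the Gamma-Poisson pair makes available in closed form (a negative binomial); the training loop is omitted here.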
Related papers
- Enhancing Information Maximization with Distance-Aware Contrastive Learning for Source-Free Cross-Domain Few-Shot Learning [55.715623885418815]
Cross-Domain Few-Shot Learning (CDFSL) methods require access to source-domain data to train a model in the pre-training phase.
Due to increasing concerns about data privacy and the desire to reduce data transmission and training costs, it is necessary to develop a CDFSL solution that does not access source data.
This paper proposes an Enhanced Information Maximization with Distance-Aware Contrastive Learning method to address these challenges.
arXiv Detail & Related papers (2024-03-04T12:10:24Z)
- Joint Distributional Learning via Cramer-Wold Distance [0.7614628596146602]
We introduce the Cramer-Wold distance regularization, which can be computed in closed form, to facilitate joint distributional learning for high-dimensional datasets.
We also introduce a two-step learning method to enable flexible prior modeling and improve the alignment between the aggregated posterior and the prior distribution.
arXiv Detail & Related papers (2023-10-25T05:24:23Z)
- FedSym: Unleashing the Power of Entropy for Benchmarking the Algorithms for Federated Learning [1.4656078321003647]
Federated learning (FL) is a decentralized machine learning approach where independent learners process data privately.
We study the currently popular data partitioning techniques and visualize their main disadvantages.
We propose a method that leverages entropy and symmetry to construct 'the most challenging' and controllable data distributions.
arXiv Detail & Related papers (2023-10-11T18:39:08Z)
- Robust Direct Learning for Causal Data Fusion [14.462235940634969]
We provide a framework for integrating multi-source data that separates the treatment effect from other nuisance functions.
We also propose a causal information-aware weighting function motivated by theoretical insights from the semiparametric efficiency theory.
arXiv Detail & Related papers (2022-11-01T03:33:22Z)
- Inducing Data Amplification Using Auxiliary Datasets in Adversarial Training [7.513100214864646]
We propose a biased multi-domain adversarial training (BiaMAT) method that induces training data amplification on a primary dataset.
The proposed method can achieve increased adversarial robustness on a primary dataset by leveraging auxiliary datasets.
arXiv Detail & Related papers (2022-09-27T09:21:40Z)
- Data Fusion with Latent Map Gaussian Processes [0.0]
Multi-fidelity modeling and calibration are data fusion tasks that ubiquitously arise in engineering design.
We introduce a novel approach based on latent-map Gaussian processes (LMGPs) that enables efficient and accurate data fusion.
arXiv Detail & Related papers (2021-12-04T00:54:19Z)
- Graph Embedding with Data Uncertainty [113.39838145450007]
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z)
- Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)
- Model Fusion with Kullback--Leibler Divergence [58.20269014662046]
We propose a method to fuse posterior distributions learned from heterogeneous datasets.
Our algorithm relies on a mean-field assumption for both the fused model and the individual dataset posteriors; a minimal Gaussian illustration of posterior fusion appears after this list.
arXiv Detail & Related papers (2020-07-13T03:27:45Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework preserves the relations between samples well.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
The proposed algorithms offer robustness with little computational overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
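Below is the minimal Gaussian illustration of posterior fusion promised above. It is a sketch of the generic closed-form rule for combining Gaussian posteriors that share a common prior, assuming the datasets are conditionally independent given the parameter; it is not the Kullback--Leibler algorithm of the paper listed above, and all names below are hypothetical.

```python
# Minimal sketch (not the paper's KL-based algorithm): fuse Gaussian
# posteriors q_i = N(mu_i, 1/lam_i) learned from K conditionally
# independent datasets that share the prior N(mu0, 1/lam0).
import numpy as np

def fuse_gaussian_posteriors(mus, lams, mu0, lam0):
    """p(theta | all data) is proportional to
    prod_i q_i(theta) / prior(theta)^(K-1),
    which is Gaussian again; returns its mean and precision."""
    mus, lams = np.asarray(mus), np.asarray(lams)
    k = len(mus)
    lam = lams.sum() - (k - 1) * lam0               # fused precision
    mu = (lams @ mus - (k - 1) * lam0 * mu0) / lam  # fused mean
    return mu, lam

# Usage: three dataset-specific posteriors over a shared scalar parameter.
mu, lam = fuse_gaussian_posteriors(
    mus=[0.9, 1.1, 1.0], lams=[4.0, 9.0, 6.0], mu0=0.0, lam0=1.0)
print(mu, lam)  # fused N(mu, 1/lam) concentrates as datasets accumulate
```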
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.