Aggregated Multi-output Gaussian Processes with Knowledge Transfer Across Domains
- URL: http://arxiv.org/abs/2206.12141v1
- Date: Fri, 24 Jun 2022 08:07:20 GMT
- Title: Aggregated Multi-output Gaussian Processes with Knowledge Transfer Across Domains
- Authors: Yusuke Tanaka, Toshiyuki Tanaka, Tomoharu Iwata, Takeshi Kurashima, Maya Okawa, Yasunori Akagi, Hiroyuki Toda
- Abstract summary: This article offers a multi-output Gaussian process (MoGP) model that infers functions for attributes using multiple aggregate datasets of respective granularities.
Experiments demonstrate that the proposed model outperforms existing methods in the task of refining coarse-grained aggregate data on real-world datasets.
- Score: 39.25639417233822
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Aggregate data often appear in various fields such as socio-economics and
public security. The aggregate data are associated not with points but with
supports (e.g., spatial regions in a city). Since the supports may have various
granularities depending on attributes (e.g., poverty rate and crime rate),
modeling such data is not straightforward. This article offers a multi-output
Gaussian process (MoGP) model that infers functions for attributes using
multiple aggregate datasets of respective granularities. In the proposed model,
the function for each attribute is assumed to be a dependent GP modeled as a
linear mixing of independent latent GPs. We design an observation model with an
aggregation process for each attribute; the process is an integral of the GP
over the corresponding support. We also introduce a prior distribution of the
mixing weights, which allows knowledge transfer across domains (e.g., cities)
by sharing the prior. This is advantageous in situations where the
spatially aggregated dataset in a city is too coarse to interpolate; the
proposed model can still make accurate predictions of attributes by utilizing
aggregate datasets in other cities. The inference of the proposed model is
based on variational Bayes, which enables one to learn the model parameters
using the aggregate datasets from multiple domains. The experiments demonstrate
that the proposed model outperforms existing methods in the task of refining coarse-grained
aggregate data on real-world datasets: Time series of air pollutants in Beijing
and various kinds of spatial datasets from New York City and Chicago.
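As a deliberately simplified illustration of the two ingredients described above, linear mixing of independent latent GPs and aggregation of a GP over supports, here is a minimal NumPy sketch assuming an RBF kernel and Monte Carlo integration over supports. All names, kernels, and region shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between point sets X1 (n, d) and X2 (m, d).
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq / lengthscale ** 2)

def aggregated_cov(regions_a, regions_b, lengthscale=1.0):
    # Cov(y_R, y_R') ~ mean over x in R, x' in R' of k(x, x'): the double
    # integral of the kernel over two supports, approximated with sampled points.
    K = np.zeros((len(regions_a), len(regions_b)))
    for i, Ra in enumerate(regions_a):
        for j, Rb in enumerate(regions_b):
            K[i, j] = rbf_kernel(Ra, Rb, lengthscale).mean()
    return K

rng = np.random.default_rng(0)

def sample_region(center, half_width, n=64):
    # Represent a square support by n uniformly sampled points (Monte Carlo).
    return center + rng.uniform(-half_width, half_width, size=(n, 2))

# Attribute 1 observed on 2 coarse supports, attribute 2 on 8 fine supports.
coarse = [sample_region(c, 0.25) for c in np.array([[0.25, 0.25], [0.75, 0.75]])]
fine = [sample_region(c, 0.05) for c in rng.uniform(0.1, 0.9, size=(8, 2))]

# Linear mixing of independent latent GPs: with weights W[d, l] for attribute d
# and latent GP l, Cov(y_d(R), y_d'(R')) = sum_l W[d, l] * W[d', l] * K_l(R, R').
# The paper places a shared prior on these weights to transfer across domains.
n_latent = 2
W = rng.normal(size=(2, n_latent))   # mixing weights for the 2 attributes
lengthscales = [0.2, 0.5]            # one latent GP per lengthscale
K_cross = sum(
    W[0, l] * W[1, l] * aggregated_cov(coarse, fine, lengthscales[l])
    for l in range(n_latent)
)
print(K_cross.shape)  # (2, 8): coarse supports of attr. 1 vs. fine supports of attr. 2
```

A full model would add per-attribute observation noise, assemble the joint covariance over all attributes and supports, and learn the weights and kernel parameters with variational Bayes as the abstract describes; the sketch only shows how support-level covariances arise from point-level kernels.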
Related papers
- An improved tabular data generator with VAE-GMM integration [9.4491536689161]
We propose a novel Variational Autoencoder (VAE)-based model that addresses limitations of current approaches.
Inspired by the TVAE model, our approach incorporates a Bayesian Gaussian Mixture model (BGM) within the VAE architecture.
We thoroughly validate our model on three real-world datasets with mixed data types, including two medically relevant ones.
arXiv Detail & Related papers (2024-04-12T12:31:06Z)
- Fusion of Gaussian Processes Predictions with Monte Carlo Sampling [61.31380086717422]
In science and engineering, we often work with models designed for accurate prediction of variables of interest.
Recognizing that these models are approximations of reality, it becomes desirable to apply multiple models to the same data and integrate their outcomes.
arXiv Detail & Related papers (2024-03-03T04:21:21Z)
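A minimal sketch of one way to integrate several models' outcomes as the entry above describes: draw Monte Carlo samples from a weighted mixture of the models' Gaussian predictive distributions and summarize the pooled samples. The means, spreads, and weights below are hypothetical, and this is an assumed reading of the fusion idea, not the paper's exact rule.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical predictive means/stds from three GP models at one test point;
# in practice these come from each GP's posterior predictive distribution.
means = np.array([1.0, 1.3, 0.8])
stds = np.array([0.2, 0.5, 0.3])
weights = np.array([0.5, 0.2, 0.3])  # model weights (assumed given)

# Fuse by sampling from the mixture of the predictive Gaussians.
n = 10_000
which = rng.choice(len(means), size=n, p=weights)
samples = rng.normal(means[which], stds[which])
print(samples.mean(), samples.std())  # fused predictive mean and spread
```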
- T1: Scaling Diffusion Probabilistic Fields to High-Resolution on Unified Visual Modalities [69.16656086708291]
Diffusion Probabilistic Field (DPF) models the distribution of continuous functions defined over metric spaces.
We propose a new model comprising a view-wise sampling algorithm to focus on local structure learning.
The model can be scaled to generate high-resolution data while unifying multiple modalities.
arXiv Detail & Related papers (2023-05-24T03:32:03Z)
- Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data [81.43750358586072]
We propose Data-IQ, a framework to systematically stratify examples into subgroups with respect to their outcomes.
We experimentally demonstrate the benefits of Data-IQ on four real-world medical datasets.
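One concrete reading of this stratification, sketched under the assumption that subgroups are derived from training dynamics (average confidence and predictive variability across checkpoints); the thresholds and names here are illustrative, not the paper's exact rule.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical training dynamics: predicted probability of the true label for
# each example, recorded at several checkpoints during training.
n_checkpoints, n_examples = 10, 1000
probs = rng.beta(2, 2, size=(n_checkpoints, n_examples))

confidence = probs.mean(axis=0)                    # average confidence over training
variability = (probs * (1 - probs)).mean(axis=0)   # average predictive uncertainty

# Stratify into three subgroups (thresholds are illustrative assumptions).
easy = (confidence >= 0.75) & (variability < 0.2)
hard = (confidence < 0.25) & (variability < 0.2)
ambiguous = ~easy & ~hard
print(easy.sum(), ambiguous.sum(), hard.sum())
```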
arXiv Detail & Related papers (2022-10-24T08:57:55Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to logistic regression, may be learned from aggregated data alone by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
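A hedged toy instance of this idea: if the maximum-entropy distribution given each group's first two feature moments is taken to be Gaussian, one can sample pseudo-features per group, attach the group's aggregate positive rate as a soft label, and fit a logistic model to them. All numbers and the reduction itself are illustrative assumptions, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(2)

# Aggregated data: per group, feature mean, feature std, and positive rate.
group_mu = np.array([-1.0, 0.0, 1.5])
group_sigma = np.array([0.5, 0.8, 0.6])
group_rate = np.array([0.10, 0.45, 0.90])

# Max-entropy hypothesis given two moments -> Gaussian per group; sample
# pseudo-features and attach the group's rate as a soft label.
n = 500
x = np.concatenate([rng.normal(m, s, n) for m, s in zip(group_mu, group_sigma)])
y = np.repeat(group_rate, n)

# Fit logistic regression (w, b) by gradient descent on soft-label cross-entropy.
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))
    g = p - y                    # gradient of the cross-entropy wrt the logit
    w -= 0.1 * np.mean(g * x)
    b -= 0.1 * np.mean(g)
print(w, b)  # slope/intercept recovered from aggregate statistics alone
```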
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance.
In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
arXiv Detail & Related papers (2022-05-30T13:34:46Z)
- Multimodal Data Fusion in High-Dimensional Heterogeneous Datasets via Generative Models [16.436293069942312]
We are interested in learning probabilistic generative models from high-dimensional heterogeneous data in an unsupervised fashion.
We propose a general framework that combines disparate data types through the exponential family of distributions.
The proposed algorithm is presented in detail for the commonly encountered heterogeneous datasets with real-valued (Gaussian) and categorical (multinomial) features.
arXiv Detail & Related papers (2021-08-27T18:10:31Z)
- Cluster-Specific Predictions with Multi-Task Gaussian Processes [4.368185344922342]
A model involving Gaussian processes (GPs) is introduced to handle multi-task learning, clustering, and prediction.
The model is instantiated as a mixture of multi-task GPs with common mean processes.
The overall algorithm, called MagmaClust, is publicly available as an R package.
arXiv Detail & Related papers (2020-11-16T11:08:59Z)
- Learn to Expect the Unexpected: Probably Approximately Correct Domain Generalization [38.345670899258515]
Domain generalization is the problem of machine learning when the training data and the test data come from different data domains.
We present a simple theoretical model of learning to generalize across domains in which there is a meta-distribution over data distributions.
arXiv Detail & Related papers (2020-02-13T17:37:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.