Schema matching using Gaussian mixture models with Wasserstein distance
- URL: http://arxiv.org/abs/2111.14244v1
- Date: Sun, 28 Nov 2021 21:44:58 GMT
- Title: Schema matching using Gaussian mixture models with Wasserstein distance
- Authors: Mateusz Przyborowski, Mateusz Pabiś, Andrzej Janusz, Dominik Ślęzak
- Abstract summary: We derive an approximation for the Wasserstein distance between Gaussian mixture models and reduce its computation to a linear problem.
- Score: 0.2676349883103403
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gaussian mixture models are a powerful tool, used mostly for clustering, but with proper preparation also for feature extraction, pattern recognition, image segmentation, and machine learning in general. When faced with the problem of schema matching, different mixture models computed on different pieces of data can retain crucial information about the structure of the dataset. To measure or compare results from mixture models, the Wasserstein distance can be very useful; however, it is not easy to calculate for mixture distributions. In this paper, we derive one possible approximation for the Wasserstein distance between Gaussian mixture models and reduce it to a linear problem. Furthermore, application examples concerning real-world data are shown.
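The abstract states only that the Wasserstein distance between mixtures is reduced to a linear problem. A widely used component-wise formulation computes the closed-form 2-Wasserstein distance between every pair of Gaussian components and then solves a small linear program over the mixture weights. The sketch below illustrates that formulation only; it is not the authors' implementation, and the function names and use of NumPy/SciPy are assumptions.

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import linprog

def w2_gaussian_sq(mu1, cov1, mu2, cov2):
    """Closed-form squared 2-Wasserstein distance between two Gaussians:
    ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov2^{1/2} cov1 cov2^{1/2})^{1/2})."""
    s2 = np.real(sqrtm(cov2))
    cross = np.real(sqrtm(s2 @ cov1 @ s2))
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(cov1 + cov2 - 2 * cross))

def mixture_w2(weights1, means1, covs1, weights2, means2, covs2):
    """Component-wise approximation of W2 between two GMMs: a discrete
    optimal-transport problem on the components, solved as a linear program.
    Mixture weights are assumed to sum to one on each side."""
    k1, k2 = len(weights1), len(weights2)
    # Cost matrix of pairwise squared W2 distances between components.
    C = np.array([[w2_gaussian_sq(means1[i], covs1[i], means2[j], covs2[j])
                   for j in range(k2)] for i in range(k1)])
    # Coupling matrix P (flattened row-major) with marginal constraints:
    # row sums equal weights1, column sums equal weights2.
    A_eq, b_eq = [], np.concatenate([weights1, weights2])
    for i in range(k1):
        row = np.zeros(k1 * k2); row[i * k2:(i + 1) * k2] = 1.0; A_eq.append(row)
    for j in range(k2):
        col = np.zeros(k1 * k2); col[j::k2] = 1.0; A_eq.append(col)
    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return float(np.sqrt(res.fun))
```

For schema matching, one mixture would be fitted per column (or per table) and the resulting pairwise distances used to align attributes; this usage is inferred from the abstract, not a description of the paper's experiments.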
Related papers
- GeoMix: Towards Geometry-Aware Data Augmentation [76.09914619612812]
Mixup has shown considerable success in mitigating the challenges posed by limited labeled data in image classification.
We propose Geometric Mixup (GeoMix), a simple and interpretable Mixup approach leveraging in-place graph editing.
arXiv Detail & Related papers (2024-07-15T12:58:04Z)
- Fusion of Gaussian Processes Predictions with Monte Carlo Sampling [61.31380086717422]
In science and engineering, we often work with models designed for accurate prediction of variables of interest.
Recognizing that these models are approximations of reality, it becomes desirable to apply multiple models to the same data and integrate their outcomes.
arXiv Detail & Related papers (2024-03-03T04:21:21Z)
- Clustering based on Mixtures of Sparse Gaussian Processes [6.939768185086753]
Clustering data via their low-dimensional embedded space is still a challenging problem in machine learning.
In this article, we focus on proposing a joint formulation for both clustering and dimensionality reduction.
Our algorithm is based on a mixture of sparse Gaussian processes, called Sparse Gaussian Process Mixture Clustering (SGP-MIC).
arXiv Detail & Related papers (2023-03-23T20:44:36Z)
- Learning Gaussian Mixtures Using the Wasserstein-Fisher-Rao Gradient Flow [12.455057637445174]
We propose a new algorithm to compute the nonparametric maximum likelihood estimator (NPMLE) in a Gaussian mixture model.
Our method is based on gradient descent over the space of probability measures equipped with the Wasserstein-Fisher-Rao geometry.
We conduct extensive numerical experiments to confirm the effectiveness of the proposed algorithm.
arXiv Detail & Related papers (2023-01-04T18:59:35Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- A Robust and Flexible EM Algorithm for Mixtures of Elliptical Distributions with Missing Data [71.9573352891936]
This paper tackles the problem of missing data imputation for noisy and non-Gaussian data.
A new EM algorithm is investigated for mixtures of elliptical distributions with the property of handling potential missing data.
Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data.
arXiv Detail & Related papers (2022-01-28T10:01:37Z)
- Evaluating State-of-the-Art Classification Models Against Bayes Optimality [106.50867011164584]
We show that we can compute the exact Bayes error of generative models learned using normalizing flows.
We use our approach to conduct a thorough investigation of state-of-the-art classification models.
arXiv Detail & Related papers (2021-06-07T06:21:20Z)
- Tensor decomposition for learning Gaussian mixtures from moments [6.576993289263191]
In data processing and machine learning, an important challenge is to recover and exploit models that accurately represent the data.
We investigate symmetric tensor decomposition methods for tackling this problem, where the tensor is built from empirical moments of the data distribution.
arXiv Detail & Related papers (2021-06-01T15:11:08Z)
- A similarity-based Bayesian mixture-of-experts model [0.5156484100374058]
We present a new non-parametric mixture-of-experts model for multivariate regression problems.
Using a conditionally specified model, predictions for out-of-sample inputs are based on similarities to each observed data point.
Posterior inference is performed on the parameters of the mixture as well as the distance metric.
arXiv Detail & Related papers (2020-12-03T18:08:30Z)
- Model Fusion with Kullback--Leibler Divergence [58.20269014662046]
We propose a method to fuse posterior distributions learned from heterogeneous datasets.
Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors.
arXiv Detail & Related papers (2020-07-13T03:27:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.