Cramer Type Distances for Learning Gaussian Mixture Models by Gradient
Descent
- URL: http://arxiv.org/abs/2307.06753v1
- Date: Thu, 13 Jul 2023 13:43:02 GMT
- Title: Cramer Type Distances for Learning Gaussian Mixture Models by Gradient
Descent
- Authors: Ruichong Zhang
- Abstract summary: As of today, few known algorithms can fit or learn Gaussian mixture models.
We propose a distance function called Sliced Cramér 2-distance for learning general multivariate GMMs.
These features are especially useful for distributional reinforcement learning and Deep Q Networks.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The learning of Gaussian Mixture Models (also referred to simply as GMMs)
plays an important role in machine learning. Known for their expressiveness and
interpretability, Gaussian mixture models have a wide range of applications,
from statistics, computer vision to distributional reinforcement learning.
However, as of today, only a few known algorithms can fit or learn these
models; among them are the Expectation-Maximization algorithm and methods based
on the Sliced Wasserstein Distance. Even fewer algorithms are compatible with
gradient descent, the
common learning process for neural networks.
In this paper, we derive a closed formula for the Cramér 2-distance between
two GMMs in the univariate, one-dimensional case, then propose a distance
function called the Sliced Cramér 2-distance for learning general multivariate
GMMs. Our approach has several
advantages over many previous methods. First, it has a closed-form expression
for the univariate case and is easy to compute and implement using common
machine learning libraries (e.g., PyTorch and TensorFlow). Second, it is
compatible with gradient descent, which enables us to integrate GMMs with
neural networks seamlessly. Third, it can fit a GMM not only to a set of data
points, but also to another GMM directly, without sampling from the target
model. And fourth, it has some theoretical guarantees like global gradient
boundedness and unbiased sampling gradient. These features are especially
useful for distributional reinforcement learning and Deep Q Networks, where the
goal is to learn a distribution over future rewards. We will also construct a
Gaussian Mixture Distributional Deep Q Network as a toy example to demonstrate
its effectiveness. Compared with previous models, this model is parameter
efficient in terms of representing a distribution and possesses better
interpretability.
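The closed formula itself appears only in the full paper. As an illustration of the idea, the minimal PyTorch sketch below computes a univariate Cramér 2-distance between two GMMs in closed form via the standard energy-distance identity int (F - G)^2 dx = E|X - Y| - 1/2 E|X - X'| - 1/2 E|Y - Y'| and the folded-normal mean, then averages it over random projections to obtain a sliced variant for multivariate GMMs. The function names and the Monte Carlo slicing scheme are assumptions for illustration, not the paper's reference implementation.

```python
import math
import torch

def _mean_abs_gauss(m, s):
    # Folded-normal mean: E|Z| for Z ~ N(m, s^2).
    return s * math.sqrt(2.0 / math.pi) * torch.exp(-m ** 2 / (2 * s ** 2)) \
        + m * torch.erf(m / (s * math.sqrt(2.0)))

def _cross_term(w_a, mu_a, sig_a, w_b, mu_b, sig_b):
    # E|X - Y| for independent X ~ GMM_a, Y ~ GMM_b, summed over component pairs;
    # X_i - Y_j is Gaussian with mean mu_a[i] - mu_b[j] and variance sig_a[i]^2 + sig_b[j]^2.
    m = mu_a[:, None] - mu_b[None, :]
    s = torch.sqrt(sig_a[:, None] ** 2 + sig_b[None, :] ** 2)
    return torch.sum(w_a[:, None] * w_b[None, :] * _mean_abs_gauss(m, s))

def cramer2_gmm_1d(w_a, mu_a, sig_a, w_b, mu_b, sig_b):
    # Squared Cramer 2-distance between two 1-D GMMs:
    # int (F - G)^2 dx = E|X - Y| - 0.5 E|X - X'| - 0.5 E|Y - Y'|.
    return (_cross_term(w_a, mu_a, sig_a, w_b, mu_b, sig_b)
            - 0.5 * _cross_term(w_a, mu_a, sig_a, w_a, mu_a, sig_a)
            - 0.5 * _cross_term(w_b, mu_b, sig_b, w_b, mu_b, sig_b))

def sliced_cramer2_gmm(w_a, mu_a, cov_a, w_b, mu_b, cov_b, n_proj=64):
    # Sliced variant for multivariate GMMs: project both mixtures onto random
    # unit directions (each projection is again a 1-D GMM) and average.
    d = mu_a.shape[1]
    total = torch.zeros(())
    for _ in range(n_proj):
        u = torch.randn(d)
        u = u / u.norm()
        s_a = torch.sqrt(torch.einsum('i,kij,j->k', u, cov_a, u))
        s_b = torch.sqrt(torch.einsum('i,kij,j->k', u, cov_b, u))
        total = total + cramer2_gmm_1d(w_a, mu_a @ u, s_a, w_b, mu_b @ u, s_b)
    return total / n_proj
```

Every step here is differentiable, so the result can be minimized by gradient descent over the mixture parameters; the second mixture can be a fixed target GMM, and the same identity also covers an empirical sample in place of the second mixture, though this sketch only handles the GMM-to-GMM case.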
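The Gaussian Mixture Distributional Deep Q Network is likewise only outlined in the abstract. Below is one plausible head layout under that outline, again as a hedged PyTorch sketch: for each action the network predicts the weights, means, and scales of a 1-D Gaussian mixture over returns, and the greedy Q-value is the mixture mean. The class name, layer sizes, and component count are assumptions, not the paper's architecture; a closed-form 1-D Cramér distance such as the one sketched above could then serve as the loss between the predicted mixture and the TD target, which (being an affine transform of a GMM) is itself a GMM.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GMMQNetwork(nn.Module):
    # Hypothetical head: per action, parameters of a K-component 1-D Gaussian
    # mixture over future returns.
    def __init__(self, obs_dim, n_actions, n_components=5, hidden=128):
        super().__init__()
        self.n_actions, self.k = n_actions, n_components
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head = nn.Linear(hidden, n_actions * n_components * 3)

    def forward(self, obs):
        p = self.head(self.body(obs)).view(-1, self.n_actions, self.k, 3)
        w = torch.softmax(p[..., 0], dim=-1)        # mixture weights
        mu = p[..., 1]                               # component means
        sigma = F.softplus(p[..., 2]) + 1e-3         # positive scales
        return w, mu, sigma

    def q_values(self, obs):
        # Expected return per action: the mean of that action's mixture.
        w, mu, _ = self.forward(obs)
        return (w * mu).sum(dim=-1)
```

Action selection would use q_values, while the distributional loss compares the full (weights, means, scales) prediction for the taken action against the target mixture.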
Related papers
- Gradients of Functions of Large Matrices [18.361820028457718]
We show how to differentiate workhorses of numerical linear algebra efficiently.
We derive previously unknown adjoint systems for Lanczos and Arnoldi iterations, implement them in JAX, and show that the resulting code can compete with Diffrax.
All this is achieved without any problem-specific code optimisation.
arXiv Detail & Related papers (2024-05-27T15:39:45Z)
- Deep Gaussian mixture model for unsupervised image segmentation [1.3654846342364308]
In many tasks sufficient pixel-level labels are very difficult to obtain.
We propose a method which combines a Gaussian mixture model (GMM) with unsupervised deep learning techniques.
We demonstrate the advantages of our method in various experiments on the example of infarct segmentation on multi-sequence MRI images.
arXiv Detail & Related papers (2024-04-18T15:20:59Z)
- An Efficient 1 Iteration Learning Algorithm for Gaussian Mixture Model And Gaussian Mixture Embedding For Neural Network [2.261786383673667]
The new algorithm brings more robustness and simplicity than classic Expectation Maximization (EM) algorithm.
It also improves accuracy and takes only one iteration for learning.
arXiv Detail & Related papers (2023-08-18T10:17:59Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs (see the sketch after this list).
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z)
- Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver, generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
- Privacy-Preserving Object Detection & Localization Using Distributed Machine Learning: A Case Study of Infant Eyeblink Conditioning [1.3022864665437273]
We explore scalable distributed-training versions of two algorithms commonly used in object detection.
The application of both algorithms in the medical field is examined using a paradigm from the fields of psychology and neuroscience.
arXiv Detail & Related papers (2020-10-14T17:33:28Z)
- Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
- DeepGMR: Learning Latent Gaussian Mixture Models for Registration [113.74060941036664]
Point cloud registration is a fundamental problem in 3D computer vision, graphics and robotics.
In this paper, we introduce Deep Gaussian Mixture Registration (DeepGMR), the first learning-based registration method.
Our proposed method shows favorable performance when compared with state-of-the-art geometry-based and learning-based registration methods.
arXiv Detail & Related papers (2020-08-20T17:25:16Z)
- GAT-GMM: Generative Adversarial Training for Gaussian Mixture Models [29.42264360774606]
Generative adversarial networks (GANs) learn the distribution of observed samples through a zero-sum game.
We propose Generative Adversarial Training for Gaussian Mixture Models (GAT-GMM), a minimax GAN framework for learning GMMs.
We show that GAT-GMM can perform as well as the expectation-maximization algorithm in learning mixtures of two Gaussians.
arXiv Detail & Related papers (2020-06-18T06:11:28Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
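As background for the analytic-for-GMMs claim in the Cauchy-Schwarz Regularized Autoencoder entry above: the Cauchy-Schwarz divergence D_CS(p, q) = -log( int p q / sqrt(int p^2 * int q^2) ) reduces, for Gaussian mixtures, to sums of pairwise Gaussian overlap integrals int N(x; mu_a, S_a) N(x; mu_b, S_b) dx = N(mu_a; mu_b, S_a + S_b). The sketch below is a generic PyTorch illustration of that identity (function names are assumptions), not code from that paper.

```python
import math
import torch

def _pairwise_gauss_overlap(mu_a, cov_a, mu_b, cov_b):
    # z_ij = int N(x; mu_a[i], cov_a[i]) N(x; mu_b[j], cov_b[j]) dx
    #      = N(mu_a[i]; mu_b[j], cov_a[i] + cov_b[j])
    d = mu_a.shape[1]
    diff = mu_a[:, None, :] - mu_b[None, :, :]              # (A, B, d)
    cov = cov_a[:, None, :, :] + cov_b[None, :, :, :]       # (A, B, d, d)
    maha = (diff * torch.linalg.solve(cov, diff.unsqueeze(-1)).squeeze(-1)).sum(-1)
    return torch.exp(-0.5 * (maha + torch.logdet(cov) + d * math.log(2 * math.pi)))

def cauchy_schwarz_divergence_gmm(w_p, mu_p, cov_p, w_q, mu_q, cov_q):
    # D_CS(p, q) = -log int p q + 0.5 log int p^2 + 0.5 log int q^2,
    # where each integral is a weighted sum of pairwise component overlaps.
    def integral(w1, m1, c1, w2, m2, c2):
        return torch.sum(w1[:, None] * w2[None, :] * _pairwise_gauss_overlap(m1, c1, m2, c2))
    return (-torch.log(integral(w_p, mu_p, cov_p, w_q, mu_q, cov_q))
            + 0.5 * torch.log(integral(w_p, mu_p, cov_p, w_p, mu_p, cov_p))
            + 0.5 * torch.log(integral(w_q, mu_q, cov_q, w_q, mu_q, cov_q)))
```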