Learning Shared Kernel Models: the Shared Kernel EM algorithm
- URL: http://arxiv.org/abs/2205.09041v1
- Date: Sun, 15 May 2022 10:10:08 GMT
- Title: Learning Shared Kernel Models: the Shared Kernel EM algorithm
- Authors: Graham W. Pulford
- Abstract summary: Expectation maximisation (EM) is an unsupervised learning method for estimating the parameters of a finite mixture distribution.
We first present a rederivation of the standard EM algorithm using data association ideas from the field of multiple target tracking.
The same method is then applied to a little known but much more general type of supervised EM algorithm for shared kernel models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Expectation maximisation (EM) is an unsupervised learning method for
estimating the parameters of a finite mixture distribution. It works by
introducing "hidden" or "latent" variables via Baum's auxiliary function $Q$
that allow the joint data likelihood to be expressed as a product of simple
factors. The relevance of EM has increased since the introduction of the
variational lower bound (VLB): the VLB differs from Baum's auxiliary function
only by the entropy of the PDF of the latent variables $Z$. We first present a
rederivation of the standard EM algorithm using data association ideas from the
field of multiple target tracking, using $K$-valued scalar data association
hypotheses rather than the usual binary indicator vectors. The same method is
then applied to a little known but much more general type of supervised EM
algorithm for shared kernel models, related to probabilistic radial basis
function networks. We address a number of shortcomings in the derivations that
have been published previously in this area. In particular, we give
theoretically rigorous derivations of (i) the complete data likelihood; (ii)
Baum's auxiliary function (the E-step) and (iii) the maximisation (M-step) in
the case of Gaussian shared kernel models. The subsequent algorithm, called
shared kernel EM (SKEM), is then applied to a digit recognition problem using a
novel 7-segment digit representation. Variants of the algorithm that use
different numbers of features and different EM algorithm dimensions are
compared in terms of mean accuracy and mean IoU. A simplified classifier is
proposed that decomposes the joint data PDF as a product of lower order PDFs
over non-overlapping subsets of variables. The effect of different numbers of
assumed mixture components $K$ is also investigated. High-level source code for
the data generation and SKEM algorithm is provided.
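Two of the abstract's central technical points can be made concrete. Below is a minimal sketch (the notation is assumed by this summary, not quoted from the paper): choosing the variational distribution to be the latent posterior makes the variational lower bound equal to Baum's auxiliary function $Q$ plus the entropy of the latent-variable PDF, and the simplified classifier replaces the joint data PDF with a product of lower-order PDFs over non-overlapping subsets of variables.

```latex
% VLB vs. Baum's auxiliary function: with q(Z) = p(Z | X, \theta^{old}),
% the lower bound is Q plus the entropy H of q (the relation stated in
% the abstract; the symbols used here are assumptions of this sketch).
\mathcal{L}(q, \theta)
  = \underbrace{\mathbb{E}_{q(Z)}\!\left[\log p(X, Z \mid \theta)\right]}_{Q(\theta,\, \theta^{\mathrm{old}})}
  \;+\; \underbrace{\left(-\,\mathbb{E}_{q(Z)}\!\left[\log q(Z)\right]\right)}_{H\left[q(Z)\right]}

% One plausible reading of the simplified classifier: the joint data PDF
% is approximated by a product of lower-order PDFs over R non-overlapping
% subsets x^{(1)}, \dots, x^{(R)} of the feature vector x.
p(x) \;\approx\; \prod_{r=1}^{R} p\!\left(x^{(r)}\right)
```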
Related papers
- Symmetry Discovery for Different Data Types [52.2614860099811]
Equivariant neural networks incorporate symmetries into their architecture, achieving higher generalization performance.
We propose LieSD, a method for discovering symmetries via trained neural networks which approximate the input-output mappings of the tasks.
We validate the performance of LieSD on tasks with symmetries such as the two-body problem, the moment of inertia matrix prediction, and top quark tagging.
arXiv Detail & Related papers (2024-10-13T13:39:39Z)
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z)
- Learning High-Dimensional Differential Graphs From Multi-Attribute Data [12.94486861344922]
We consider the problem of estimating differences in two Gaussian graphical models (GGMs) which are known to have similar structure.
Existing methods for differential graph estimation are based on single-attribute (SA) models.
In this paper, we analyze a group lasso penalized D-trace loss function approach for differential graph learning from multi-attribute data.
arXiv Detail & Related papers (2023-12-05T18:54:46Z)
- Cramer Type Distances for Learning Gaussian Mixture Models by Gradient Descent [0.0]
As of today, few known algorithms can fit or learn Gaussian mixture models.
We propose a distance function called Sliced Cramér 2-distance for learning general multivariate GMMs.
These features are especially useful for distributional reinforcement learning and Deep Q Networks.
arXiv Detail & Related papers (2023-07-13T13:43:02Z)
- Low-complexity subspace-descent over symmetric positive definite manifold [9.346050098365648]
We develop low-complexity algorithms for the minimization of functions over the symmetric positive definite (SPD) manifold.
The proposed approach utilizes carefully chosen subspaces that allow the update to be written as a product of the Cholesky factor of the iterate and a sparse matrix.
arXiv Detail & Related papers (2023-05-03T11:11:46Z)
- Improvements to Supervised EM Learning of Shared Kernel Models by Feature Space Partitioning [0.0]
This paper addresses the lack of rigour in the derivation of the EM training algorithm and the computational complexity of the technique.
We first present a detailed derivation of EM for the Gaussian shared kernel model PRBF classifier.
To reduce complexity of the resulting SKEM algorithm, we partition the feature space into $R$ non-overlapping subsets of variables.
arXiv Detail & Related papers (2022-05-31T09:18:58Z)
- Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression.
It defines "ideal" as satisfying the integrity metric, i.e. the empirical model error equals the actual measurement noise.
This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
arXiv Detail & Related papers (2021-12-11T13:18:33Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation so can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- From the Expectation Maximisation Algorithm to Autoencoded Variational Bayes [0.0]
We first give a tutorial presentation of the EM algorithm for estimating the parameters of a $K$-component mixture density.
In a similar style to Bishop's 2009 book, we present variational Bayesian inference as a generalised EM algorithm.
We establish clear links between the EM algorithm and its variational counterparts, hence clarifying the meaning of "latent variables". (A minimal EM sketch for a $K$-component Gaussian mixture appears after this list.)
arXiv Detail & Related papers (2020-10-23T17:23:26Z)
- Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
- Kernel learning approaches for summarising and combining posterior similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
arXiv Detail & Related papers (2020-09-27T14:16:14Z)
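For a runnable reference point, here is a minimal sketch of the standard, unsupervised EM algorithm for a $K$-component one-dimensional Gaussian mixture, the textbook case that the main paper (and the "From the Expectation Maximisation Algorithm to Autoencoded Variational Bayes" entry above) rederives. It is not the supervised SKEM algorithm, and all names and defaults are assumptions of this example rather than the authors' code.

```python
# Minimal EM for a K-component 1-D Gaussian mixture (textbook version).
# Illustrative only: NOT the supervised SKEM algorithm; names are assumed.
import numpy as np

def em_gaussian_mixture(x, K, n_iter=100, seed=0):
    """Estimate weights, means and variances of a K-component 1-D GMM."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # Initialisation: equal weights, K distinct data points as means,
    # global sample variance for every component.
    w = np.full(K, 1.0 / K)
    mu = rng.choice(x, size=K, replace=False)
    var = np.full(K, x.var())
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = P(component k | x_i, old params),
        # i.e. the posterior over the K-valued association hypotheses.
        log_joint = (np.log(w)
                     - 0.5 * np.log(2.0 * np.pi * var)
                     - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_evidence = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        r = np.exp(log_joint - log_evidence)
        # M-step: closed-form re-estimation from the weighted data.
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic data drawn from a known 2-component mixture.
    data = np.concatenate([rng.normal(-2.0, 0.5, 500), rng.normal(3.0, 1.0, 500)])
    weights, means, variances = em_gaussian_mixture(data, K=2)
    print("weights:", weights, "means:", means, "variances:", variances)
```

The E-step responsibilities play the role of the posterior over the $K$-valued association hypotheses mentioned in the abstract; SKEM additionally conditions on class labels, which is not shown here.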
This list is automatically generated from the titles and abstracts of the papers on this site.