Learning Shared Kernel Models: the Shared Kernel EM algorithm
- URL: http://arxiv.org/abs/2205.09041v1
- Date: Sun, 15 May 2022 10:10:08 GMT
- Title: Learning Shared Kernel Models: the Shared Kernel EM algorithm
- Authors: Graham W. Pulford
- Abstract summary: Expectation maximisation (EM) is an unsupervised learning method for estimating the parameters of a finite mixture distribution.
We first present a rederivation of the standard EM algorithm using data association ideas from the field of multiple target tracking.
The same method is then applied to a little known but much more general type of supervised EM algorithm for shared kernel models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Expectation maximisation (EM) is an unsupervised learning method for
estimating the parameters of a finite mixture distribution. It works by
introducing "hidden" or "latent" variables via Baum's auxiliary function $Q$
that allow the joint data likelihood to be expressed as a product of simple
factors. The relevance of EM has increased since the introduction of the
variational lower bound (VLB): the VLB differs from Baum's auxiliary function
only by the entropy of the PDF of the latent variables $Z$. We first present a
rederivation of the standard EM algorithm using data association ideas from the
field of multiple target tracking, using $K$-valued scalar data association
hypotheses rather than the usual binary indicator vectors. The same method is
then applied to a little known but much more general type of supervised EM
algorithm for shared kernel models, related to probabilistic radial basis
function networks. We address a number of shortcomings in the derivations that
have been published previously in this area. In particular, we give
theoretically rigorous derivations of (i) the complete data likelihood; (ii)
Baum's auxiliary function (the E-step) and (iii) the maximisation (M-step) in
the case of Gaussian shared kernel models. The subsequent algorithm, called
shared kernel EM (SKEM), is then applied to a digit recognition problem using a
novel 7-segment digit representation. Variants of the algorithm that use
different numbers of features and different EM algorithm dimensions are
compared in terms of mean accuracy and mean IoU. A simplified classifier is
proposed that decomposes the joint data PDF as a product of lower order PDFs
over non-overlapping subsets of variables. The effect of different numbers of
assumed mixture components $K$ is also investigated. High-level source code for
the data generation and SKEM algorithm is provided.
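Two of the abstract's central technical points can be made concrete. Below is a minimal sketch (the notation is assumed by this summary, not quoted from the paper): choosing the variational distribution to be the latent posterior makes the variational lower bound equal to Baum's auxiliary function $Q$ plus the entropy of the latent-variable PDF, and the simplified classifier replaces the joint data PDF with a product of lower-order PDFs over non-overlapping subsets of variables.

```latex
% VLB vs. Baum's auxiliary function: with q(Z) = p(Z | X, \theta^{old}),
% the lower bound is Q plus the entropy H of q (the relation stated in
% the abstract; the symbols used here are assumptions of this sketch).
\mathcal{L}(q, \theta)
  = \underbrace{\mathbb{E}_{q(Z)}\!\left[\log p(X, Z \mid \theta)\right]}_{Q(\theta,\, \theta^{\mathrm{old}})}
  \;+\; \underbrace{\left(-\,\mathbb{E}_{q(Z)}\!\left[\log q(Z)\right]\right)}_{H\left[q(Z)\right]}

% One plausible reading of the simplified classifier: the joint data PDF
% is approximated by a product of lower-order PDFs over R non-overlapping
% subsets x^{(1)}, \dots, x^{(R)} of the feature vector x.
p(x) \;\approx\; \prod_{r=1}^{R} p\!\left(x^{(r)}\right)
```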
Related papers
- Symmetry Discovery for Different Data Types [52.2614860099811]
Equivariant neural networks incorporate symmetries into their architecture, achieving higher generalization performance.
We propose LieSD, a method for discovering symmetries via trained neural networks which approximate the input-output mappings of the tasks.
We validate the performance of LieSD on tasks with symmetries such as the two-body problem, the moment of inertia matrix prediction, and top quark tagging.
arXiv Detail & Related papers (2024-10-13T13:39:39Z)
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z)
- Learning High-Dimensional Differential Graphs From Multi-Attribute Data [12.94486861344922]
We consider the problem of estimating differences in two Gaussian graphical models (GGMs) which are known to have similar structure.
Existing methods for differential graph estimation are based on single-attribute (SA) models.
In this paper, we analyze a group lasso penalized D-trace loss function approach for differential graph learning from multi-attribute data.
arXiv Detail & Related papers (2023-12-05T18:54:46Z)
- Cramer Type Distances for Learning Gaussian Mixture Models by Gradient Descent [0.0]
As of today, few known algorithms can fit or learn Gaussian mixture models.
We propose a distance function called Sliced Cramér 2-distance for learning general multivariate GMMs.
These features are especially useful for distributional reinforcement learning and Deep Q Networks.
arXiv Detail & Related papers (2023-07-13T13:43:02Z)
- Low-complexity subspace-descent over symmetric positive definite manifold [9.346050098365648]
We develop low-complexity algorithms for the minimization of functions over the symmetric positive definite (SPD) manifold.
The proposed approach utilizes carefully chosen subspaces that allow the update to be written as a product of the Cholesky factor of the iterate and a sparse matrix.
arXiv Detail & Related papers (2023-05-03T11:11:46Z)
- Improvements to Supervised EM Learning of Shared Kernel Models by Feature Space Partitioning [0.0]
This paper addresses the lack of rigour in the derivation of the EM training algorithm and the computational complexity of the technique.
We first present a detailed derivation of EM for the Gaussian shared kernel model PRBF classifier.
To reduce complexity of the resulting SKEM algorithm, we partition the feature space into $R$ non-overlapping subsets of variables.
arXiv Detail & Related papers (2022-05-31T09:18:58Z)
- Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression.
It defines "ideal" as satisfying the integrity metric, i.e. the empirical model error equals the actual measurement noise.
This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
arXiv Detail & Related papers (2021-12-11T13:18:33Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation so can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- From the Expectation Maximisation Algorithm to Autoencoded Variational Bayes [0.0]
We first give a tutorial presentation of the EM algorithm for estimating the parameters of a $K$-component mixture density.
In a similar style to Bishop's 2009 book, we present variational Bayesian inference as a generalised EM algorithm.
We establish clear links between the EM algorithm and its variational counterparts, hence clarifying the meaning of "latent variables". (A minimal EM sketch for a $K$-component Gaussian mixture appears after this list.)
arXiv Detail & Related papers (2020-10-23T17:23:26Z)
- Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
- Kernel learning approaches for summarising and combining posterior similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
arXiv Detail & Related papers (2020-09-27T14:16:14Z)
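For a runnable reference point, here is a minimal sketch of the standard, unsupervised EM algorithm for a $K$-component one-dimensional Gaussian mixture, the textbook case that the main paper (and the "From the Expectation Maximisation Algorithm to Autoencoded Variational Bayes" entry above) rederives. It is not the supervised SKEM algorithm, and all names and defaults are assumptions of this example rather than the authors' code.

```python
# Minimal EM for a K-component 1-D Gaussian mixture (textbook version).
# Illustrative only: NOT the supervised SKEM algorithm; names are assumed.
import numpy as np

def em_gaussian_mixture(x, K, n_iter=100, seed=0):
    """Estimate weights, means and variances of a K-component 1-D GMM."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # Initialisation: equal weights, K distinct data points as means,
    # global sample variance for every component.
    w = np.full(K, 1.0 / K)
    mu = rng.choice(x, size=K, replace=False)
    var = np.full(K, x.var())
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = P(component k | x_i, old params),
        # i.e. the posterior over the K-valued association hypotheses.
        log_joint = (np.log(w)
                     - 0.5 * np.log(2.0 * np.pi * var)
                     - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_evidence = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        r = np.exp(log_joint - log_evidence)
        # M-step: closed-form re-estimation from the weighted data.
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic data drawn from a known 2-component mixture.
    data = np.concatenate([rng.normal(-2.0, 0.5, 500), rng.normal(3.0, 1.0, 500)])
    weights, means, variances = em_gaussian_mixture(data, K=2)
    print("weights:", weights, "means:", means, "variances:", variances)
```

The E-step responsibilities play the role of the posterior over the $K$-valued association hypotheses mentioned in the abstract; SKEM additionally conditions on class labels, which is not shown here.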
This list is automatically generated from the titles and abstracts of the papers on this site.