Improvements to Supervised EM Learning of Shared Kernel Models by
Feature Space Partitioning
- URL: http://arxiv.org/abs/2205.15304v1
- Date: Tue, 31 May 2022 09:18:58 GMT
- Title: Improvements to Supervised EM Learning of Shared Kernel Models by
Feature Space Partitioning
- Authors: Graham W. Pulford
- Abstract summary: This paper addresses the lack of rigour in the derivation of the EM training algorithm and the computational complexity of the technique.
We first present a detailed derivation of EM for the Gaussian shared kernel model PRBF classifier.
To reduce the complexity of the resulting SKEM algorithm, we partition the feature space into $R$ non-overlapping subsets of variables.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Expectation maximisation (EM) is usually thought of as an unsupervised
learning method for estimating the parameters of a mixture distribution;
however, it can also be used for supervised learning when class labels are
available. As such, EM has been applied to train neural networks, including the
probabilistic radial basis function (PRBF) network or shared kernel (SK) model.
This paper addresses two major shortcomings of previous work in this area: the
lack of rigour in the derivation of the EM training algorithm; and the
computational complexity of the technique, which has limited it to low
dimensional data sets. We first present a detailed derivation of EM for the
Gaussian shared kernel model PRBF classifier, making use of data association
theory to obtain the complete data likelihood, Baum's auxiliary function (the
E-step) and its subsequent maximisation (M-step). To reduce the complexity of the
resulting SKEM algorithm, we partition the feature space into $R$
non-overlapping subsets of variables. The resulting product decomposition of
the joint data likelihood, which is exact when the feature partitions are
independent, allows the SKEM to be implemented in parallel and at $R^2$ times
lower complexity. The operation of the partitioned SKEM algorithm is
demonstrated on the MNIST data set and compared with its non-partitioned
counterpart. The results show that improved performance at reduced complexity is
achievable. Comparisons with standard classification algorithms are provided on
a number of other benchmark data sets.
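To make the partitioning idea concrete, below is a minimal NumPy sketch of a diagonal-Gaussian shared kernel likelihood factorised over $R$ non-overlapping feature subsets. When the partitions are independent, $p(x\,|\,\theta)=\prod_{r=1}^{R} p(x^{(r)}\,|\,\theta^{(r)})$, so the joint log-likelihood is a sum of per-partition terms that can be computed in parallel; this is the decomposition behind the reported $R^2$-fold complexity reduction. All names and shapes here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def partitioned_log_likelihood(x, means, variances, weights, partitions):
    """Log-likelihood of one sample under a diagonal-Gaussian shared kernel
    model, factorised over R non-overlapping feature subsets.

    x: (D,) sample; means, variances: (K, D) shared kernel parameters;
    weights: (K,) mixing weights; partitions: list of R index arrays,
    e.g. [np.array([0, 1]), np.array([2, 3])] for D = 4 and R = 2.
    """
    total = 0.0
    for idx in partitions:
        xs, ms, vs = x[idx], means[:, idx], variances[:, idx]
        # per-kernel diagonal Gaussian log-density restricted to this subset
        log_pdf = -0.5 * np.sum((xs - ms) ** 2 / vs + np.log(2 * np.pi * vs),
                                axis=1)
        # mixture over the K shared kernels, via log-sum-exp for stability
        m = log_pdf.max()
        total += m + np.log(np.sum(weights * np.exp(log_pdf - m)))
    return total
```

Each per-partition term touches only its own subset of variables, so the $R$ terms (and the corresponding E- and M-step statistics) can be evaluated independently and in parallel.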
Related papers
- Provably Efficient Representation Learning with Tractable Planning in
Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov Decision Processes (POMDPs).
We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU).
We then show how to adapt this algorithm to also work in the broader class of $\gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
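The summary above is terse, so here is one hedged, hypothetical reading of "batch normalisation based on a Gaussian mixture": normalise each sample with responsibility-weighted component statistics rather than a single batch mean and variance. Everything below is illustrative; the paper's compound formulation may differ.

```python
import numpy as np

def mixture_normalise(x, means, variances, weights, eps=1e-5):
    """x: (N, D) batch; means, variances: (K, D); weights: (K,)."""
    # log-responsibility of each Gaussian component for each sample -> (N, K)
    log_p = -0.5 * (((x[:, None, :] - means[None]) ** 2 / variances[None]).sum(-1)
                    + np.log(variances[None]).sum(-1))
    log_p = log_p + np.log(weights)[None]
    r = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    r = r / r.sum(axis=1, keepdims=True)
    # normalise each sample with its responsibility-weighted statistics
    mu = r @ means          # (N, D) per-sample mean
    var = r @ variances     # (N, D) per-sample variance
    return (x - mu) / np.sqrt(var + eps)
```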
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- Kernel Biclustering algorithm in Hilbert Spaces [8.303238963864885]
We develop a new model-free biclustering algorithm in abstract spaces using the notions of energy distance and the maximum mean discrepancy.
The proposed method can learn more general and complex cluster shapes than most existing approaches in the literature.
Our results are comparable to those of state-of-the-art methods in their optimal scenarios, assuming a proper kernel choice.
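For context, the maximum mean discrepancy mentioned above is a kernel two-sample statistic; a minimal, generic NumPy version (not the paper's biclustering algorithm) looks like this:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # pairwise squared distances between rows of a (n, d) and b (m, d)
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(x, y, gamma=1.0):
    """Biased estimator of squared MMD between samples x (n, d) and y (m, d).
    Near zero when x and y come from the same distribution."""
    return (rbf_kernel(x, x, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean()
            + rbf_kernel(y, y, gamma).mean())
```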
arXiv Detail & Related papers (2022-08-07T08:41:46Z)
- Learning Shared Kernel Models: the Shared Kernel EM algorithm [0.0]
Expectation maximisation (EM) is an unsupervised learning method for estimating the parameters of a finite mixture distribution.
We first present a rederivation of the standard EM algorithm using data association ideas from the field of multiple target tracking.
The same method is then applied to a little-known but much more general type of supervised EM algorithm for shared kernel models.
arXiv Detail & Related papers (2022-05-15T10:10:08Z)
- Dendritic Self-Organizing Maps for Continual Learning [0.0]
We propose a novel algorithm inspired by biological neurons, termed the Dendritic Self-Organizing Map (DendSOM).
DendSOM consists of a single layer of SOMs, which extract patterns from specific regions of the input space.
It outperforms classical SOMs and several state-of-the-art continual learning algorithms on benchmark datasets.
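As background, here is a single update step of the classical SOM on which DendSOM builds (a generic sketch; the dendritic, region-specific layer is not reproduced here, and all names are illustrative):

```python
import numpy as np

def som_step(weights, x, lr=0.1, sigma=1.0):
    """One online SOM update. weights: (H, W, D) grid of codebook
    vectors; x: (D,) input sample."""
    # best-matching unit: grid cell whose codebook vector is closest to x
    d2 = ((weights - x) ** 2).sum(-1)
    bmu = np.unravel_index(np.argmin(d2), d2.shape)
    # Gaussian neighbourhood around the BMU on the grid
    rows, cols = np.indices(d2.shape)
    h = np.exp(-((rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2)
               / (2 * sigma ** 2))
    # pull codebook vectors towards the input, weighted by neighbourhood
    weights += lr * h[..., None] * (x - weights)
    return weights
```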
arXiv Detail & Related papers (2021-10-18T14:47:19Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
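For reference, a minimal sketch of the row-sparsity regulariser in question; whether the objective uses the norm itself or its $p$-th power is a detail this summary does not settle:

```python
import numpy as np

def l2p_regulariser(W, p=0.5):
    """Sum over rows of ||w_i||_2^p for a projection matrix W (d, k).
    For p < 1 this drives entire rows of W to zero, so the surviving
    rows mark the selected features."""
    row_norms = np.linalg.norm(W, axis=1)
    return np.sum(row_norms ** p)
```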
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Efficient semidefinite-programming-based inference for binary and multi-class MRFs [83.09715052229782]
We propose an efficient method for computing the partition function or MAP estimate in a pairwise MRF.
We extend semidefinite relaxations from the typical binary MRF to the full multi-class setting, and develop a compact semidefinite relaxation that can again be solved efficiently.
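A textbook example of the kind of relaxation involved, for a binary ($\pm 1$) pairwise model: MAP estimation maximises $x^\top W x$, and the semidefinite relaxation replaces $xx^\top$ with a PSD matrix of unit diagonal. The cvxpy modelling library is used here only as an illustrative solver interface; this is the generic binary relaxation, not the paper's compact multi-class construction.

```python
import numpy as np
import cvxpy as cp

n = 5
rng = np.random.default_rng(0)
W = rng.standard_normal((n, n))
W = (W + W.T) / 2                      # symmetric pairwise couplings

# Relax x x^T (x in {-1, +1}^n) to a PSD matrix X with unit diagonal.
X = cp.Variable((n, n), PSD=True)
prob = cp.Problem(cp.Maximize(cp.trace(W @ X)), [cp.diag(X) == 1])
prob.solve()
print(prob.value)                      # upper bound on max_x x^T W x
```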
arXiv Detail & Related papers (2020-12-04T15:36:29Z)
- Memory and Computation-Efficient Kernel SVM via Binary Embedding and Ternary Model Coefficients [18.52747917850984]
Kernel approximation is widely used to scale up kernel SVM training and prediction.
Memory and computation costs of kernel approximation models are still too high if we want to deploy them on memory-limited devices.
We propose a novel memory and computation-efficient kernel SVM model by using both binary embedding and ternary model coefficients.
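A toy sketch of the binary-embedding ingredient (generic sign-of-random-projection hashing; the paper's specific embedding and its ternary coefficient training are not reproduced here):

```python
import numpy as np

def binary_embed(X, n_bits=256, seed=0):
    """Map rows of X (N, d) to sign patterns of a random projection.
    Each embedded feature fits in one byte here (one bit with packing),
    which is where the memory savings come from."""
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((X.shape[1], n_bits))
    return np.sign(X @ P).astype(np.int8)
```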
arXiv Detail & Related papers (2020-10-06T09:41:54Z)
- Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including data from applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
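For orientation, a textbook consensus ADMM loop for decentralised ridge regression; the paper's coded, mini-batch variant adds stochastic gradients and coding redundancy on top of this basic pattern. The names and the ridge objective are our illustrative choices:

```python
import numpy as np

def consensus_admm(As, bs, rho=1.0, lam=0.1, iters=100):
    """Decentralised ridge regression: each agent i holds (As[i], bs[i])
    and the agents agree on a shared parameter vector z."""
    n, d = len(As), As[0].shape[1]
    xs = [np.zeros(d) for _ in range(n)]
    us = [np.zeros(d) for _ in range(n)]
    z = np.zeros(d)
    for _ in range(iters):
        # local step: each agent solves a small regularised least squares
        for i in range(n):
            H = As[i].T @ As[i] + rho * np.eye(d)
            xs[i] = np.linalg.solve(H, As[i].T @ bs[i] + rho * (z - us[i]))
        # consensus (z) update with ridge shrinkage, then dual ascent
        xbar = sum(x + u for x, u in zip(xs, us)) / n
        z = n * rho * xbar / (lam + n * rho)
        for i in range(n):
            us[i] += xs[i] - z
    return z
```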
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
- Kernel learning approaches for summarising and combining posterior similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
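The PSM itself is easy to state in code: entry $(i, j)$ is the fraction of MCMC draws in which items $i$ and $j$ are co-clustered, and the resulting matrix is positive semi-definite, which is the property the entry above exploits. A minimal sketch (illustrative, not the paper's kernel-learning machinery):

```python
import numpy as np

def posterior_similarity(allocations):
    """allocations: (S, N) array, one row of cluster labels per MCMC draw.
    Returns the N x N posterior similarity matrix."""
    S, N = allocations.shape
    psm = np.zeros((N, N))
    for z in allocations:
        psm += (z[:, None] == z[None, :])   # 1 where i, j share a cluster
    return psm / S
```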
arXiv Detail & Related papers (2020-09-27T14:16:14Z)
- Efficient Evaluation of the Partition Function of RBMs with Annealed Importance Sampling [0.30458514384586394]
The Annealed Importance Sampling (AIS) method provides a tool to estimate the partition function of the system.
We analyze the performance of AIS in both small- and large-sized problems, and show that in both cases a good estimate of $Z$ can be obtained with little computational cost.
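As a generic illustration of AIS (on a toy one-dimensional problem, not an RBM): anneal from a tractable base to an unnormalised target along a geometric path, accumulating importance weights whose log-mean-exp estimates $\log Z$. Everything below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_f0(x):
    """Tractable base density: standard normal (its Z is known)."""
    return -0.5 * x ** 2 - 0.5 * np.log(2 * np.pi)

def log_f1(x):
    """Unnormalised target; its Z is what AIS estimates."""
    return -0.5 * ((x - 2.0) / 0.5) ** 2

def ais_log_z(n_chains=1000, n_steps=200, step=0.5):
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    x = rng.standard_normal(n_chains)        # exact draws from the base
    logw = np.zeros(n_chains)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # weight increment for annealing from b_prev to b
        logw += (b - b_prev) * (log_f1(x) - log_f0(x))
        # one Metropolis step targeting f0^(1-b) * f1^b
        prop = x + step * rng.standard_normal(n_chains)
        log_ratio = ((1 - b) * (log_f0(prop) - log_f0(x))
                     + b * (log_f1(prop) - log_f1(x)))
        x = np.where(np.log(rng.random(n_chains)) < log_ratio, prop, x)
    m = logw.max()                           # log-mean-exp of the weights
    return m + np.log(np.mean(np.exp(logw - m)))

# True value for this target: log(0.5 * sqrt(2 * pi)) ≈ 0.2258
print(ais_log_z())
```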
arXiv Detail & Related papers (2020-07-23T10:59:04Z)