Improvements to Supervised EM Learning of Shared Kernel Models by
Feature Space Partitioning
- URL: http://arxiv.org/abs/2205.15304v1
- Date: Tue, 31 May 2022 09:18:58 GMT
- Title: Improvements to Supervised EM Learning of Shared Kernel Models by
Feature Space Partitioning
- Authors: Graham W. Pulford
- Abstract summary: This paper addresses two shortcomings of previous work: the lack of rigour in the derivation of the EM training algorithm, and the computational complexity of the technique.
We first present a detailed derivation of EM for the Gaussian shared kernel model PRBF classifier.
To reduce the complexity of the resulting SKEM algorithm, we partition the feature space into $R$ non-overlapping subsets of variables.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Expectation maximisation (EM) is usually thought of as an unsupervised
learning method for estimating the parameters of a mixture distribution;
however, it can also be used for supervised learning when class labels are
available. As such, EM has been applied to train neural nets including the
probabilistic radial basis function (PRBF) network or shared kernel (SK) model.
This paper addresses two major shortcomings of previous work in this area: the
lack of rigour in the derivation of the EM training algorithm; and the
computational complexity of the technique, which has limited it to low
dimensional data sets. We first present a detailed derivation of EM for the
Gaussian shared kernel model PRBF classifier, making use of data association
theory to obtain the complete data likelihood, Baum's auxiliary function (the
E-step) and its subsequent maximisation (M-step). To reduce the complexity of the
resulting SKEM algorithm, we partition the feature space into $R$
non-overlapping subsets of variables. The resulting product decomposition of
the joint data likelihood, which is exact when the feature partitions are
independent, allows the SKEM to be implemented in parallel and at $R^2$ times
lower complexity. The operation of the partitioned SKEM algorithm is
demonstrated on the MNIST data set and compared with its non-partitioned
counterpart. It eventuates that improved performance at reduced complexity is
achievable. Comparisons with standard classification algorithms are provided on
a number of other benchmark data sets.
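A minimal sketch of the partition-and-recombine idea, not the paper's SKEM implementation: scikit-learn's GaussianMixture stands in for the per-class shared-kernel EM step, the small digits set stands in for MNIST, and $R$ and the component count are illustrative choices.

```python
# Sketch of feature-space partitioning with a product decomposition.
# Assumption: GaussianMixture approximates the per-class shared-kernel EM step.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

R = 4  # number of non-overlapping feature partitions

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
classes = np.unique(y_tr)
blocks = np.array_split(np.arange(X.shape[1]), R)  # disjoint variable subsets

# One mixture per (class, partition); each EM runs on a d/R-dimensional
# slice of the data, which is the source of the complexity reduction.
models = {(c, r): GaussianMixture(n_components=3, covariance_type="diag",
                                  reg_covar=1e-3, random_state=0)
                  .fit(X_tr[y_tr == c][:, blk])
          for c in classes for r, blk in enumerate(blocks)}

log_prior = np.log(np.bincount(y_tr) / len(y_tr))

# Product decomposition: the joint log-likelihood is the sum of
# per-partition log-likelihoods (exact when the partitions are independent).
scores = np.stack([log_prior[c]
                   + sum(models[c, r].score_samples(X_te[:, blk])
                         for r, blk in enumerate(blocks))
                   for c in classes], axis=1)
pred = classes[scores.argmax(axis=1)]
print("test accuracy:", (pred == y_te).mean())
```

Each per-partition EM runs in dimension $d/R$ rather than $d$, and the per-class scores recombine by simple addition of log-likelihoods, so the $R$ sub-models can also be trained in parallel.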
Related papers
- Joint Learning of Energy-based Models and their Partition Function [19.174145933837927]
Energy-based models (EBMs) offer a flexible framework for parameterizing probability distributions using neural networks.
We propose a novel formulation for approximately learning EBMs in combinatorially-large discrete spaces.
We show that our approach naturally extends to the broader family of Fenchel-Young losses.
arXiv Detail & Related papers (2025-01-30T17:46:17Z)
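For orientation on the entry above: an EBM defines an unnormalised density via an energy function $E_\theta$, and the partition function $Z(\theta)$ is the normaliser whose sum over a combinatorially-large discrete space $\mathcal{X}$ is what makes joint learning hard:

```latex
p_\theta(x) = \frac{\exp(-E_\theta(x))}{Z(\theta)},
\qquad
Z(\theta) = \sum_{x \in \mathcal{X}} \exp\bigl(-E_\theta(x)\bigr)
```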
- Learning Mixtures of Experts with EM [28.48469221248906]
Mixtures of Experts (MoE) are Machine Learning models that partition the input space, with a separate "expert" model trained on each partition.
We study the efficiency of the Expectation Maximization (EM) algorithm for the training of MoE models.
arXiv Detail & Related papers (2024-11-09T03:44:09Z)
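As background for the MoE entry above, the generic likelihood and EM responsibilities take the usual mixture form (a textbook sketch, with gating weights $\pi_k(x)$ and experts $p_k$; the paper's precise setting may differ):

```latex
p(y \mid x) = \sum_{k=1}^{K} \pi_k(x)\, p_k(y \mid x),
\qquad
r_{nk} = \frac{\pi_k(x_n)\, p_k(y_n \mid x_n)}
              {\sum_{j=1}^{K} \pi_j(x_n)\, p_j(y_n \mid x_n)}
```

The E-step computes the responsibilities $r_{nk}$; the M-step refits each expert and the gate on responsibility-weighted data.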
- Network EM Algorithm for Gaussian Mixture Model in Decentralized Federated Learning [1.4549461207028445]
We study various network Expectation-Maximization (EM) algorithms for the Gaussian mixture model.
We introduce a momentum network EM (MNEM) algorithm, which uses a momentum parameter to combine information from both the current and historical estimators.
We also develop a semi-supervised MNEM algorithm, which leverages partially labeled data.
arXiv Detail & Related papers (2024-11-08T14:25:46Z)
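The abstract of the MNEM entry above does not state the exact update, but a momentum combination of the current and historical estimators would schematically read (assumed form, with momentum parameter $\alpha$):

```latex
\theta^{(t+1)} = \alpha\, \theta_{\text{EM}}^{(t+1)} + (1 - \alpha)\, \theta^{(t)},
\qquad \alpha \in (0, 1]
```

where $\theta_{\text{EM}}^{(t+1)}$ is the estimator produced by the current network EM step; this form is an assumption, not the paper's stated update.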
- Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov decision processes (POMDPs).
We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU).
We then show how to adapt this algorithm to also work in the broader class of $\gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
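The entry above gives no formulas; one plausible (hypothetical) reading of mixture-based normalisation is to normalise each sample by the statistics of its most responsible Gaussian component, rather than by a single batch mean and variance:

```python
import numpy as np

def compound_bn(x, means, variances, weights, eps=1e-5):
    """Hypothetical mixture-based normalisation, not the paper's exact method.
    x is (N, C); means and variances are (K, C) per-component feature
    statistics; weights is (K,) mixture weights."""
    # log responsibilities under a diagonal Gaussian mixture over features
    log_r = (np.log(weights)[:, None]
             - 0.5 * (((x[None] - means[:, None]) ** 2 / variances[:, None])
                      + np.log(variances[:, None])).sum(-1))
    k = log_r.argmax(axis=0)  # hard-assign each sample to a component
    return (x - means[k]) / np.sqrt(variances[k] + eps)
```

The intent of such a scheme matches the summary: samples far from the head-class mode are normalised by their own component's statistics instead of being dominated by head-class statistics.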
- Learning Shared Kernel Models: the Shared Kernel EM algorithm [0.0]
Expectation maximisation (EM) is an unsupervised learning method for estimating the parameters of a finite mixture distribution.
We first present a rederivation of the standard EM algorithm using data association ideas from the field of multiple target tracking.
The same method is then applied to a little known but much more general type of supervised EM algorithm for shared kernel models.
arXiv Detail & Related papers (2022-05-15T10:10:08Z)
- Dendritic Self-Organizing Maps for Continual Learning [0.0]
We propose a novel algorithm inspired by biological neurons, termed Dendritic-Self-Organizing Map (DendSOM)
DendSOM consists of a single layer of SOMs, which extract patterns from specific regions of the input space.
It outperforms classical SOMs and several state-of-the-art continual learning algorithms on benchmark datasets.
arXiv Detail & Related papers (2021-10-18T14:47:19Z)
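For reference on the DendSOM entry above, the classical SOM update it builds on pulls the best-matching unit (BMU) $b(x)$ and its neighbours toward each input, with learning rate $\eta(t)$ and neighbourhood kernel $h$:

```latex
b(x) = \arg\min_j \lVert x - w_j \rVert,
\qquad
w_j \leftarrow w_j + \eta(t)\, h_{j,\, b(x)}(t)\, \bigl(x - w_j\bigr)
```

DendSOM applies a layer of such maps, each receiving a specific region (e.g. a patch) of the input space.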
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
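A plausible form of the objective stated in the entry above, with projection matrix $W$ whose rows $w^{i}$ score the features (the paper's exact formulation and constraints may differ):

```latex
\min_{W}\; \lVert X - X W W^{\top} \rVert_F^2 + \lambda \lVert W \rVert_{2,p}^{p},
\qquad
\lVert W \rVert_{2,p} = \Bigl( \sum_{i} \lVert w^{i} \rVert_2^{p} \Bigr)^{1/p}
```

With $0 < p \le 1$ the penalty drives whole rows of $W$ to zero, so the surviving rows select features.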
- Efficient semidefinite-programming-based inference for binary and multi-class MRFs [83.09715052229782]
We propose an efficient method for computing the partition function or MAP estimate in a pairwise MRF.
We extend semidefinite relaxations from the typical binary MRF to the full multi-class setting, and develop a compact semidefinite relaxation that can again be solved efficiently using the same solver.
arXiv Detail & Related papers (2020-12-04T15:36:29Z)
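The classical binary case underlying relaxations like the one above: MAP inference in a pairwise MRF over $x \in \{-1,+1\}^n$ with coupling matrix $A$ is a quadratic programme, relaxed by lifting $X \approx x x^{\top}$:

```latex
\max_{x \in \{-1,+1\}^n} x^{\top} A x
\;\;\longrightarrow\;\;
\max_{X \succeq 0,\; \operatorname{diag}(X) = \mathbf{1}} \langle A, X \rangle
```

The paper's contribution extends this style of relaxation to the multi-class setting.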
- Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
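For orientation on the entry above, generic global-variable consensus ADMM for $\min_x \sum_{i=1}^{N} f_i(x)$ over $N$ nodes alternates local proximal steps with a global average (the paper's coded, mini-batch variants build on this skeleton):

```latex
x_i^{k+1} = \operatorname*{arg\,min}_{x_i}\;
  f_i(x_i) + \tfrac{\rho}{2} \lVert x_i - z^{k} + u_i^{k} \rVert_2^2,
\qquad
z^{k+1} = \tfrac{1}{N} \sum_{i=1}^{N} \bigl( x_i^{k+1} + u_i^{k} \bigr),
\qquad
u_i^{k+1} = u_i^{k} + x_i^{k+1} - z^{k+1}
```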
- Kernel learning approaches for summarising and combining posterior similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
arXiv Detail & Related papers (2020-09-27T14:16:14Z)
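The PSM above is straightforward to compute from MCMC cluster allocations, and its positive semi-definiteness follows because it averages co-clustering indicator matrices, each of which is PSD. A minimal sketch:

```python
import numpy as np

def posterior_similarity_matrix(allocations):
    """allocations: (S, N) integer array, one row of cluster labels per MCMC
    sample. Returns the N x N PSM with entries P[i, j] equal to the fraction
    of samples in which items i and j are co-clustered."""
    S, N = allocations.shape
    psm = np.zeros((N, N))
    for z in allocations:
        psm += (z[:, None] == z[None, :])  # co-clustering indicator (PSD)
    return psm / S
```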