Mixture of ELM based experts with trainable gating network
- URL: http://arxiv.org/abs/2105.11706v1
- Date: Tue, 25 May 2021 07:13:35 GMT
- Title: Mixture of ELM based experts with trainable gating network
- Authors: Laleh Armi, Elham Abbasi, Jamal Zarepour-Ahmadabadi
- Abstract summary: We propose an ensemble learning method based on a mixture of experts.
The structure of ME consists of multi-layer perceptrons (MLPs) as base experts and a gating network.
In the proposed method, a trainable gating network is applied to aggregate the outputs of the experts.
- Score: 2.320417845168326
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mixture of experts (ME) is a neural-network-based ensemble learning method
with a strong ability to improve overall classification accuracy. It is based on
the divide-and-conquer principle, in which the problem space is divided among
several experts under the supervision of a gating network. In this paper, we
propose an ensemble learning method based on mixture of experts, named mixture
of ELM based experts with trainable gating network (MEETG), to reduce the
computing cost and speed up the learning process of ME. The standard ME
structure consists of multi-layer perceptrons (MLPs) as the base experts and
the gating network, and a gradient-based learning algorithm is applied for
training the MLPs, which is an iterative and time-consuming process. To
overcome these problems, we use the advantages of the extreme learning machine
(ELM) in designing the structure of ME. ELM, as a learning algorithm for single
hidden-layer feedforward neural networks, provides a much faster learning
process and better generalization ability in comparison with some other
traditional learning algorithms. In addition, in the proposed method a
trainable gating network is applied to aggregate the outputs of the experts
dynamically according to the input sample. Our experimental results and
statistical analysis on 11 benchmark datasets confirm that MEETG achieves
acceptable performance on classification problems. Furthermore, our
experimental results show that the proposed approach outperforms the original
ELM in prediction stability and classification accuracy.
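
For concreteness, here is a minimal sketch of the scheme the abstract describes: each expert is an ELM whose hidden layer is random and whose output weights are obtained in closed form, while a softmax gating network is trained by gradient descent to weight the experts per input sample. The class names, squared-error loss, and hyperparameters below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the MEETG idea (assumed details, not the authors' code):
# each expert is an ELM -- a random, untrained hidden layer plus output
# weights solved in closed form -- and a linear softmax gating network is
# trained by gradient descent to weight the experts per input sample.
import numpy as np

rng = np.random.default_rng(0)

class ELMExpert:
    def __init__(self, n_in, n_hidden, n_out):
        self.W = rng.normal(size=(n_in, n_hidden))  # random input weights, never trained
        self.b = rng.normal(size=n_hidden)          # random biases
        self.beta = np.zeros((n_hidden, n_out))     # output weights, solved in one shot

    def hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, Y):
        H = self.hidden(X)
        self.beta = np.linalg.pinv(H) @ Y           # least-squares solution beta = H^+ Y

    def predict(self, X):
        return self.hidden(X) @ self.beta

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def train_meetg(X, Y, n_experts=3, n_hidden=50, lr=0.1, epochs=200):
    """X: (n, d) inputs, Y: (n, k) one-hot labels."""
    n, d = X.shape
    k = Y.shape[1]
    experts = [ELMExpert(d, n_hidden, k) for _ in range(n_experts)]
    for e in experts:                                # non-iterative ELM training
        e.fit(X, Y)
    P = np.stack([e.predict(X) for e in experts], axis=1)  # (n, n_experts, k)
    Wg = np.zeros((d, n_experts))                    # linear gating network
    for _ in range(epochs):                          # only the gate is trained iteratively
        G = softmax(X @ Wg)                          # per-sample gating weights
        mix = np.einsum('ne,nek->nk', G, P)          # dynamic aggregation of experts
        dmix = 2 * (mix - Y)                         # gradient of the squared error
        s = np.einsum('nk,nek->ne', dmix, P)         # per-expert loss contribution
        dlogits = G * (s - (G * s).sum(axis=1, keepdims=True))  # softmax chain rule
        Wg -= lr * (X.T @ dlogits) / n
    return experts, Wg
```

At test time the prediction is the gate-weighted sum of expert outputs, so the aggregation adapts to each input sample; only the gate is trained iteratively, which is where the claimed speedup over an all-MLP ME comes from.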
Related papers
- Component-based Sketching for Deep ReLU Nets [55.404661149594375]
We develop a sketching scheme based on deep net components for various tasks.
We transform deep net training into a linear empirical risk minimization problem.
We show that the proposed component-based sketching provides almost optimal rates in approximating saturated functions.
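
As a rough illustration of that reduction (not the paper's actual sketching scheme), freezing deep ReLU components and fitting only a linear output layer turns training into a plain least-squares problem:

```python
# Hedged illustration: with deep ReLU components held fixed, training reduces
# to linear empirical risk minimization, solvable in closed form.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5)); y = rng.normal(size=200)
W1, W2 = rng.normal(size=(5, 64)), rng.normal(size=(64, 64))  # frozen components

def features(X):
    return np.maximum(np.maximum(X @ W1, 0) @ W2, 0)   # fixed two-layer ReLU net

w, *_ = np.linalg.lstsq(features(X), y, rcond=None)    # linear ERM in closed form
```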
arXiv Detail & Related papers (2024-09-21T15:30:43Z) - FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications.
FactorLLM achieves performance comparable to the source model, retaining up to 85% of its performance while delivering over a 30% increase in inference speed.
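
A hedged sketch of the general idea, with the dimensions and the top-1 router invented for illustration: slice a trained dense FFN's hidden units into expert sub-networks and activate only one slice per input, trading a little accuracy for speed.

```python
# Sketch of decomposing a dense FFN into routed sub-networks (illustrative
# stand-in, not FactorLLM's actual factorization).
import numpy as np

rng = np.random.default_rng(1)
d_model, d_ff, n_experts = 8, 32, 4
W1 = rng.normal(size=(d_model, d_ff))       # pretrained FFN weights (stand-ins)
W2 = rng.normal(size=(d_ff, d_model))
Wr = rng.normal(size=(d_model, n_experts))  # router; would be trained in practice

slices = np.array_split(np.arange(d_ff), n_experts)  # partition the hidden units

def moe_ffn(x):
    e = int(np.argmax(x @ Wr))           # top-1 routing: one expert per input
    idx = slices[e]
    h = np.maximum(x @ W1[:, idx], 0.0)  # only this slice's hidden units are computed
    return h @ W2[idx, :]

x = rng.normal(size=d_model)
print(moe_ffn(x).shape)                  # (8,) -- same interface as the dense FFN
```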
arXiv Detail & Related papers (2024-08-15T16:45:16Z) - Fast Cerebral Blood Flow Analysis via Extreme Learning Machine [4.373558495838564]
We introduce a rapid and precise analytical approach for analyzing cerebral blood flow (CBF) using diffuse correlation spectroscopy (DCS).
We assess existing algorithms using synthetic datasets for both semi-infinite and multi-layer models.
Results demonstrate that ELM consistently achieves higher fidelity across various noise levels and optical parameters, showcasing robust generalization ability and outperforming iterative fitting algorithms.
arXiv Detail & Related papers (2024-01-10T23:01:35Z) - FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup
for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate.
We show that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients.
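
For reference, below is a minimal sketch of the kind of client-local AMSGrad update involved, in which each client's own adaptive state yields client-specific effective step sizes; the paper's actual FedLALR scheduling and convergence analysis are not reproduced here.

```python
# Illustrative client-local AMSGrad state (assumed simplification of FedLALR):
# each client keeps its own moment estimates, so on non-IID data the effective
# per-coordinate learning rates differ across clients.
import numpy as np

class ClientAMSGrad:
    def __init__(self, dim, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
        self.m = np.zeros(dim)       # first-moment estimate
        self.v = np.zeros(dim)       # second-moment estimate
        self.vhat = np.zeros(dim)    # running max of v (the AMSGrad correction)

    def step(self, w, grad):
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        self.v = self.b2 * self.v + (1 - self.b2) * grad**2
        self.vhat = np.maximum(self.vhat, self.v)   # never let v shrink
        # lr / sqrt(vhat) is client-specific, since vhat reflects this
        # client's own (non-IID) gradient statistics
        return w - self.lr * self.m / (np.sqrt(self.vhat) + self.eps)
```

After several such local steps, clients would send their models to the server for averaging, as in standard federated learning.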
arXiv Detail & Related papers (2023-09-18T12:35:05Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
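
A minimal sketch of the shared-backbone, multiple-prediction-heads pattern the summary describes (the sizes and the averaging rule below are assumptions, not MEMTL's design):

```python
# Shared backbone feeds several heads; the ensemble averages their outputs.
import numpy as np

rng = np.random.default_rng(2)
d_in, d_feat, d_out, n_heads = 16, 32, 4, 3
Wb = rng.normal(size=(d_in, d_feat))                    # shared backbone
heads = [rng.normal(size=(d_feat, d_out)) for _ in range(n_heads)]

def ensemble_predict(x):
    z = np.maximum(x @ Wb, 0.0)                         # one backbone pass
    return np.mean([z @ Wh for Wh in heads], axis=0)    # ensemble the heads
```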
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm to learn online the optimal source placement in large-scale networks.
We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations.
We derive the performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy.
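
Grab-UCB builds a graph-kernel model on top of the classic upper-confidence-bound principle; as a point of reference only, the underlying UCB1 selection rule looks like this:

```python
# Classic UCB1 (reference for the UCB principle; Grab-UCB itself adds a
# graph-kernel reward model that is not reproduced here).
import numpy as np

def ucb1(pull, n_arms, horizon, c=2.0):
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1                               # play every arm once first
        else:
            bonus = np.sqrt(c * np.log(t) / counts)
            arm = int(np.argmax(means + bonus))       # optimism under uncertainty
        r = pull(arm)                                 # user-supplied reward oracle
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # running average reward
    return means, counts
```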
arXiv Detail & Related papers (2023-07-07T15:03:42Z) - Towards Understanding Mixture of Experts in Deep Learning [95.27215939891511]
We study how the MoE layer improves the performance of neural network learning.
Our results suggest that the cluster structure of the underlying problem and the non-linearity of the expert are pivotal to the success of MoE.
arXiv Detail & Related papers (2022-08-04T17:59:10Z) - Accelerating Federated Edge Learning via Topology Optimization [41.830942005165625]
Federated edge learning (FEEL) is envisioned as a promising paradigm to achieve privacy-preserving distributed learning.
It consumes excessive learning time due to the existence of straggler devices.
A novel topology-optimized federated edge learning (TOFEL) scheme is proposed to tackle the heterogeneity issue in federated learning.
arXiv Detail & Related papers (2022-04-01T14:49:55Z) - Fast Deep Mixtures of Gaussian Process Experts [0.6554326244334868]
Mixtures of experts have become an indispensable tool for flexible modelling in a supervised learning context.
In this article, we propose to design the gating network for selecting the experts from sparse GPs using a deep neural network (DNN).
A fast one-pass algorithm called Cluster-Classify-Regress (CCR) is leveraged to approximate the maximum a posteriori (MAP) estimator extremely quickly.
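
A hedged sketch of the CCR idea named above, using plain scikit-learn stand-ins in place of the paper's sparse GP experts and DNN gate: cluster the data, train a classifier to act as the gate, then fit one regressor per cluster.

```python
# Cluster-Classify-Regress, one pass (illustrative stand-ins, not the paper's
# sparse-GP/DNN implementation).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression, Ridge

def ccr_fit(X, y, k=3):
    # Cluster: partition (input, output) pairs into k groups
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(np.c_[X, y])
    # Classify: the gate learns to predict the cluster from the input alone
    gate = LogisticRegression(max_iter=1000).fit(X, labels)
    # Regress: one expert per cluster, fit only on its own data
    experts = [Ridge().fit(X[labels == j], y[labels == j]) for j in range(k)]
    return gate, experts

def ccr_predict(gate, experts, X):
    z = gate.predict(X)                  # hard gating: one expert per sample
    return np.array([experts[j].predict(x[None])[0] for j, x in zip(z, X)])
```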
arXiv Detail & Related papers (2020-06-11T18:52:34Z) - Hyperspectral Unmixing Network Inspired by Unfolding an Optimization
Problem [2.4016406737205753]
The hyperspectral image (HSI) unmixing task is essentially an inverse problem, which is commonly solved by optimization algorithms.
We propose two novel network architectures, named U-ADMM-AENet and U-ADMM-BUNet, for abundance estimation and blind unmixing.
We show that the unfolded structures can find corresponding interpretations in machine learning literature, which further demonstrates the effectiveness of proposed methods.
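
To illustrate what unfolding an optimization algorithm means here: below is plain ADMM for nonnegative least squares, a simplified stand-in for abundance estimation. An unfolded network such as U-ADMM-AENet would run a fixed number of such iterations as layers and learn parameters like rho from data (details assumed, not the paper's architecture).

```python
# ADMM for min ||Ax - b||^2 s.t. x >= 0, the kind of iteration an unfolded
# unmixing network turns into learnable layers.
import numpy as np

def admm_nnls(A, b, rho=1.0, iters=50):
    m, n = A.shape
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    AtA_inv = np.linalg.inv(A.T @ A + rho * np.eye(n))  # cached x-update solve
    Atb = A.T @ b
    for _ in range(iters):
        x = AtA_inv @ (Atb + rho * (z - u))   # least-squares step
        z = np.maximum(x + u, 0.0)            # projection onto the nonnegative orthant
        u = u + x - z                         # dual update on the consensus constraint
    return z
```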
arXiv Detail & Related papers (2020-05-21T18:49:45Z)