A Review of Sparse Expert Models in Deep Learning
- URL: http://arxiv.org/abs/2209.01667v1
- Date: Sun, 4 Sep 2022 18:00:29 GMT
- Title: A Review of Sparse Expert Models in Deep Learning
- Authors: William Fedus, Jeff Dean, Barret Zoph
- Abstract summary: Sparse expert models are a thirty-year-old concept re-emerging as a popular architecture in deep learning.
We review the concept of sparse expert models, provide a basic description of the common algorithms, and contextualize the advances in the deep learning era.
- Score: 23.721204843236006
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparse expert models are a thirty-year-old concept re-emerging as a popular
architecture in deep learning. This class of architecture encompasses
Mixture-of-Experts, Switch Transformers, Routing Networks, BASE layers, and
others, all with the unifying idea that each example is acted on by a subset of
the parameters. By doing so, the degree of sparsity decouples the parameter
count from the compute per example, allowing for extremely large but efficient
models. The resulting models have demonstrated significant improvements across
diverse domains such as natural language processing, computer vision, and
speech recognition. We review the concept of sparse expert models, provide a
basic description of the common algorithms, contextualize the advances in the
deep learning era, and conclude by highlighting areas for future work.
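The shared mechanism behind these architectures is a learned router that sends each token (or example) to only a few of many experts, so adding experts grows the parameter count without growing per-example compute. Below is a minimal sketch of top-k token routing in Python/PyTorch; the layer sizes, the softmax-over-selected-experts gating, and the absence of load-balancing losses are simplifying assumptions for illustration, not the exact formulation of any one model in the review.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Illustrative sparse Mixture-of-Experts layer: each token is processed
    by only top_k of num_experts feed-forward experts (hypothetical sizes)."""

    def __init__(self, d_model=64, d_hidden=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # routing logits per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (num_tokens, d_model)
        logits = self.router(x)                # (num_tokens, num_experts)
        weights, indices = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the selected experts
        out = torch.zeros_like(x)
        # Dispatch each token only to its selected experts (dense loop for clarity).
        for e, expert in enumerate(self.experts):
            for slot in range(self.top_k):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

tokens = torch.randn(16, 64)          # 16 tokens with d_model = 64
layer = TopKMoELayer()
print(layer(tokens).shape)            # torch.Size([16, 64])
```

Because only top_k experts run per token, the compute per example stays roughly constant as num_experts grows, which is the decoupling the abstract describes.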
Related papers
- Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts [75.85448576746373]
We propose a method of grouping and pruning similar experts to improve the model's parameter efficiency.
We validate the effectiveness of our method by pruning three state-of-the-art MoE architectures.
The evaluation shows that our method outperforms other model pruning methods on a range of natural language tasks.
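As a rough, hypothetical illustration of the general idea (not the paper's actual algorithm), one can group experts whose weights are nearly parallel and keep a single representative per group; the cosine-similarity threshold and the averaging step below are assumptions made for the sketch.

```python
import numpy as np

def group_and_prune_experts(expert_weights, sim_threshold=0.9):
    """Greedily merge experts whose flattened weights have cosine similarity
    above sim_threshold; return one averaged expert per group.
    Illustrative sketch only, not the method from the paper."""
    flat = np.stack([w.ravel() for w in expert_weights])
    flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    groups, assigned = [], np.zeros(len(flat), dtype=bool)
    for i in range(len(flat)):
        if assigned[i]:
            continue
        members = [j for j in range(len(flat))
                   if not assigned[j] and flat[i] @ flat[j] >= sim_threshold]
        for j in members:
            assigned[j] = True
        groups.append(members)
    # Keep the mean of each group as its representative expert.
    return [np.mean([expert_weights[j] for j in g], axis=0) for g in groups]

experts = [np.random.randn(16, 16) for _ in range(8)]
experts[1] = experts[0] + 0.01 * np.random.randn(16, 16)  # a near-duplicate expert
print(len(group_and_prune_experts(experts)))               # fewer than 8 after merging
```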
arXiv Detail & Related papers (2024-07-12T17:25:02Z)
- Mixture-of-Linguistic-Experts Adapters for Improving and Interpreting Pre-trained Language Models [22.977852629450346]
We propose a method that combines two popular research areas by injecting linguistic structures into pre-trained language models.
In our approach, parallel adapter modules encoding different linguistic structures are combined using a novel Mixture-of-Linguistic-Experts architecture.
Our experimental results show that our approach can outperform state-of-the-art PEFT methods with a comparable number of parameters.
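A minimal sketch of this general pattern combines several parallel adapter modules with a learned gate on top of a frozen hidden state; the adapter bottleneck size, the number of "linguistic" adapters, and the softmax gate are assumptions for illustration rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Bottleneck adapter with a residual connection (illustrative sizes)."""
    def __init__(self, d_model=64, d_bottleneck=16):
        super().__init__()
        self.down = nn.Linear(d_model, d_bottleneck)
        self.up = nn.Linear(d_bottleneck, d_model)

    def forward(self, h):
        return h + self.up(F.relu(self.down(h)))

class MixtureOfAdapters(nn.Module):
    """Combines parallel adapters (e.g., one per linguistic structure)
    with a learned softmax gate over their outputs."""
    def __init__(self, d_model=64, num_adapters=3):
        super().__init__()
        self.adapters = nn.ModuleList(Adapter(d_model) for _ in range(num_adapters))
        self.gate = nn.Linear(d_model, num_adapters)

    def forward(self, h):                              # h: (batch, d_model)
        weights = F.softmax(self.gate(h), dim=-1)      # (batch, num_adapters)
        outputs = torch.stack([a(h) for a in self.adapters], dim=1)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)

hidden = torch.randn(4, 64)                  # stand-in for a frozen LM hidden state
print(MixtureOfAdapters()(hidden).shape)     # torch.Size([4, 64])
```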
arXiv Detail & Related papers (2023-10-24T23:29:06Z)
- Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
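The basic setting can be shown in a few lines: fit a model that maps past inputs and outputs to the next output, then use it to predict unseen data. The sketch below uses a linear ARX model fitted by least squares as the simplest stand-in for the neural architectures discussed in the survey; the lag orders and the toy system are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(500)                     # input signal
y = np.zeros(500)
for t in range(2, 500):                          # toy second-order system
    y[t] = 0.6 * y[t - 1] - 0.2 * y[t - 2] + 0.5 * u[t - 1] + 0.01 * rng.standard_normal()

# Build regressors from lagged outputs and inputs (ARX structure).
X = np.column_stack([y[1:-1], y[0:-2], u[1:-1]])
target = y[2:]
theta, *_ = np.linalg.lstsq(X, target, rcond=None)

# One-step-ahead prediction from the identified parameters.
pred = X @ theta
print("identified parameters:", np.round(theta, 3))
print("prediction RMSE:", np.round(np.sqrt(np.mean((pred - target) ** 2)), 4))
```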
arXiv Detail & Related papers (2023-01-30T12:38:31Z)
- Model-Based Deep Learning: On the Intersection of Deep Learning and Optimization [101.32332941117271]
Decision making algorithms are used in a multitude of different applications.
Deep learning approaches that use highly parametric architectures tuned from data without relying on mathematical models are becoming increasingly popular.
Model-based optimization and data-centric deep learning are often considered to be distinct disciplines.
arXiv Detail & Related papers (2022-05-05T13:40:08Z)
- SuperCone: Modeling Heterogeneous Experts with Concept Meta-learning for Unified Predictive Segments System [8.917697023052257]
We present SuperCone, our unified predictive segments system.
It builds on top of a flat concept representation that summarizes each user's heterogeneous digital footprints.
It can outperform state-of-the-art recommendation and ranking algorithms on a wide range of predictive segment tasks.
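Reading only from the abstract, a rough, hypothetical rendering of the pattern is: summarize each user into a flat concept vector, score it with several structurally different expert predictors, and blend their scores with a learned gate. Everything below (feature shapes, expert types, the gating form) is assumed for illustration and is not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeterogeneousExpertScorer(nn.Module):
    """Illustrative sketch: a flat per-user concept vector is scored by
    heterogeneous experts whose outputs are blended by a learned gate."""
    def __init__(self, concept_dim=32):
        super().__init__()
        self.linear_expert = nn.Linear(concept_dim, 1)
        self.mlp_expert = nn.Sequential(nn.Linear(concept_dim, 64),
                                        nn.ReLU(), nn.Linear(64, 1))
        self.gate = nn.Linear(concept_dim, 2)

    def forward(self, concepts):                        # (batch, concept_dim)
        scores = torch.cat([self.linear_expert(concepts),
                            self.mlp_expert(concepts)], dim=-1)
        weights = F.softmax(self.gate(concepts), dim=-1)
        return torch.sigmoid((weights * scores).sum(dim=-1))  # segment membership score

users = torch.randn(8, 32)     # hypothetical flat concept vectors for 8 users
print(HeterogeneousExpertScorer()(users).shape)   # torch.Size([8])
```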
arXiv Detail & Related papers (2022-03-09T04:11:39Z)
- Polynomial Networks in Deep Classifiers [55.90321402256631]
We cast the study of deep neural networks under a unifying framework.
Our framework provides insights on the inductive biases of each model.
The efficacy of the proposed models is evaluated on standard image and audio classification benchmarks.
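A small sketch of one common polynomial building block: each layer multiplies a linear transform of the input elementwise with the running representation, raising its polynomial degree by one. The recursion below is a standard Pi-net-style variant given for illustration, not necessarily the paper's exact framework.

```python
import torch
import torch.nn as nn

class PolynomialClassifier(nn.Module):
    """Each layer raises the polynomial degree of the representation:
    z_{k+1} = (W_k x) * z_k + z_k (Hadamard product with an input transform)."""
    def __init__(self, d_in=32, depth=3, num_classes=10):
        super().__init__()
        self.maps = nn.ModuleList(nn.Linear(d_in, d_in) for _ in range(depth))
        self.classifier = nn.Linear(d_in, num_classes)

    def forward(self, x):
        z = x
        for W in self.maps:
            z = W(x) * z + z      # degree of z in x grows by one per layer
        return self.classifier(z)

x = torch.randn(4, 32)
print(PolynomialClassifier()(x).shape)   # torch.Size([4, 10])
```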
arXiv Detail & Related papers (2021-04-16T06:41:20Z)
- A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics [131.93113552146195]
We present a new dataset, Handwritten arithmetic with INTegers (HINT), to examine machines' capability of learning generalizable concepts.
In HINT, machines are tasked with learning how concepts are perceived from raw signals such as images.
We undertake extensive experiments with various sequence-to-sequence models, including RNNs, Transformers, and GPT-3.
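As a purely symbolic stand-in for the perception-syntax-semantics pipeline (HINT itself uses handwritten images, which are not reproduced here), the snippet below generates integer-arithmetic expressions paired with their results, the kind of sequence-to-sequence pairs such models are trained on; the expression grammar is an assumption for illustration.

```python
import random

def make_expression(max_depth=2):
    """Recursively build a small integer-arithmetic expression string."""
    if max_depth == 0 or random.random() < 0.4:
        return str(random.randint(0, 9))
    op = random.choice(["+", "-", "*"])
    left = make_expression(max_depth - 1)
    right = make_expression(max_depth - 1)
    return f"({left}{op}{right})"

random.seed(0)
pairs = [(expr, eval(expr)) for expr in (make_expression() for _ in range(5))]
for expr, value in pairs:
    print(expr, "->", value)   # input sequence -> target value
```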
arXiv Detail & Related papers (2021-03-02T01:32:54Z)
- More Is More -- Narrowing the Generalization Gap by Adding Classification Heads [8.883733362171032]
We introduce an architecture enhancement for existing neural network models based on input transformations, termed 'TransNet'.
Our model can be employed during training time only and then pruned for prediction, resulting in an equivalent architecture to the base model.
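A minimal sketch of the idea as described in the summary: during training, extra classification heads are attached to transformed copies of the input (here a sign flip, a hypothetical choice), and at prediction time only the base head is kept, so the deployed network matches the original architecture. The details below are illustrative assumptions, not the paper's exact construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadTrainer(nn.Module):
    """Shared trunk with one head per input transformation; only the
    base head (identity transform) is used at inference time."""
    def __init__(self, d_in=20, num_classes=3):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU())
        self.transforms = [lambda x: x, lambda x: -x]   # identity + a hypothetical flip
        self.heads = nn.ModuleList(nn.Linear(64, num_classes)
                                   for _ in self.transforms)

    def training_loss(self, x, y):
        # Every head sees its own transformed view of the batch.
        return sum(F.cross_entropy(head(self.trunk(t(x))), y)
                   for t, head in zip(self.transforms, self.heads))

    def forward(self, x):
        # "Pruned" prediction path: base head only, same cost as the plain model.
        return self.heads[0](self.trunk(x))

model = MultiHeadTrainer()
x, y = torch.randn(8, 20), torch.randint(0, 3, (8,))
print(model.training_loss(x, y).item(), model(x).shape)
```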
arXiv Detail & Related papers (2021-02-09T16:30:33Z)
- Neural Entity Linking: A Survey of Models Based on Deep Learning [82.43751915717225]
This survey presents a comprehensive description of recent neural entity linking (EL) systems developed since 2015.
Its goal is to systemize design features of neural entity linking systems and compare their performance to prominent classical methods on common benchmarks.
The survey touches on applications of entity linking, focusing on the recently emerged use-case of enhancing deep pre-trained masked language models.
arXiv Detail & Related papers (2020-05-31T18:02:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.