Diversified Dynamic Routing for Vision Tasks
- URL: http://arxiv.org/abs/2209.13071v1
- Date: Mon, 26 Sep 2022 23:27:51 GMT
- Title: Diversified Dynamic Routing for Vision Tasks
- Authors: Botos Csaba, Adel Bibi, Yanwei Li, Philip Torr, Ser-Nam Lim
- Abstract summary: We propose a novel architecture where each layer is composed of a set of experts.
In our method, the model is explicitly trained to solve the challenge of finding a relevant partitioning of the data.
We conduct several experiments on semantic segmentation on Cityscapes and object detection and instance segmentation on MS-COCO.
- Score: 36.199659460868496
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models for vision tasks are trained on large datasets under the
assumption that there exists a universal representation that can be used to
make predictions for all samples. While high-complexity models can learn such representations, a mixture of experts trained on specific subsets of the data can infer the labels more efficiently. However, using a mixture of experts poses two new problems: (i) assigning the correct expert at inference time when a new, unseen sample is presented, and (ii) finding the optimal partitioning of the training data such that the experts rely as little as possible on common features. Dynamic Routing (DR) proposes a novel architecture in which each layer is composed of a set of experts; however, we demonstrate that without addressing these two challenges the model reverts to using the same subset of experts.
In our method, Diversified Dynamic Routing (DivDR), the model is explicitly trained to find a relevant partitioning of the data and to assign the correct experts in an unsupervised manner. We conduct experiments on semantic segmentation on Cityscapes and on object detection and instance segmentation on MS-COCO, showing improved performance over several baselines.
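The abstract describes each layer as a set of experts selected by a learned router, and a failure mode in which the router collapses onto the same subset of experts. Below is a minimal PyTorch sketch of such a routed expert layer; it is not the authors' implementation, and the KL-to-uniform balance term is only a hypothetical stand-in for DivDR's unsupervised diversity objective, which the abstract does not specify.

```python
# Minimal sketch (not the authors' code) of a per-layer mixture of experts with
# a learned router. The KL-to-uniform balance term is a hypothetical stand-in
# for DivDR's unsupervised diversity objective.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RoutedExpertLayer(nn.Module):
    def __init__(self, dim: int, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(dim, num_experts)  # per-sample gate logits

    def forward(self, x: torch.Tensor):
        gates = F.softmax(self.router(x), dim=-1)                       # (B, K)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)   # (B, K, D)
        y = (gates.unsqueeze(-1) * expert_out).sum(dim=1)               # (B, D)
        # Penalize routing every sample to the same experts:
        # KL(mean gate usage || uniform over experts).
        usage = gates.mean(dim=0)
        diversity_loss = (usage * usage.clamp_min(1e-9).log()).sum() + torch.log(
            torch.tensor(float(len(self.experts)), device=x.device)
        )
        return y, diversity_loss


if __name__ == "__main__":
    layer = RoutedExpertLayer(dim=64, num_experts=4)
    out, aux = layer(torch.randn(8, 64))
    print(out.shape, float(aux))  # torch.Size([8, 64]) and a non-negative scalar
```

In practice the diversity term would be weighted against the task loss; the weight, the expert architecture, and the soft (dense) gating used here are illustrative choices, not details taken from the paper.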
Related papers
- Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging [36.0133566024214]
Upcycling Instruction Tuning (UpIT) is a data-efficient approach for tuning a dense pre-trained model into a MoE instruction model.
To ensure each specialized expert in the MoE model works as expected, we select a small amount of seed data at which each expert excels to pre-optimize the router.
arXiv Detail & Related papers (2024-10-02T14:48:22Z)
- Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-Network Selection [63.96018203905272]
We propose to reduce the sampling cost by pruning a pretrained diffusion model into a mixture of efficient experts.
We demonstrate the effectiveness of our method, DiffPruning, across several datasets.
arXiv Detail & Related papers (2024-09-23T21:27:26Z)
- RouterRetriever: Exploring the Benefits of Routing over Multiple Expert Embedding Models [58.987116118425995]
We introduce RouterRetriever, a retrieval model that leverages multiple domain-specific experts.
It is lightweight and allows easy addition or removal of experts without additional training.
It is the first work to demonstrate the advantages of using multiple domain-specific expert embedding models.
arXiv Detail & Related papers (2024-09-04T13:16:55Z)
- Harder Tasks Need More Experts: Dynamic Routing in MoE Models [58.18526590138739]
We introduce a novel dynamic expert selection framework for Mixture of Experts (MoE) models.
Our method dynamically selects experts for each input based on the confidence of the expert selection (see the sketch after this list).
arXiv Detail & Related papers (2024-03-12T13:41:15Z)
- Pre-trained Recommender Systems: A Causal Debiasing Perspective [19.712997823535066]
We develop a generic recommender that captures universal interaction patterns by training on generic user-item interaction data extracted from different domains.
Our empirical studies show that the proposed model could significantly improve the recommendation performance in zero- and few-shot learning settings.
arXiv Detail & Related papers (2023-10-30T03:37:32Z)
- Accelerating exploration and representation learning with offline pre-training [52.6912479800592]
We show that exploration and representation learning can be improved by separately learning two different models from a single offline dataset.
We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward can significantly improve the sample efficiency on the challenging NetHack benchmark.
arXiv Detail & Related papers (2023-03-31T18:03:30Z)
- Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners [74.92558307689265]
We propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad').
We optimize this matching process during the training of a single model.
Experiments on the Taskonomy dataset with 13 vision tasks and the PASCAL-Context dataset with 5 vision tasks show the superiority of our approach.
arXiv Detail & Related papers (2022-12-15T18:59:52Z)
- On the Representation Collapse of Sparse Mixture of Experts [102.83396489230375]
Sparse mixture of experts provides larger model capacity while requiring a constant computational overhead.
It employs the routing mechanism to distribute input tokens to the best-matched experts according to their hidden representations.
However, learning such a routing mechanism encourages token clustering around expert centroids, implying a trend toward representation collapse.
arXiv Detail & Related papers (2022-04-20T01:40:19Z)
- SuperCone: Modeling Heterogeneous Experts with Concept Meta-learning for Unified Predictive Segments System [8.917697023052257]
We present SuperCone, our unified predicative segments system.
It builds on top of a flat concept representation that summarizes each user's heterogeneous digital footprints.
It can outperform state-of-the-art recommendation and ranking algorithms on a wide range of predicative segment tasks.
arXiv Detail & Related papers (2022-03-09T04:11:39Z)
- D-LEMA: Deep Learning Ensembles from Multiple Annotations -- Application to Skin Lesion Segmentation [14.266037264648533]
Leveraging a collection of annotators' opinions for an image is an interesting way of estimating a gold standard.
We propose an approach to handle annotators' disagreements when training a deep model.
arXiv Detail & Related papers (2020-12-14T01:51:22Z)
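As a rough illustration of the confidence-based routing summarized in "Harder Tasks Need More Experts" above (the sketch referenced in that entry), the snippet below activates, for each input, the smallest set of experts whose cumulative routing probability exceeds a threshold p. This is an assumption-based reading of the one-line summary, not code from that paper; the function name and threshold are hypothetical.

```python
# Illustrative sketch (not from the paper) of confidence-based dynamic expert
# selection: each input activates the smallest set of experts whose cumulative
# routing probability exceeds a threshold p, so inputs with flatter (less
# confident) routing distributions receive more experts.
import torch
import torch.nn.functional as F


def dynamic_top_p_experts(router_logits: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    """router_logits: (batch, num_experts) -> boolean mask of activated experts."""
    probs = F.softmax(router_logits, dim=-1)
    sorted_probs, order = probs.sort(dim=-1, descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    # Keep an expert while the probability mass accumulated *before* it is < p,
    # which always keeps at least the top-1 expert.
    keep_sorted = (cumulative - sorted_probs) < p
    mask = torch.zeros_like(probs).scatter(-1, order, keep_sorted.float())
    return mask.bool()


logits = torch.tensor([[2.0, 0.1, 0.0, -1.0],   # confident router -> 1 expert
                       [0.3, 0.2, 0.1, 0.0]])   # uncertain router -> 2 experts
print(dynamic_top_p_experts(logits, p=0.5))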
This list is automatically generated from the titles and abstracts of the papers on this site.