Diversified Dynamic Routing for Vision Tasks
- URL: http://arxiv.org/abs/2209.13071v1
- Date: Mon, 26 Sep 2022 23:27:51 GMT
- Title: Diversified Dynamic Routing for Vision Tasks
- Authors: Botos Csaba, Adel Bibi, Yanwei Li, Philip Torr, Ser-Nam Lim
- Abstract summary: We propose a novel architecture where each layer is composed of a set of experts.
In our method, the model is explicitly trained to solve the challenge of finding a relevant partitioning of the data.
We conduct several experiments on semantic segmentation on Cityscapes and object detection and instance segmentation on MS-COCO.
- Score: 36.199659460868496
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models for vision tasks are trained on large datasets under the
assumption that there exists a universal representation that can be used to
make predictions for all samples. While high-complexity models have proven
capable of learning such representations, a mixture of experts trained on
specific subsets of the data can infer the labels more efficiently. However,
using a mixture of experts poses two new problems: (i) assigning the correct
expert at inference time when a new, unseen sample is presented, and (ii)
finding the optimal partitioning of the training data, such that the experts
rely the least on common features. Dynamic Routing (DR) proposes a novel
architecture where each layer is composed of a set of experts; however, we
demonstrate that, without addressing these two challenges, the model reverts
to using the same subset of experts.
In our method, Diversified Dynamic Routing (DivDR), the model is explicitly
trained to solve the challenge of finding a relevant partitioning of the data
and assigning the correct experts in an unsupervised manner. We conduct several
experiments on semantic segmentation on Cityscapes and object detection and
instance segmentation on MS-COCO showing improved performance over several
baselines.
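To make the layer-of-experts idea concrete, below is a minimal, illustrative sketch of hard top-1 expert routing. It is not the paper's DivDR implementation: the class name `MoELayer`, the random stand-in weights, and the pure-Python linear experts are all assumptions for illustration only.

```python
import math
import random

random.seed(0)  # deterministic toy weights

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

class MoELayer:
    """Toy mixture-of-experts layer: a gate scores every expert and the
    single top-scoring expert transforms the input (hard top-1 routing)."""

    def __init__(self, dim, num_experts):
        # Random weights stand in for trained parameters.
        self.gate = [[random.gauss(0, 1) for _ in range(dim)]
                     for _ in range(num_experts)]
        self.experts = [[[random.gauss(0, 1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(num_experts)]

    def forward(self, x):
        # Gate: one score per expert, normalized with softmax.
        scores = softmax([sum(w * v for w, v in zip(row, x))
                          for row in self.gate])
        k = max(range(len(scores)), key=scores.__getitem__)  # chosen expert
        # Only the chosen expert's linear map is applied.
        y = [sum(w * v for w, v in zip(row, x)) for row in self.experts[k]]
        return y, k

layer = MoELayer(dim=4, num_experts=3)
out, chosen = layer.forward([1.0, -0.5, 0.3, 2.0])
print(chosen, len(out))
```

If the gate is not encouraged to diversify, nothing in this sketch prevents it from sending every input to the same expert, which is exactly the failure mode the abstract attributes to plain DR.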
Related papers
- Harder Tasks Need More Experts: Dynamic Routing in MoE Models [58.18526590138739]
We introduce a novel dynamic expert selection framework for Mixture of Experts (MoE) models.
Our method dynamically selects experts based on the confidence level in expert selection for each input.
arXiv Detail & Related papers (2024-03-12T13:41:15Z)
- Pre-trained Recommender Systems: A Causal Debiasing Perspective [19.712997823535066]
We develop a generic recommender that captures universal interaction patterns by training on generic user-item interaction data extracted from different domains.
Our empirical studies show that the proposed model could significantly improve the recommendation performance in zero- and few-shot learning settings.
arXiv Detail & Related papers (2023-10-30T03:37:32Z)
- Fusing Models with Complementary Expertise [42.099743709292866]
We consider the Fusion of Experts (FoE) problem of fusing outputs of expert models with complementary knowledge of the data distribution.
Our method is applicable to both discriminative and generative tasks.
We extend our method to the "frugal" setting where it is desired to reduce the number of expert model evaluations at test time.
arXiv Detail & Related papers (2023-10-02T18:31:35Z)
- Accelerating exploration and representation learning with offline pre-training [52.6912479800592]
We show that exploration and representation learning can be improved by separately learning two different models from a single offline dataset.
We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward can significantly improve the sample efficiency on the challenging NetHack benchmark.
arXiv Detail & Related papers (2023-03-31T18:03:30Z)
- Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners [74.92558307689265]
We propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad')
We optimize this matching process during the training of a single model.
Experiments on the Taskonomy dataset with 13 vision tasks and the PASCAL-Context dataset with 5 vision tasks show the superiority of our approach.
arXiv Detail & Related papers (2022-12-15T18:59:52Z)
- On the Representation Collapse of Sparse Mixture of Experts [102.83396489230375]
Sparse mixture of experts provides larger model capacity while requiring a constant computational overhead.
It employs the routing mechanism to distribute input tokens to the best-matched experts according to their hidden representations.
However, learning such a routing mechanism encourages token clustering around expert centroids, implying a trend toward representation collapse.
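The token-to-centroid routing dynamic described above can be sketched in a few lines. This is an illustrative toy, not the paper's mechanism: the 2-D `centroids` and `tokens` values are invented, and cosine similarity stands in for whatever matching score a real router would learn.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

# Hypothetical expert "centroids" in the token hidden space.
centroids = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]

# Each token is routed to the expert whose centroid best matches its hidden
# state; training pressure that pulls tokens toward their assigned centroid
# is what drives the clustering (and, in the limit, the collapse).
assignments = [max(range(len(centroids)),
                   key=lambda k: cosine(t, centroids[k]))
               for t in tokens]
print(assignments)  # tokens 0 and 2 route to expert 0, token 1 to expert 1
```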
arXiv Detail & Related papers (2022-04-20T01:40:19Z)
- SuperCone: Modeling Heterogeneous Experts with Concept Meta-learning for Unified Predictive Segments System [8.917697023052257]
We present SuperCone, our unified predictive segments system.
It builds on top of a flat concept representation that summarizes each user's heterogeneous digital footprints.
It can outperform state-of-the-art recommendation and ranking algorithms on a wide range of predictive segment tasks.
arXiv Detail & Related papers (2022-03-09T04:11:39Z)
- D-LEMA: Deep Learning Ensembles from Multiple Annotations -- Application to Skin Lesion Segmentation [14.266037264648533]
Leveraging a collection of annotators' opinions for an image is an interesting way of estimating a gold standard.
We propose an approach to handle annotators' disagreements when training a deep model.
arXiv Detail & Related papers (2020-12-14T01:51:22Z)
- Multiple Expert Brainstorming for Domain Adaptive Person Re-identification [140.3998019639158]
We propose a multiple expert brainstorming network (MEB-Net) for domain adaptive person re-ID.
MEB-Net adopts a mutual learning strategy, where multiple networks with different architectures are pre-trained within a source domain.
Experiments on large-scale datasets demonstrate the superior performance of MEB-Net over the state of the art.
arXiv Detail & Related papers (2020-07-03T08:16:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.