Diversified Dynamic Routing for Vision Tasks
- URL: http://arxiv.org/abs/2209.13071v1
- Date: Mon, 26 Sep 2022 23:27:51 GMT
- Title: Diversified Dynamic Routing for Vision Tasks
- Authors: Botos Csaba, Adel Bibi, Yanwei Li, Philip Torr, Ser-Nam Lim
- Abstract summary: We propose a novel architecture where each layer is composed of a set of experts.
In our method, the model is explicitly trained to solve the challenge of finding a relevant partitioning of the data.
We conduct several experiments on semantic segmentation on Cityscapes and object detection and instance segmentation on MS-COCO.
- Score: 36.199659460868496
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models for vision tasks are trained on large datasets under the
assumption that there exists a universal representation that can be used to
make predictions for all samples. While high-complexity models have proven
capable of learning such representations, a mixture of experts trained on
specific subsets of the data can infer the labels more efficiently. However,
using a mixture of experts poses two new problems: (i) assigning the correct
expert at inference time when a new, unseen sample is presented, and (ii)
finding the optimal partitioning of the training data, such that the experts
rely the least on common features. Dynamic Routing (DR) proposes a novel
architecture where each layer is composed of a set of experts; however, we
demonstrate that, without addressing these two challenges, the model reverts
to using the same subset of experts.
In our method, Diversified Dynamic Routing (DivDR), the model is explicitly
trained to solve the challenge of finding a relevant partitioning of the data
and assigning the correct experts in an unsupervised manner. We conduct several
experiments on semantic segmentation on Cityscapes and object detection and
instance segmentation on MS-COCO showing improved performance over several
baselines.
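To make the layer-of-experts idea concrete, below is a minimal, illustrative sketch of hard top-1 expert routing. It is not the paper's DivDR implementation: the class name `MoELayer`, the random stand-in weights, and the pure-Python linear experts are all assumptions for illustration only.

```python
import math
import random

random.seed(0)  # deterministic toy weights

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

class MoELayer:
    """Toy mixture-of-experts layer: a gate scores every expert and the
    single top-scoring expert transforms the input (hard top-1 routing)."""

    def __init__(self, dim, num_experts):
        # Random weights stand in for trained parameters.
        self.gate = [[random.gauss(0, 1) for _ in range(dim)]
                     for _ in range(num_experts)]
        self.experts = [[[random.gauss(0, 1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(num_experts)]

    def forward(self, x):
        # Gate: one score per expert, normalized with softmax.
        scores = softmax([sum(w * v for w, v in zip(row, x))
                          for row in self.gate])
        k = max(range(len(scores)), key=scores.__getitem__)  # chosen expert
        # Only the chosen expert's linear map is applied.
        y = [sum(w * v for w, v in zip(row, x)) for row in self.experts[k]]
        return y, k

layer = MoELayer(dim=4, num_experts=3)
out, chosen = layer.forward([1.0, -0.5, 0.3, 2.0])
print(chosen, len(out))
```

If the gate is not encouraged to diversify, nothing in this sketch prevents it from sending every input to the same expert, which is exactly the failure mode the abstract attributes to plain DR.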
Related papers
- Harder Tasks Need More Experts: Dynamic Routing in MoE Models [58.18526590138739]
We introduce a novel dynamic expert selection framework for Mixture of Experts (MoE) models.
Our method dynamically selects experts based on the confidence level in expert selection for each input.
arXiv Detail & Related papers (2024-03-12T13:41:15Z)
- Pre-trained Recommender Systems: A Causal Debiasing Perspective [19.712997823535066]
We develop a generic recommender that captures universal interaction patterns by training on generic user-item interaction data extracted from different domains.
Our empirical studies show that the proposed model could significantly improve the recommendation performance in zero- and few-shot learning settings.
arXiv Detail & Related papers (2023-10-30T03:37:32Z)
- Fusing Models with Complementary Expertise [42.099743709292866]
We consider the Fusion of Experts (FoE) problem of fusing outputs of expert models with complementary knowledge of the data distribution.
Our method is applicable to both discriminative and generative tasks.
We extend our method to the "frugal" setting where it is desired to reduce the number of expert model evaluations at test time.
arXiv Detail & Related papers (2023-10-02T18:31:35Z)
- Accelerating exploration and representation learning with offline pre-training [52.6912479800592]
We show that exploration and representation learning can be improved by separately learning two different models from a single offline dataset.
We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward can significantly improve the sample efficiency on the challenging NetHack benchmark.
arXiv Detail & Related papers (2023-03-31T18:03:30Z)
- Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners [74.92558307689265]
We propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad')
We optimize this matching process during the training of a single model.
Experiments on the Taskonomy dataset with 13 vision tasks and the PASCAL-Context dataset with 5 vision tasks show the superiority of our approach.
arXiv Detail & Related papers (2022-12-15T18:59:52Z)
- On the Representation Collapse of Sparse Mixture of Experts [102.83396489230375]
Sparse mixture of experts provides larger model capacity while requiring a constant computational overhead.
It employs the routing mechanism to distribute input tokens to the best-matched experts according to their hidden representations.
However, learning such a routing mechanism encourages token clustering around expert centroids, implying a trend toward representation collapse.
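The token-to-centroid routing dynamic described above can be sketched in a few lines. This is an illustrative toy, not the paper's mechanism: the 2-D `centroids` and `tokens` values are invented, and cosine similarity stands in for whatever matching score a real router would learn.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

# Hypothetical expert "centroids" in the token hidden space.
centroids = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]

# Each token is routed to the expert whose centroid best matches its hidden
# state; training pressure that pulls tokens toward their assigned centroid
# is what drives the clustering (and, in the limit, the collapse).
assignments = [max(range(len(centroids)),
                   key=lambda k: cosine(t, centroids[k]))
               for t in tokens]
print(assignments)  # tokens 0 and 2 route to expert 0, token 1 to expert 1
```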
arXiv Detail & Related papers (2022-04-20T01:40:19Z)
- SuperCone: Modeling Heterogeneous Experts with Concept Meta-learning for Unified Predictive Segments System [8.917697023052257]
We present SuperCone, our unified predictive segments system.
It builds on top of a flat concept representation that summarizes each user's heterogeneous digital footprints.
It can outperform state-of-the-art recommendation and ranking algorithms on a wide range of predictive segment tasks.
arXiv Detail & Related papers (2022-03-09T04:11:39Z)
- D-LEMA: Deep Learning Ensembles from Multiple Annotations -- Application to Skin Lesion Segmentation [14.266037264648533]
Leveraging a collection of annotators' opinions for an image is an interesting way of estimating a gold standard.
We propose an approach to handle annotators' disagreements when training a deep model.
arXiv Detail & Related papers (2020-12-14T01:51:22Z)
- Multiple Expert Brainstorming for Domain Adaptive Person Re-identification [140.3998019639158]
We propose a multiple expert brainstorming network (MEB-Net) for domain adaptive person re-ID.
MEB-Net adopts a mutual learning strategy, where multiple networks with different architectures are pre-trained within a source domain.
Experiments on large-scale datasets demonstrate the superior performance of MEB-Net over the state of the art.
arXiv Detail & Related papers (2020-07-03T08:16:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.