RouterRetriever: Routing over a Mixture of Expert Embedding Models
- URL: http://arxiv.org/abs/2409.02685v2
- Date: Wed, 26 Feb 2025 06:19:05 GMT
- Title: RouterRetriever: Routing over a Mixture of Expert Embedding Models
- Authors: Hyunji Lee, Luca Soldaini, Arman Cohan, Minjoon Seo, Kyle Lo
- Abstract summary: We introduce RouterRetriever, a retrieval model that leverages a mixture of domain-specific experts by using a routing mechanism. RouterRetriever is the first work to demonstrate the advantages of routing over a mixture of domain-specific expert embedding models.
- Score: 58.987116118425995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Information retrieval methods often rely on a single embedding model trained on large, general-domain datasets like MSMARCO. While this approach can produce a retriever with reasonable overall performance, such models often underperform models trained on domain-specific data when tested on their respective domains. Prior work in information retrieval has tackled this through multi-task training, but the idea of routing over a mixture of domain-specific expert retrievers remains unexplored, despite the popularity of such ideas in language model generation research. In this work, we introduce RouterRetriever, a retrieval model that leverages a mixture of domain-specific experts by using a routing mechanism to select the most appropriate expert for each query. RouterRetriever is lightweight and allows easy addition or removal of experts without additional training. Evaluation on the BEIR benchmark demonstrates that RouterRetriever outperforms both models trained on MSMARCO (+2.1 absolute nDCG@10) and multi-task models (+3.2). This is achieved by employing our routing mechanism, which surpasses other routing techniques (+1.8 on average) commonly used in language modeling. Furthermore, the benefit generalizes well to other datasets, even when no expert was trained specifically for the dataset. RouterRetriever is the first work to demonstrate the advantages of routing over a mixture of domain-specific expert embedding models as an alternative to a single, general-purpose embedding model, especially when retrieving from diverse, specialized domains.
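The routing idea can be illustrated with a minimal sketch: score the query against one "pilot" embedding per domain expert (here, the mean embedding of a few sample queries from that domain) and encode the query with the best-matching expert. The encoder stub, domain names, and sample queries below are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of routing over a mixture of expert embedding models.
# The embed() stub and the pilot-query sets are placeholders; a real
# system would use a base retriever plus per-domain expert encoders.
import numpy as np

def embed(text: str, expert: str = "base") -> np.ndarray:
    """Stub encoder: replace with a real embedding model."""
    rng = np.random.default_rng(abs(hash((expert, text))) % (2**32))
    v = rng.normal(size=768)
    return v / np.linalg.norm(v)

def _normalize(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

# 1) Build one pilot embedding per expert from a handful of sample
#    queries drawn from that expert's domain (hypothetical domains).
pilot_queries = {
    "science": ["what causes aurora borealis", "mRNA vaccine mechanism"],
    "finance": ["what is an index fund", "how are bond yields set"],
    "biomedical": ["symptoms of type 2 diabetes", "ACE inhibitor side effects"],
}
pilots = {
    domain: _normalize(np.mean([embed(q) for q in qs], axis=0))
    for domain, qs in pilot_queries.items()
}

# 2) Route each query to the expert whose pilot embedding it is
#    most similar to, then encode with that expert.
def route(query: str) -> str:
    q = embed(query)
    return max(pilots, key=lambda d: float(q @ pilots[d]))

def retrieve_embedding(query: str) -> np.ndarray:
    return embed(query, expert=route(query))

# With a real encoder, a drug-related query should route to the
# biomedical expert; the stub just picks an arbitrary but fixed one.
print(route("long-term effects of statins"))
```

Because routing only compares a query against a fixed table of pilot vectors, adding or removing an expert amounts to adding or removing one table entry, which is consistent with the abstract's claim that experts can be added or removed without additional training.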
Related papers
- Token-Level Prompt Mixture with Parameter-Free Routing for Federated Domain Generalization [51.562474873972086]
Federated domain generalization (FedDG) aims to learn a globally generalizable model from decentralized clients with heterogeneous data.
Recent studies have introduced prompt learning to adapt vision-language models (VLMs) in FedDG by learning a single global prompt.
We propose TRIP, a token-level prompt mixture framework with parameter-free routing for FedDG.
arXiv Detail & Related papers (2025-04-29T11:06:03Z) - Glider: Global and Local Instruction-Driven Expert Router [83.785832410832]
"Model MoErging" methods prioritize generalization to unseen tasks at the expense of performance on held-in tasks.
We propose Global and Local Instruction Driven Expert Router (GLIDER) that integrates a multi-scale routing mechanism.
GLIDER achieves substantially improved held-in performance while maintaining strong generalization on held-out tasks.
arXiv Detail & Related papers (2024-10-09T17:59:14Z) - Exploring Domain Robust Lightweight Reward Models based on Router Mechanism [1.3624495460189863]
We explore the utilization of small language models operating in a domain-specific manner based on router mechanisms.
Our three approaches are: 1) using a mixture of experts to form a single reward model by modularizing an internal router and experts, 2) employing an external router to select the appropriate reward model from multiple domain-specific models, and 3) reducing parameter size by loading reward models and router adapters onto a single small language model using adapters.
arXiv Detail & Related papers (2024-07-24T17:25:12Z) - Deep Domain Specialisation for single-model multi-domain learning to rank [1.534667887016089]
Multiple models are more costly to train, maintain, and update than a single model responsible for all domains.
We propose a novel architecture of Deep Domain Specialisation (DDS) to consolidate multiple domains into a single model.
arXiv Detail & Related papers (2024-07-01T08:19:19Z) - GM-DF: Generalized Multi-Scenario Deepfake Detection [49.072106087564144]
Existing face forgery detection methods usually follow the paradigm of training models in a single domain.
In this paper, we elaborately investigate the generalization capacity of deepfake detection models when jointly trained on multiple face forgery detection datasets.
arXiv Detail & Related papers (2024-06-28T17:42:08Z) - Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training [73.90260246781435]
We present Lory, the first approach that scales such architectures to autoregressive language model pre-training.
We show significant performance gains over parameter-matched dense models on both perplexity and a variety of downstream tasks.
Despite segment-level routing, Lory models achieve competitive performance compared to state-of-the-art MoE models with token-level routing.
arXiv Detail & Related papers (2024-05-06T03:06:33Z) - Learning to Route Among Specialized Experts for Zero-Shot Generalization [39.56470042680907]
We propose Post-Hoc Adaptive Tokenwise Gating Over an Ocean of Specialized Experts (PHATGOOSE)
It learns to route among specialized modules that were produced through parameter-efficient fine-tuning.
It does not require simultaneous access to the datasets used to create the specialized models and only requires a modest amount of additional compute after each expert model is trained.
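A rough sketch of what tokenwise gating over specialized modules could look like follows; the gate vectors, top-1 routing, and toy linear experts are illustrative assumptions, not PHATGOOSE's exact design.

```python
# Hedged sketch of post-hoc tokenwise gating: each expert contributes
# a gate vector, and every token is routed to the expert whose gate
# scores it highest. Dimensions and expert modules are toy stand-ins
# for parameter-efficient fine-tuned modules.
import torch

d_model, n_experts, seq_len = 64, 4, 10

experts = torch.nn.ModuleList(
    [torch.nn.Linear(d_model, d_model) for _ in range(n_experts)]
)
# One gate vector per expert, learned separately after each expert
# is trained (the "post-hoc" part).
gates = torch.nn.Parameter(torch.randn(n_experts, d_model))

def tokenwise_route(x: torch.Tensor) -> torch.Tensor:
    """x: (seq_len, d_model). Route each token to its top-1 expert."""
    scores = x @ gates.T                # (seq_len, n_experts)
    choice = scores.argmax(dim=-1)      # (seq_len,)
    out = torch.empty_like(x)
    for e in range(n_experts):
        mask = choice == e
        if mask.any():
            out[mask] = experts[e](x[mask])
    return out

tokens = torch.randn(seq_len, d_model)
print(tokenwise_route(tokens).shape)    # torch.Size([10, 64])
```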
arXiv Detail & Related papers (2024-02-08T17:43:22Z) - Soft Merging of Experts with Adaptive Routing [38.962451264172856]
We introduce Soft Merging of Experts with Adaptive Routing (SMEAR)
SMEAR avoids discrete routing by using a single "merged" expert constructed via a weighted average of all of the experts' parameters.
We empirically validate that models using SMEAR outperform models that route based on metadata or learn sparse routing through gradient estimation.
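The merging step can be sketched as follows. For simplicity, the router here is conditioned on the batch mean rather than per-example hidden states, and the expert shapes are illustrative assumptions.

```python
# Sketch of SMEAR-style soft merging: instead of discretely routing to
# one expert, build a single merged expert whose parameters are a
# routing-weighted average of all experts' parameters.
import torch

d_model, n_experts = 64, 4
experts = [torch.nn.Linear(d_model, d_model) for _ in range(n_experts)]
router = torch.nn.Linear(d_model, n_experts)

def smear_forward(x: torch.Tensor) -> torch.Tensor:
    """x: (batch, d_model). Average parameters, then apply once."""
    probs = torch.softmax(router(x.mean(dim=0)), dim=-1)  # (n_experts,)
    merged_w = sum(p * e.weight for p, e in zip(probs, experts))
    merged_b = sum(p * e.bias for p, e in zip(probs, experts))
    # One forward pass through the merged expert; gradients reach the
    # router through the averaging, so no gradient estimation is needed.
    return torch.nn.functional.linear(x, merged_w, merged_b)

x = torch.randn(8, d_model)
print(smear_forward(x).shape)  # torch.Size([8, 64])
```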
arXiv Detail & Related papers (2023-06-06T15:04:31Z) - Chain-of-Skills: A Configurable Model for Open-domain Question Answering [79.8644260578301]
The retrieval model is an indispensable component for real-world knowledge-intensive tasks.
Recent work focuses on customized methods, limiting model transferability and scalability.
We propose a modular retriever where individual modules correspond to key skills that can be reused across datasets.
arXiv Detail & Related papers (2023-05-04T20:19:39Z) - Diversified Dynamic Routing for Vision Tasks [36.199659460868496]
We propose a novel architecture where each layer is composed of a set of experts.
In our method, the model is explicitly trained to find a relevant partitioning of the data.
We conduct several experiments on semantic segmentation on Cityscapes and object detection and instance segmentation on MS-COCO.
arXiv Detail & Related papers (2022-09-26T23:27:51Z) - Adaptive Network Combination for Single-Image Reflection Removal: A Domain Generalization Perspective [68.37624784559728]
In this paper, we tackle these issues by learning single-image reflection removal (SIRR) models from a domain perspective.
For each source set, a specific SIRR model is trained to serve as a domain expert of relevant reflection types.
For images from one source set, we train RTAW to predict expert-wise weights only for the other domain experts, which improves generalization ability.
Experiments show the appealing performance gain of our AdaNEC on different state-of-the-art SIRR networks.
arXiv Detail & Related papers (2022-04-04T14:06:11Z) - An Approach for Combining Multimodal Fusion and Neural Architecture Search Applied to Knowledge Tracing [6.540879944736641]
We propose a sequential model-based optimization approach that combines multimodal fusion and neural architecture search within one framework.
We evaluate our method on two public real-world datasets, showing that the discovered model achieves superior performance.
arXiv Detail & Related papers (2021-11-08T13:43:46Z) - Single-dataset Experts for Multi-dataset Question Answering [6.092171111087768]
We train a network on multiple datasets to generalize and transfer better to new datasets.
Our approach is to model multi-dataset question answering with a collection of single-dataset experts.
Simple methods based on parameter-averaging lead to better zero-shot generalization and few-shot transfer performance.
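Parameter averaging of this kind reduces to averaging compatible expert checkpoints into one model. A toy sketch, with placeholder model shapes, is shown below.

```python
# Sketch of a parameter-averaging baseline: combine several
# single-dataset expert checkpoints by uniformly averaging their
# weights. Model classes and shapes are hypothetical placeholders.
import torch

def average_state_dicts(state_dicts):
    """Uniformly average a list of compatible state dicts."""
    avg = {}
    for key in state_dicts[0]:
        avg[key] = torch.stack(
            [sd[key].float() for sd in state_dicts]
        ).mean(dim=0)
    return avg

# Toy demonstration with two small "expert" models of the same shape.
experts = [torch.nn.Linear(16, 4) for _ in range(2)]
merged = torch.nn.Linear(16, 4)
merged.load_state_dict(average_state_dicts([e.state_dict() for e in experts]))
```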
arXiv Detail & Related papers (2021-09-28T17:08:22Z) - Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos [69.61522804742427]
This paper proposes a self-supervised training framework that learns a common multimodal embedding space.
We extend the concept of instance-level contrastive learning with a multimodal clustering step to capture semantic similarities across modalities.
The resulting embedding space enables retrieval of samples across all modalities, even from unseen datasets and different domains.
arXiv Detail & Related papers (2021-04-26T15:55:01Z) - Multi-Domain Adversarial Feature Generalization for Person Re-Identification [52.835955258959785]
We propose a multi-dataset feature generalization network (MMFA-AAE)
It is capable of learning a universal domain-invariant feature representation from multiple labeled datasets and generalizing it to unseen camera systems.
It also surpasses many state-of-the-art supervised methods and unsupervised domain adaptation methods by a large margin.
arXiv Detail & Related papers (2020-11-25T08:03:15Z) - Improving QA Generalization by Concurrent Modeling of Multiple Biases [61.597362592536896]
Existing NLP datasets contain various biases that models can easily exploit to achieve high performance on the corresponding evaluation sets.
We propose a general framework for improving the performance on both in-domain and out-of-domain datasets by concurrent modeling of multiple biases in the training data.
We extensively evaluate our framework on extractive question answering with training data from various domains with multiple biases of different strengths.
arXiv Detail & Related papers (2020-10-07T11:18:49Z) - Domain Adaptive Ensemble Learning [141.98192460069765]
We propose a unified framework termed domain adaptive ensemble learning (DAEL) to address both problems.
Experiments on three multi-source UDA and two DG datasets show that DAEL improves the state of the art on both problems, often by significant margins.
arXiv Detail & Related papers (2020-03-16T16:54:15Z)