MoDEM: Mixture of Domain Expert Models
- URL: http://arxiv.org/abs/2410.07490v1
- Date: Wed, 9 Oct 2024 23:52:54 GMT
- Title: MoDEM: Mixture of Domain Expert Models
- Authors: Toby Simonds, Kemal Kurniawan, Jey Han Lau
- Abstract summary: We propose a novel approach to enhancing the performance and efficiency of large language models (LLMs) by combining domain prompt routing with domain-specialized models.
We introduce a system that utilizes a BERT-based router to direct incoming prompts to the most appropriate domain expert model.
Our research demonstrates that this approach can significantly outperform general-purpose models of comparable size.
- Score: 23.846823652305027
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel approach to enhancing the performance and efficiency of large language models (LLMs) by combining domain prompt routing with domain-specialized models. We introduce a system that utilizes a BERT-based router to direct incoming prompts to the most appropriate domain expert model. These expert models are specifically tuned for domains such as health, mathematics and science. Our research demonstrates that this approach can significantly outperform general-purpose models of comparable size, leading to a superior performance-to-cost ratio across various benchmarks. The implications of this study suggest a potential paradigm shift in LLM development and deployment. Rather than focusing solely on creating increasingly large, general-purpose models, the future of AI may lie in developing ecosystems of smaller, highly specialized models coupled with sophisticated routing systems. This approach could lead to more efficient resource utilization, reduced computational costs, and superior overall performance.
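The routing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a keyword-overlap scorer stands in for the BERT-based router, the expert models are stub functions, and the names `DOMAIN_KEYWORDS`, `classify_domain`, `EXPERTS`, and `route`, as well as the "general" fallback, are assumptions made for illustration only.

```python
from typing import Callable, Dict, Set

# Hypothetical keyword sets; a real system would use a trained
# BERT-based classifier instead of keyword overlap.
DOMAIN_KEYWORDS: Dict[str, Set[str]] = {
    "health": {"symptom", "dose", "diagnosis", "patient"},
    "mathematics": {"integral", "prove", "equation", "solve"},
    "science": {"molecule", "gravity", "cell", "energy"},
}


def classify_domain(prompt: str) -> str:
    """Return the domain whose keywords overlap most with the prompt.

    Falls back to "general" (an assumed catch-all) when nothing matches.
    """
    cleaned = prompt.lower()
    for ch in "?.,!":
        cleaned = cleaned.replace(ch, " ")
    words = set(cleaned.split())
    scores = {d: len(words & kws) for d, kws in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"


def make_expert(name: str) -> Callable[[str], str]:
    """Build a stub expert that only tags the prompt with its name."""
    def expert(prompt: str) -> str:
        return f"[{name}] {prompt}"
    return expert


# One stub per domain, plus the assumed general-purpose fallback.
EXPERTS: Dict[str, Callable[[str], str]] = {
    d: make_expert(f"{d}-expert") for d in DOMAIN_KEYWORDS
}
EXPERTS["general"] = make_expert("general-model")


def route(prompt: str) -> str:
    """Classify the prompt, then forward it to the matching expert."""
    return EXPERTS[classify_domain(prompt)](prompt)


if __name__ == "__main__":
    print(route("Solve the equation x + 2 = 5"))
    print(route("What dose should the patient take?"))
```

Only one expert runs per prompt in this sketch, which is where the cost saving described in the abstract would come from: the router is small, and each expert can be smaller than a single general-purpose model.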
Related papers
- A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models.
Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning.
We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z)
- MoD: A Distribution-Based Approach for Merging Large Language Models [0.0]
Large language models (LLMs) have enabled the development of numerous specialized, task-specific variants.
We propose the Mixture of Distributions (MoD) framework, a novel approach for merging LLMs.
Unlike traditional weight-averaging methods, MoD effectively preserves the specialized capabilities of individual models.
arXiv Detail & Related papers (2024-11-01T07:05:29Z)
- On the Modeling Capabilities of Large Language Models for Sequential Decision Making [52.128546842746246]
Large pretrained models are showing increasingly better performance in reasoning and planning tasks.
We evaluate their ability to produce decision-making policies, either directly, by generating actions, or indirectly.
In environments with unfamiliar dynamics, we explore how fine-tuning LLMs with synthetic data can significantly improve their reward modeling capabilities.
arXiv Detail & Related papers (2024-10-08T03:12:57Z)
- Learning to Generalize Unseen Domains via Multi-Source Meta Learning for Text Classification [71.08024880298613]
We study the multi-source Domain Generalization of text classification.
We propose a framework to use multiple seen domains to train a model that can achieve high accuracy in an unseen domain.
arXiv Detail & Related papers (2024-09-20T07:46:21Z)
- Exploring Domain Robust Lightweight Reward Models based on Router Mechanism [1.3624495460189863]
We explore the utilization of small language models operating in a domain-specific manner based on router mechanisms.
Our three approaches are: 1) using a mixture of experts to form a single reward model by modularizing an internal router and experts, 2) employing an external router to select the appropriate reward model from multiple domain-specific models, and 3) reducing parameter size by loading reward models and router adapters onto a single small language model using adapters.
arXiv Detail & Related papers (2024-07-24T17:25:12Z)
- Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation [59.41178047749177]
We focus on multi-domain Neural Machine Translation, with the goal of developing efficient models which can handle data from various domains seen during training and are robust to domains unseen during training.
We hypothesize that Sparse Mixture-of-Experts (SMoE) models are a good fit for this task, as they enable efficient model scaling.
We conduct a series of experiments aimed at validating the utility of SMoE for the multi-domain scenario, and find that a straightforward width scaling of Transformer is a simpler and surprisingly more efficient approach in practice, and reaches the same performance level as SMoE.
arXiv Detail & Related papers (2024-07-01T09:45:22Z)
- Performance Characterization of Expert Router for Scalable LLM Inference [0.4726677580049183]
Large Language Models (LLMs) have experienced widespread adoption across scientific and industrial domains.
Deploying and serving these models at scale with optimal throughput and latency remains a significant challenge.
This paper introduces Expert Router, a scalable routing architecture that directs to specialized expert models.
arXiv Detail & Related papers (2024-04-22T16:33:42Z)
- Scaling Vision-Language Models with Sparse Mixture of Experts [128.0882767889029]
We show that mixture-of-experts (MoE) techniques can achieve state-of-the-art performance on a range of benchmarks over dense models of equivalent computational cost.
Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-offs between compute performance when scaling vision-language models.
arXiv Detail & Related papers (2023-03-13T16:00:31Z)
- AdaptDHM: Adaptive Distribution Hierarchical Model for Multi-Domain CTR Prediction [4.299153274884263]
We propose an elegant and flexible multi-distribution modeling paradigm, named Adaptive Distribution Hierarchical Model (AdaptDHM).
Our model achieves impressive prediction accuracy and its time cost during the training stage is more than 50% less than that of other models.
arXiv Detail & Related papers (2022-11-22T09:10:37Z)
- Unified Modeling of Multi-Domain Multi-Device ASR Systems [13.61897259469694]
We propose an innovative approach that integrates the different per-domain per-device models into a unified model.
Experiments show that our proposed unified modeling approach actually outperforms the carefully tuned per-domain models.
arXiv Detail & Related papers (2022-05-13T14:07:22Z)
- A Novel Mix-normalization Method for Generalizable Multi-source Person Re-identification [49.548815417844786]
Person re-identification (Re-ID) has achieved great success in the supervised scenario.
It is difficult to directly transfer the supervised model to arbitrary unseen domains due to the model overfitting to the seen source domains.
We propose MixNorm, which consists of domain-aware mix-normalization (DMN) and domain-aware center regularization (DCR).
arXiv Detail & Related papers (2022-01-24T18:09:38Z)