Related papers: MoDEM: Mixture of Domain Expert Models

DRAMA: Domain Retrieval using Adaptive Module Allocation [19.15437181769345]
DRAMA (Domain Retrieval using Adaptive Module Allocation) is an energy- and parameter-efficient framework designed to reduce the environmental footprint of neural retrieval.
arXiv Detail & Related papers (2026-02-16T17:38:24Z)
Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe [51.26601054313749]
Recent efforts on Diffusion MoE models have primarily focused on developing more sophisticated routing mechanisms. Inspired by the MoE design paradigms established in large language models (LLMs), we identify a set of crucial architectural factors for building effective Diffusion MoE models. We present novel architectures that can be efficiently applied to both latent and pixel-space diffusion frameworks.
arXiv Detail & Related papers (2025-12-01T03:52:31Z)
Selecting and Merging: Towards Adaptable and Scalable Named Entity Recognition with Large Language Models [5.466962214217334]
Supervised fine-tuning (SFT) is widely used to align large language models (LLMs) with information extraction (IE) tasks, such as named entity recognition (NER). We propose the SaM framework, which dynamically Selects and Merges expert models at inference time.
arXiv Detail & Related papers (2025-06-28T08:28:52Z)
AIDE: Agentically Improve Visual Language Model with Domain Experts [39.34183197101934]
AIDE (Agentic Improvement through Domain Experts) is a novel framework that enables Visual Language Models to autonomously enhance their capabilities. AIDE operates through a four-stage process: (1) identifying instances for refinement, (2) engaging domain experts for targeted analysis, (3) synthesizing expert outputs with existing data, and (4) integrating enhanced instances into the training pipeline.
arXiv Detail & Related papers (2025-02-13T08:05:44Z)
GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism [20.765816590224787]
GRAPHMOE is a novel method aimed at augmenting the cognitive depth of language models via a self-rethinking mechanism constructed on Pseudo GraphMoE networks. We implement the GRAPHMOE architecture using Low-Rank Adaptation (LoRA) techniques and conduct extensive experiments on various benchmark datasets. The experimental results reveal that GRAPHMOE outperforms other LoRA-based models, achieving state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2025-01-14T06:59:51Z)
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons [85.99268361356832]
We introduce a process of adapting an MLLM to a Generalist Embodied Agent (GEA). GEA is a single unified model capable of grounding itself across varied domains through a multi-embodiment action tokenizer. Our findings reveal the importance of training with cross-domain data and online RL for building generalist agents.
arXiv Detail & Related papers (2024-12-11T15:06:25Z)
A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models. Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning. We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z)
MoD: A Distribution-Based Approach for Merging Large Language Models [0.0]
Large language models (LLMs) have enabled the development of numerous specialized, task-specific variants. We propose the Mixture of Distributions (MoD) framework, a novel approach for merging LLMs. Unlike traditional weight-averaging methods, MoD effectively preserves the specialized capabilities of individual models.
arXiv Detail & Related papers (2024-11-01T07:05:29Z)
On the Modeling Capabilities of Large Language Models for Sequential Decision Making [52.128546842746246]
Large pretrained models are showing increasingly better performance in reasoning and planning tasks. We evaluate their ability to produce decision-making policies, either directly, by generating actions, or indirectly. In environments with unfamiliar dynamics, we explore how fine-tuning LLMs with synthetic data can significantly improve their reward modeling capabilities.
arXiv Detail & Related papers (2024-10-08T03:12:57Z)
Learning to Generalize Unseen Domains via Multi-Source Meta Learning for Text Classification [71.08024880298613]
We study multi-source domain generalization for text classification. We propose a framework that uses multiple seen domains to train a model that can achieve high accuracy in an unseen domain.
arXiv Detail & Related papers (2024-09-20T07:46:21Z)
Exploring Domain Robust Lightweight Reward Models based on Router Mechanism [1.3624495460189863]
We explore the utilization of small language models operating in a domain-specific manner based on router mechanisms. Our three approaches are: 1) utilizing a mixture of experts to form a single reward model by modularizing an internal router and experts, 2) employing an external router to select the appropriate reward model from multiple domain-specific models, and 3) reducing parameter size by loading reward models and router adapters onto a single small language model using adapters.
arXiv Detail & Related papers (2024-07-24T17:25:12Z)
Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation [59.41178047749177]
We focus on multi-domain Neural Machine Translation, with the goal of developing efficient models which can handle data from various domains seen during training and are robust to domains unseen during training. We hypothesize that Sparse Mixture-of-Experts (SMoE) models are a good fit for this task, as they enable efficient model scaling. We conduct a series of experiments aimed at validating the utility of SMoE for the multi-domain scenario, and find that a straightforward width scaling of Transformer is a simpler and surprisingly more efficient approach in practice, and reaches the same performance level as SMoE.
arXiv Detail & Related papers (2024-07-01T09:45:22Z)
Performance Characterization of Expert Router for Scalable LLM Inference [0.4726677580049183]
Large Language Models (LLMs) have experienced widespread adoption across scientific and industrial domains. Deploying and serving these models at scale with optimal throughput and latency remains a significant challenge. This paper introduces Expert Router, a scalable routing architecture that directs prompts to specialized expert models.
arXiv Detail & Related papers (2024-04-22T16:33:42Z)
Scaling Vision-Language Models with Sparse Mixture of Experts [128.0882767889029]
We show that mixture-of-experts (MoE) techniques can achieve state-of-the-art performance on a range of benchmarks over dense models of equivalent computational cost. Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-offs between compute and performance when scaling vision-language models.
arXiv Detail & Related papers (2023-03-13T16:00:31Z)
AdaptDHM: Adaptive Distribution Hierarchical Model for Multi-Domain CTR Prediction [4.299153274884263]
We propose an elegant and flexible multi-distribution modeling paradigm, named Adaptive Distribution Hierarchical Model (AdaptDHM). Our model achieves impressive prediction accuracy, and its time cost during the training stage is more than 50% less than that of other models.
arXiv Detail & Related papers (2022-11-22T09:10:37Z)
Unified Modeling of Multi-Domain Multi-Device ASR Systems [13.61897259469694]
We propose an innovative approach that integrates the different per-domain per-device models into a unified model. Experiments show that our proposed unified modeling approach actually outperforms the carefully tuned per-domain models.
arXiv Detail & Related papers (2022-05-13T14:07:22Z)
A Novel Mix-normalization Method for Generalizable Multi-source Person Re-identification [49.548815417844786]
Person re-identification (Re-ID) has achieved great success in the supervised scenario. It is difficult to directly transfer the supervised model to arbitrary unseen domains due to the model overfitting to the seen source domains. We propose MixNorm, which consists of domain-aware mix-normalization (DMN) and domain-aware center regularization (DCR).
arXiv Detail & Related papers (2022-01-24T18:09:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.