pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation
- URL: http://arxiv.org/abs/2602.22938v1
- Date: Thu, 26 Feb 2026 12:27:06 GMT
- Title: pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation
- Authors: Shentong Mo, Xufang Luo, Dongsheng Li
- Abstract summary: We propose a novel Mixture-of-Experts prompt tuning method called pMoE. The proposed pMoE significantly enhances the model's versatility and applicability to a broad spectrum of tasks. We conduct extensive experiments across 47 adaptation tasks, including both classification and segmentation in general and medical domains.
- Score: 68.3777121585281
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Parameter-efficient fine-tuning has demonstrated promising results across various visual adaptation tasks, such as classification and segmentation. Typically, prompt tuning techniques have harnessed knowledge from a single pre-trained model, whether from a general or a specialized medical domain. However, this approach typically overlooks the potential synergies that could arise from integrating diverse domain knowledge within the same tuning process. In this work, we propose a novel Mixture-of-Experts prompt tuning method called pMoE, which leverages the strengths of multiple expert domains through expert-specialized prompt tokens and the learnable dispatcher, effectively combining their expertise in a unified model framework. Our pMoE introduces expert-specific prompt tokens and utilizes a dynamic token dispatching mechanism at various prompt layers to optimize the contribution of each domain expert during the adaptation phase. By incorporating both domain knowledge from diverse experts, the proposed pMoE significantly enhances the model's versatility and applicability to a broad spectrum of tasks. We conduct extensive experiments across 47 adaptation tasks, including both classification and segmentation in general and medical domains. The results demonstrate that our pMoE not only achieves superior performance with a large margin of improvements but also offers an optimal trade-off between computational efficiency and adaptation effectiveness compared to existing methods.
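The abstract describes expert-specific prompt tokens combined by a learnable dispatcher, but it does not give the dispatcher's exact form. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a soft (softmax) gate that scores each expert's prompt token with a per-expert weight vector and returns the weighted mix. The names `dispatch_prompts`, `expert_prompts`, and `gate_weights` are hypothetical and chosen for illustration only.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dispatch_prompts(expert_prompts, gate_weights):
    """Mix expert-specific prompt tokens with a soft gate (assumed form).

    expert_prompts[e][t] is the D-dimensional prompt token t of expert e.
    gate_weights[e] is a D-dimensional scoring vector for expert e
    (a hypothetical stand-in for the learnable dispatcher).
    Returns (mixed, gates): mixed[t] is the combined token at position t,
    gates[t] holds the per-expert weights used for that position.
    """
    num_experts = len(expert_prompts)
    num_tokens = len(expert_prompts[0])
    mixed, gates = [], []
    for t in range(num_tokens):
        tokens = [expert_prompts[e][t] for e in range(num_experts)]
        # Score each expert's token with that expert's gate vector.
        scores = [
            sum(w * x for w, x in zip(gate_weights[e], tokens[e]))
            for e in range(num_experts)
        ]
        g = softmax(scores)
        dim = len(tokens[0])
        # Convex combination of the experts' tokens at this position.
        mixed.append(
            [sum(g[e] * tokens[e][d] for e in range(num_experts)) for d in range(dim)]
        )
        gates.append(g)
    return mixed, gates


# Toy setup: 3 experts, 4 prompt tokens each, embedding size 8.
random.seed(0)
E, T, D = 3, 4, 8
expert_prompts = [
    [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)] for _ in range(E)
]
gate_weights = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(E)]
mixed, gates = dispatch_prompts(expert_prompts, gate_weights)
```

In this sketch each mixed token is a convex combination of the experts' tokens, so the gate weights at every position sum to one. A sparse (top-k) dispatcher or per-layer gates would be equally plausible readings of the abstract.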
Related papers
- MME: Mixture of Mesh Experts with Random Walk Transformer Gating [13.564417897372875]
We present a novel Mixture of Experts (MoE) framework designed to harness the complementary strengths of diverse approaches. We propose a new gate architecture that encourages each expert to specialise in the classes it excels in. Our framework achieves state-of-the-art results in mesh classification, retrieval, and semantic segmentation tasks.
arXiv Detail & Related papers (2026-02-28T22:13:00Z)
- GMoPE: A Prompt-Expert Mixture Framework for Graph Foundation Models [30.023472202549076]
Graph Neural Networks (GNNs) have demonstrated impressive performance on task-specific benchmarks, yet their ability to generalize across diverse domains and tasks remains limited. We propose GMoPE, a framework that seamlessly integrates the Mixture-of-Experts (MoE) architecture with prompt-based learning for graphs. We show that GMoPE consistently outperforms state-of-the-art baselines and achieves performance comparable to full parameter fine-tuning.
arXiv Detail & Related papers (2025-11-05T07:28:51Z)
- Training Matryoshka Mixture-of-Experts for Elastic Inference-Time Expert Utilization [60.309915093470416]
Matryoshka MoE (M-MoE) is a training framework that instills a coarse-to-fine structure directly into the expert ensemble. Our work paves the way for more practical and adaptable deployments of large-scale MoE models.
arXiv Detail & Related papers (2025-09-30T16:56:44Z)
- One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning [52.966712416640085]
We propose SMoPE, a novel framework that integrates the benefits of both task-specific and shared prompt strategies. SMoPE consistently outperforms task-specific prompt methods and achieves performance competitive with state-of-the-art approaches.
arXiv Detail & Related papers (2025-09-29T08:54:58Z)
- MoE-MLoRA for Multi-Domain CTR Prediction: Efficient Adaptation with Expert Specialization [0.0]
MoE-MLoRA is a mixture-of-experts framework where each expert is first trained independently to specialize in its domain. We evaluate MoE-MLoRA across eight CTR models on Movielens and Taobao.
arXiv Detail & Related papers (2025-06-09T09:03:05Z)
- Adaptive Conditional Expert Selection Network for Multi-domain Recommendation [10.418133538132635]
Mixture-of-Experts (MOE) has recently become the de facto standard in Multi-domain recommendation (MDR).
CESAA consists of Conditional Expert Selection (CES) Module and Adaptive Expert Aggregation (AEA) Module.
AEA utilizes mutual information loss to strengthen the correlations between experts and specific domains, and significantly improve the distinction between experts.
arXiv Detail & Related papers (2024-11-11T09:39:31Z)
- Scalable Multi-Domain Adaptation of Language Models using Modular Experts [10.393155077703653]
MoDE is a mixture-of-experts architecture that augments a general PLM with modular, domain-specialized experts.
MoDE achieves comparable target performances to full parameter fine-tuning while achieving 1.65% better retention performance.
arXiv Detail & Related papers (2024-10-14T06:02:56Z)
- M$^2$PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning [90.75075886543404]
Multimodal Large Language Models (MLLMs) demonstrate remarkable performance across a wide range of domains.
In this work, we introduce a novel Multimodal Prompt Tuning (M$^2$PT) approach for efficient instruction tuning of MLLMs.
arXiv Detail & Related papers (2024-09-24T01:40:24Z)
- Multi-Head Mixture-of-Experts [100.60556163597946]
We propose Multi-Head Mixture-of-Experts (MH-MoE), which employs a multi-head mechanism to split each token into multiple sub-tokens.
MH-MoE is straightforward to implement and decouples from other SMoE optimization methods, making it easy to integrate with other SMoE models for enhanced performance.
arXiv Detail & Related papers (2024-04-23T13:47:09Z)
- T-REX: Mixture-of-Rank-One-Experts with Semantic-aware Intuition for Multi-task Large Language Model Finetuning [31.276142111455847]
Large language models (LLMs) encounter significant adaptation challenges in diverse multitask finetuning. We design a novel framework, mixTure-of-Rank-onE-eXperts (T-REX). Rank-1 experts enable a mix-and-match mechanism to quadratically expand the vector subspace of experts with linear parameter overheads, achieving approximate error reduction with optimal
arXiv Detail & Related papers (2024-04-13T12:14:58Z)
- Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts [74.40198929049959]
Large multi-modal models (LMMs) exhibit remarkable performance across numerous tasks.
Generalist LMMs often suffer from performance degradation when tuned over a large collection of tasks.
We propose Omni-SMoLA, an architecture that uses the Soft MoE approach to mix many multimodal low rank experts.
arXiv Detail & Related papers (2023-12-01T23:04:27Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.