MEPT: Mixture of Expert Prompt Tuning as a Manifold Mapper
- URL: http://arxiv.org/abs/2509.00996v2
- Date: Sat, 13 Sep 2025 23:37:47 GMT
- Title: MEPT: Mixture of Expert Prompt Tuning as a Manifold Mapper
- Authors: Runjia Zeng, Guangyan Sun, Qifan Wang, Tong Geng, Sohail Dianat, Xiaotian Han, Raghuveer Rao, Xueling Zhang, Cheng Han, Lifu Huang, Dongfang Liu
- Abstract summary: We propose Mixture of Expert Prompt Tuning (MEPT) as an effective and efficient manifold-mapping framework. MEPT integrates multiple prompt experts to adaptively learn diverse and non-stationary data distributions. Empirical evaluations demonstrate that MEPT outperforms several state-of-the-art parameter-efficient baselines on SuperGLUE.
- Score: 75.6582687942241
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Viewing deep neural networks as manifold mappers, the pretrain-then-fine-tune paradigm can be interpreted as a two-stage process: pretraining establishes a broad knowledge base, and fine-tuning adjusts the model parameters to activate specific neural pathways that align with the target manifold. Although prior fine-tuning approaches demonstrate success, their rigid parameter space limits their ability to dynamically activate appropriate neural pathways, rendering them ill-equipped to adapt flexibly to diverse and evolving data distributions. In light of this view, we propose a novel approach, Mixture of Expert Prompt Tuning (MEPT), as an effective and efficient manifold-mapping framework. MEPT leverages the Mixture of Experts architecture by integrating multiple prompt experts to adaptively learn diverse and non-stationary data distributions. Empirical evaluations demonstrate that MEPT outperforms several state-of-the-art parameter-efficient baselines on SuperGLUE, achieving notable improvements in mean accuracy (e.g., 1.94%) while significantly reducing activated prompts by 79.25%. The effectiveness of MEPT is further supported by theoretical insights from manifold learning and validated through neural activation pathway visualization results. Our code is available at https://runjia.tech/emnlp_mept/.
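As a rough illustration of the mechanism the abstract describes, the sketch below mixes several learnable prompt banks through a lightweight top-k router and prepends the result to a frozen backbone's input. Everything here, including the class name `PromptExpertMixture`, the mean-pooled routing, and all hyperparameters, is an assumption for illustration, not the authors' implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptExpertMixture(nn.Module):
    """Illustrative mixture-of-expert prompt tuning layer (not the MEPT code).

    Each expert holds its own bank of soft prompt tokens; a lightweight router
    scores experts from the input representation, sparsely selects the top-k,
    and mixes their prompts before prepending them to the frozen model's input.
    """

    def __init__(self, num_experts=4, prompt_len=8, hidden_dim=768, top_k=1):
        super().__init__()
        # One learnable prompt bank per expert: (num_experts, prompt_len, hidden_dim)
        self.prompt_experts = nn.Parameter(
            torch.randn(num_experts, prompt_len, hidden_dim) * 0.02
        )
        self.router = nn.Linear(hidden_dim, num_experts)  # scores experts per input
        self.top_k = top_k

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden_dim) from the frozen backbone
        pooled = hidden_states.mean(dim=1)            # (batch, hidden_dim)
        logits = self.router(pooled)                  # (batch, num_experts)
        topk_vals, topk_idx = logits.topk(self.top_k, dim=-1)
        gates = F.softmax(topk_vals, dim=-1)          # (batch, top_k)
        # Gather the selected experts' prompt banks and mix them by gate weight.
        selected = self.prompt_experts[topk_idx]      # (batch, top_k, prompt_len, hidden_dim)
        prompts = (gates[..., None, None] * selected).sum(dim=1)
        # Prepend the mixed prompt; output is (batch, prompt_len + seq_len, hidden_dim).
        return torch.cat([prompts, hidden_states], dim=1)
```

In standard prompt-tuning fashion only `prompt_experts` and `router` would receive gradients while the backbone stays frozen, and sparse top-k routing is what would let a method of this kind activate only a fraction of its prompts per input, in the spirit of the 79.25% reduction the abstract reports.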
Related papers
- Benchmarking Few-shot Transferability of Pre-trained Models with Improved Evaluation Protocols [123.73663884421272]
Few-shot transfer has been revolutionized by stronger pre-trained models and improved adaptation algorithms. We establish FEWTRANS, a comprehensive benchmark containing 10 diverse datasets. By releasing FEWTRANS, we aim to provide a rigorous "ruler" to streamline reproducible advances in few-shot transfer learning research.
arXiv Detail & Related papers (2026-02-28T05:41:57Z)
- Interim Report on Human-Guided Adaptive Hyperparameter Optimization with Multi-Fidelity Sprints [0.0]
This case study applies a phased hyperparameter optimization process to compare multitask natural language model variants. We employ short Bayesian optimization sessions that leverage multi-fidelity evaluation, hyperparameter space pruning, progressive halving, and a degree of human guidance. We demonstrate our method on a collection of variants of the 2021 Joint Entity and Relation Extraction model proposed by Eberts and Ulges.
arXiv Detail & Related papers (2025-05-14T20:38:44Z)
- UNEM: UNrolled Generalized EM for Transductive Few-Shot Learning [35.62208317531141]
We advocate and introduce the unrolling paradigm, also referred to as "learning to optimize". Our unrolling approach covers various statistical feature distributions and pre-training paradigms. We report comprehensive experiments covering a breadth of fine-grained downstream image classification tasks.
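For readers unfamiliar with the unrolling paradigm, here is a generic "learning to optimize" sketch (deliberately not the UNEM algorithm): a few EM-style soft-assignment iterations are written out as layers whose per-step temperatures are trained end to end. All names and modeling choices below are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnrolledSoftAssign(nn.Module):
    """Generic unrolled-EM sketch (illustrative; not the UNEM method).

    A fixed number of E/M iterations becomes a feed-forward computation, and
    the per-iteration temperature, normally a hand-tuned constant, becomes a
    learnable parameter optimized end to end on the downstream loss.
    """

    def __init__(self, num_steps=3):
        super().__init__()
        self.log_temp = nn.Parameter(torch.zeros(num_steps))  # one per iteration

    def forward(self, feats, centers):
        # feats: (n, d) query features; centers: (k, d) initial class centroids
        for t in self.log_temp:
            logits = -torch.cdist(feats, centers) * t.exp()  # (n, k) similarities
            assign = F.softmax(logits, dim=-1)               # E-step: soft labels
            # M-step: re-estimate centroids from the soft assignments
            centers = (assign.t() @ feats) / (assign.sum(0)[:, None] + 1e-6)
        return assign, centers
```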
arXiv Detail & Related papers (2024-12-21T19:01:57Z)
- ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts. Thanks to this artful design, ALoRE introduces negligible extra parameters and can be effortlessly merged into the frozen backbone.
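A minimal sketch of the general idea, under the assumption that each expert's weight update is a Kronecker product of two small factors whose sum collapses into one matrix mergeable with the frozen weight (names and dimensions below are invented for illustration; this is not the ALoRE implementation):

```python
import torch
import torch.nn as nn

class KroneckerLowRankExperts(nn.Module):
    """Toy aggregation of Kronecker-factored experts (not the ALoRE code).

    Each expert contributes a (d_out x d_in) update built as kron(A_e, B_e)
    from two small factors, so the pool costs far fewer parameters than one
    dense matrix, and the summed update can be merged into the frozen weight.
    """

    def __init__(self, d_in=768, d_out=768, num_experts=4, block=16):
        super().__init__()
        assert d_in % block == 0 and d_out % block == 0
        self.A = nn.Parameter(torch.randn(num_experts, block, block) * 0.02)
        self.B = nn.Parameter(
            torch.randn(num_experts, d_out // block, d_in // block) * 0.02
        )

    def delta_weight(self):
        # Aggregate the experts: sum_e kron(A_e, B_e) -> (d_out, d_in)
        return sum(torch.kron(a, b) for a, b in zip(self.A, self.B))

    def forward(self, x, frozen_weight):
        # x: (batch, d_in); frozen_weight: (d_out, d_in) from the pretrained layer
        return x @ (frozen_weight + self.delta_weight()).t()
```

Because the aggregated update is a single matrix, it can be folded into the frozen weight after training, which is consistent with the summary's claim of negligible extra parameters and effortless merging.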
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
- Embedded Visual Prompt Tuning [18.094731760514264]
We study the effectiveness of fine-tuning methods when adapting foundation models to medical image classification tasks. We propose the Embedded Prompt Tuning (EPT) method, which embeds prompt tokens into the expanded channels. EPT outperforms several state-of-the-art fine-tuning methods by a significant margin on few-shot medical image classification tasks.
arXiv Detail & Related papers (2024-07-01T06:35:53Z)
- T-REX: Mixture-of-Rank-One-Experts with Semantic-aware Intuition for Multi-task Large Language Model Finetuning [31.276142111455847]
Large language models (LLMs) encounter significant adaptation challenges in diverse multitask finetuning. We design a novel framework, mixTure-of-Rank-onE-eXperts (T-REX). Rank-1 experts enable a mix-and-match mechanism to quadratically expand the vector subspace of experts with linear parameter overheads, achieving approximate error reduction with optimal…
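The rank-1 expert idea admits a compact generic sketch (not the T-REX code): each expert is an outer product u_e v_e^T, so the pool costs parameters linear in the layer width, while gated combinations of experts reach a much richer family of low-rank updates. The router and all shapes below are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RankOneExpertMixture(nn.Module):
    """Mixture of rank-1 experts, illustrative only (not the T-REX code).

    Expert e's weight update is the outer product u_e v_e^T, so num_experts
    experts cost only num_experts * (d_out + d_in) parameters, and the gated
    sum is applied without materializing any d_out x d_in matrix.
    """

    def __init__(self, d_in=768, d_out=768, num_experts=8):
        super().__init__()
        self.u = nn.Parameter(torch.randn(num_experts, d_out) * 0.02)
        self.v = nn.Parameter(torch.randn(num_experts, d_in) * 0.02)
        self.router = nn.Linear(d_in, num_experts)

    def forward(self, x, frozen_weight):
        # x: (batch, d_in); frozen_weight: (d_out, d_in), kept frozen
        gates = F.softmax(self.router(x), dim=-1)  # (batch, num_experts)
        base = x @ frozen_weight.t()               # frozen pretrained path
        proj = x @ self.v.t()                      # (batch, num_experts): v_e . x
        delta = (gates * proj) @ self.u            # sum_e g_e * (v_e . x) * u_e
        return base + delta
```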
arXiv Detail & Related papers (2024-04-13T12:14:58Z)
- Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while evoking only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z)
- When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
- Med-Tuning: A New Parameter-Efficient Tuning Framework for Medical Volumetric Segmentation [37.42382366505377]
We introduce a new framework named Med-Tuning to realize parameter-efficient tuning (PET) for the medical volumetric segmentation task.
Our framework enhances the segmentation precision of 2D baselines pre-trained on natural images.
Compared to full FT, Med-Tuning reduces the fine-tuned model parameters by up to 4x, with even better segmentation performance.
arXiv Detail & Related papers (2023-04-21T10:47:13Z)
- Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained Language Models [68.9288651177564]
We present a novel MoE architecture based on matrix product operators (MPO) from quantum many-body physics.
With the decomposed MPO structure, we can reduce the parameters of the original MoE architecture.
Experiments on three well-known downstream natural language datasets with GPT-2 show improved performance and efficiency when increasing model capacity.
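To make the MPO idea concrete, here is a toy two-core matrix-product-operator linear layer (illustrative only; the paper applies such decompositions to MoE expert weights, and every size below is chosen arbitrarily):

```python
import torch
import torch.nn as nn

class MPOLinear(nn.Module):
    """Toy two-core MPO linear layer (not the paper's implementation).

    A (d_out x d_in) weight with d_out = o1*o2 and d_in = i1*i2 is replaced by
    two small cores joined by a bond of dimension r, cutting the parameter
    count from o1*o2*i1*i2 down to r*(o1*i1 + o2*i2).
    """

    def __init__(self, o1=24, o2=32, i1=24, i2=32, rank=8):
        super().__init__()
        self.core1 = nn.Parameter(torch.randn(o1, i1, rank) * 0.02)
        self.core2 = nn.Parameter(torch.randn(rank, o2, i2) * 0.02)
        self.shape_in = (i1, i2)

    def forward(self, x):
        # x: (batch, i1*i2) -> fold into (batch, i1, i2), contract both cores
        b = x.shape[0]
        x = x.view(b, *self.shape_in)
        # 'b' batch, 'i'/'j' input modes, 'o'/'p' output modes, 'r' bond index
        y = torch.einsum('bij,oir,rpj->bop', x, self.core1, self.core2)
        return y.reshape(b, -1)  # (batch, o1*o2)
```

With the defaults this stands in for a 768x768 weight (589,824 parameters) using only 4,608 + 8,192 = 12,800; applying such a decomposition to each expert is one way an MoE's parameter count could shrink, as the summary describes.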
arXiv Detail & Related papers (2022-03-02T13:44:49Z)