Parameters as Experts: Adapting Vision Models with Dynamic Parameter Routing
- URL: http://arxiv.org/abs/2602.06862v1
- Date: Fri, 06 Feb 2026 16:50:38 GMT
- Title: Parameters as Experts: Adapting Vision Models with Dynamic Parameter Routing
- Authors: Meng Lou, Stanley Yu, Yizhou Yu
- Abstract summary: AdaRoute is a new adapter-style method featuring a simple mixture-of-experts (MoE) architecture. Dynamic weight matrices in AdaRoute modules facilitate low-rank adaptation in an input-dependent manner. Experiments demonstrate the superiority of AdaRoute on diverse vision tasks.
- Score: 41.836954056293614
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adapting pre-trained vision models using parameter-efficient fine-tuning (PEFT) remains challenging, as it aims to achieve performance comparable to full fine-tuning using a minimal number of trainable parameters. When applied to complex dense prediction tasks, existing methods exhibit limitations, including input-agnostic modeling and redundant cross-layer representations. To this end, we propose AdaRoute, a new adapter-style method featuring a simple mixture-of-experts (MoE) architecture. Specifically, we introduce shared expert centers, where each expert is a trainable parameter matrix. During a feedforward pass, each AdaRoute module in the network dynamically generates weight matrices tailored for the current module via a simple dynamic parameter routing mechanism, which selectively aggregates parameter matrices in the corresponding expert center. Dynamic weight matrices in AdaRoute modules facilitate low-rank adaptation in an input-dependent manner, thus generating more customized and powerful feature representations. Moreover, since AdaRoute modules across multiple network layers share the same expert center, they improve feature diversity by promoting implicit cross-layer feature interaction. Extensive experiments demonstrate the superiority of AdaRoute on diverse vision tasks, including semantic segmentation, object detection and instance segmentation, and panoptic segmentation. Code will be available at: https://bit.ly/3NZcr0H.
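The abstract describes the mechanism at a high level: a shared expert center holds trainable parameter matrices, and each adapter module composes an input-dependent low-rank weight by softmax-aggregating those matrices. The paper's actual architecture is not public here, so the following is only a minimal NumPy sketch of that general idea; all names (`expert_center`, `AdaRouteModule`, the mean-pooled router, the module-local up-projection) and all dimensions are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, K = 16, 4, 8  # feature dim, low rank, number of experts (illustrative sizes)

# Shared expert center: each expert is a trainable parameter matrix (here d x r).
# Multiple modules across layers reference this same tensor.
expert_center = rng.normal(scale=0.02, size=(K, d, r))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class AdaRouteModule:
    """One adapter-style module; the routing is input-dependent (hypothetical sketch)."""
    def __init__(self):
        self.router = rng.normal(scale=0.02, size=(d, K))  # expert-scoring weights
        self.up = rng.normal(scale=0.02, size=(r, d))      # module-local up-projection

    def __call__(self, x):  # x: (tokens, d)
        # Dynamic parameter routing: pool the tokens, score the shared experts,
        # and softmax-aggregate the expert matrices into a down-projection
        # tailored to this input.
        pooled = x.mean(axis=0)                             # (d,)
        gate = softmax(pooled @ self.router)                # (K,) mixture weights
        down = np.einsum('k,kdr->dr', gate, expert_center)  # (d, r) dynamic weight
        return x + (x @ down) @ self.up                     # low-rank residual update

x = rng.normal(size=(10, d))
layer1, layer2 = AdaRouteModule(), AdaRouteModule()  # both draw on the shared center
y = layer2(layer1(x))
print(y.shape)  # (10, 16)
```

Because both modules aggregate from the same `expert_center`, gradients from different layers would update the same expert matrices, which is one plausible reading of the "implicit cross-layer feature interaction" the abstract mentions.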
Related papers
- Hyperparameter Transfer with Mixture-of-Expert Layers [51.03005470884366]
Mixture-of-Experts (MoE) layers have emerged as an important tool in scaling up modern neural networks. We propose a new parameterization for transformer models with MoE layers when scaling model width, depth, number of experts, and expert (hidden) size.
arXiv Detail & Related papers (2026-01-28T03:02:30Z) - L-MoE: End-to-End Training of a Lightweight Mixture of Low-Rank Adaptation Experts [10.21556794551883]
We present L-MoE: a Lightweight Mixture of LoRA Experts. L-MoE redefines MoE experts as task-specialized, low-rank adapters. We present the formal mathematical framework for L-MoE.
arXiv Detail & Related papers (2025-10-19T08:44:25Z) - Is Multiple Object Tracking a Matter of Specialization? [33.59920084936913]
Training end-to-end transformer-based trackers in heterogeneous scenarios poses significant challenges.
We introduce the Scenario-specific Tracking Architecture (PASTA), a novel framework that combines Parameter-Efficient Fine-Tuning (PEFT) and Modular Deep Learning.
arXiv Detail & Related papers (2024-11-01T13:03:58Z) - Bi-directional Adapter for Multi-modal Tracking [67.01179868400229]
We propose a novel multi-modal visual prompt tracking model based on a universal bi-directional adapter.
We develop a simple but effective light feature adapter to transfer modality-specific information from one modality to another.
Our model achieves superior tracking performance in comparison with both the full fine-tuning methods and the prompt learning-based methods.
arXiv Detail & Related papers (2023-12-17T05:27:31Z) - Parameter-efficient Tuning of Large-scale Multimodal Foundation Model [68.24510810095802]
We propose Aurora, a graceful prompt framework for cross-modal transfer, to overcome these challenges.
Considering the redundancy in existing architectures, we first utilize the mode approximation to generate 0.1M trainable parameters to implement the multimodal prompt tuning.
A thorough evaluation on six cross-modal benchmarks shows that it not only outperforms the state-of-the-art but even outperforms the full fine-tuning approach.
arXiv Detail & Related papers (2023-05-15T06:40:56Z) - Prompt-Matched Semantic Segmentation [96.99924127527002]
The objective of this work is to explore how to effectively adapt pre-trained foundation models to various downstream tasks of image semantic segmentation.
We propose a novel Inter-Stage Prompt-Matched Framework, which maintains the original structure of the foundation model while generating visual prompts adaptively for task-oriented tuning.
A lightweight module termed Semantic-aware Prompt Matcher is then introduced to hierarchically interpolate between two stages to learn reasonable prompts for each specific task.
arXiv Detail & Related papers (2022-08-22T09:12:53Z) - Learning Dynamic Routing for Semantic Segmentation [86.56049245100084]
This paper studies a conceptually new method to alleviate the scale variance in semantic representation, named dynamic routing.
The proposed framework generates data-dependent routes, adapting to the scale distribution of each image.
To this end, a differentiable gating function, called soft conditional gate, is proposed to select scale transform paths on the fly.
arXiv Detail & Related papers (2020-03-23T17:22:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.