ExPaMoE: An Expandable Parallel Mixture of Experts for Continual Test-Time Adaptation
- URL: http://arxiv.org/abs/2507.00502v2
- Date: Wed, 02 Jul 2025 11:38:41 GMT
- Title: ExPaMoE: An Expandable Parallel Mixture of Experts for Continual Test-Time Adaptation
- Authors: JianChao Zhao, Chenhao Ding, Songlin Dong, Yuhang He, Yihong Gong,
- Abstract summary: Continual Test-Time Adaptation (CTTA) aims to enable models to adapt on-the-fly to a stream of unlabeled data under evolving distribution shifts. We propose ExPaMoE, a novel framework based on an Expandable Parallel Mixture-of-Experts architecture.
- Score: 19.751562859766565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual Test-Time Adaptation (CTTA) aims to enable models to adapt on-the-fly to a stream of unlabeled data under evolving distribution shifts. However, existing CTTA methods typically rely on shared model parameters across all domains, making them vulnerable to feature entanglement and catastrophic forgetting in the presence of large or non-stationary domain shifts. To address this limitation, we propose ExPaMoE, a novel framework based on an Expandable Parallel Mixture-of-Experts architecture. ExPaMoE decouples domain-general and domain-specific knowledge via a dual-branch expert design with token-guided feature separation, and dynamically expands its expert pool based on a Spectral-Aware Online Domain Discriminator (SODD) that detects distribution changes in real-time using frequency-domain cues. Extensive experiments demonstrate the superiority of ExPaMoE across diverse CTTA scenarios. We evaluate our method on standard benchmarks including CIFAR-10C, CIFAR-100C, ImageNet-C, and Cityscapes-to-ACDC for semantic segmentation. Additionally, we introduce ImageNet++, a large-scale and realistic CTTA benchmark built from multiple ImageNet-derived datasets, to better reflect long-term adaptation under complex domain evolution. ExPaMoE consistently outperforms prior methods, showing strong robustness, scalability, and resistance to forgetting.
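The abstract says only that SODD detects distribution changes "using frequency-domain cues" and that the expert pool grows when a change is detected; it does not give the detector's formulation. The following is a minimal sketch of that expand-on-shift idea, assuming a simple heuristic: summarize each test batch by its mean log-amplitude FFT spectrum, compare it to stored per-domain prototypes by relative L2 distance, and open a new expert when no prototype is within a threshold. All names (`SpectralShiftDetector`, `ExpandableExpertPool`, `threshold`, `momentum`) are hypothetical, and the sketch omits the dual-branch, token-guided design. It is not the paper's SODD or expert architecture.

```python
import numpy as np


def amplitude_spectrum(batch: np.ndarray) -> np.ndarray:
    """Mean log-amplitude spectrum of a batch of grayscale images shaped (N, H, W)."""
    spec = np.abs(np.fft.fft2(batch, axes=(-2, -1)))
    return np.log1p(spec).mean(axis=0)


class SpectralShiftDetector:
    """Hypothetical frequency-domain domain detector (an assumption, not the paper's SODD).

    Each known domain is represented by a prototype spectrum. A batch whose spectrum
    is farther than `threshold` (relative L2 distance) from every prototype is
    treated as a new domain.
    """

    def __init__(self, threshold: float = 0.2, momentum: float = 0.9):
        self.threshold = threshold
        self.momentum = momentum
        self.prototypes: list[np.ndarray] = []

    def assign(self, batch: np.ndarray) -> tuple[int, bool]:
        """Return (domain_id, is_new_domain) for one test batch."""
        spec = amplitude_spectrum(batch)
        if self.prototypes:
            dists = [
                np.linalg.norm(spec - p) / (np.linalg.norm(p) + 1e-8)
                for p in self.prototypes
            ]
            k = int(np.argmin(dists))
            if dists[k] < self.threshold:
                # Known domain: refresh its prototype with an exponential moving average.
                self.prototypes[k] = (
                    self.momentum * self.prototypes[k] + (1 - self.momentum) * spec
                )
                return k, False
        # No close prototype (or none yet): register a new domain.
        self.prototypes.append(spec)
        return len(self.prototypes) - 1, True


class ExpandableExpertPool:
    """Hypothetical pool that adds one domain-specific expert per detected domain."""

    def __init__(self, make_expert):
        self.make_expert = make_expert
        self.experts = []

    def expert_for(self, domain_id: int, is_new: bool):
        if is_new:
            self.experts.append(self.make_expert())
        return self.experts[domain_id]


rng = np.random.default_rng(0)


def domain_a(n: int = 64) -> np.ndarray:
    # Near-constant images: spectral energy concentrated at the DC component.
    return 0.5 + 0.1 * rng.uniform(size=(n, 32, 32))


def domain_b(n: int = 64) -> np.ndarray:
    # Same images plus strong Gaussian noise: energy spread across high frequencies.
    return domain_a(n) + 0.5 * rng.normal(size=(n, 32, 32))


detector = SpectralShiftDetector(threshold=0.2)
pool = ExpandableExpertPool(make_expert=lambda: {"params": None})  # placeholder expert
trace = []
for batch in (domain_a(), domain_a(), domain_b(), domain_a()):
    domain_id, is_new = detector.assign(batch)
    pool.expert_for(domain_id, is_new)
    trace.append((domain_id, is_new))
print(trace)
print(len(pool.experts))
```

Using the amplitude spectrum rather than raw pixels is one plausible reading of "frequency-domain cues": corruptions such as noise or blur mainly reshape how energy is distributed across frequencies, which this summary captures while ignoring image content. The fixed threshold is a simplifying assumption; a real detector would likely need calibration.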
Related papers
- CorrMoE: Mixture of Experts with De-stylization Learning for Cross-Scene and Cross-Domain Correspondence Pruning [30.111296778234124]
CorrMoE is a correspondence pruning framework that enhances robustness under cross-domain and cross-scene variations. For scene diversity, we design a Bi-Fusion Mixture of Experts module that adaptively integrates multi-perspective features. Experiments on benchmark datasets demonstrate that CorrMoE achieves superior accuracy and generalization compared to state-of-the-art methods.
(2025-07-16T01:44:01Z)
- Selecting and Merging: Towards Adaptable and Scalable Named Entity Recognition with Large Language Models [5.466962214217334]
Supervised fine-tuning (SFT) is widely used to align large language models (LLMs) with information extraction (IE) tasks, such as named entity recognition (NER). We propose the SaM framework, which dynamically Selects and Merges expert models at inference time.
(2025-06-28T08:28:52Z)
- MoE-MLoRA for Multi-Domain CTR Prediction: Efficient Adaptation with Expert Specialization [0.0]
MoE-MLoRA is a mixture-of-experts framework where each expert is first trained independently to specialize in its domain. We evaluate MoE-MLoRA across eight CTR models on Movielens and Taobao.
(2025-06-09T09:03:05Z)
- RD-UIE: Relation-Driven State Space Modeling for Underwater Image Enhancement [59.364418120895]
Underwater image enhancement (UIE) is a critical preprocessing step for marine vision applications. We develop a novel relation-driven Mamba framework for effective UIE (RD-UIE). Experiments on underwater enhancement benchmarks demonstrate that RD-UIE outperforms the state-of-the-art approach WMamba.
(2025-05-02T12:21:44Z)
- Exploiting Aggregation and Segregation of Representations for Domain Adaptive Human Pose Estimation [50.31351006532924]
Human pose estimation (HPE) has received increasing attention recently due to its wide application in motion analysis, virtual reality, healthcare, etc. However, it suffers from a lack of diverse, labeled real-world datasets because annotation is time- and labor-intensive. We introduce a novel framework that capitalizes on both representation aggregation and segregation for domain adaptive human pose estimation.
(2024-12-29T17:59:45Z)
- GM-DF: Generalized Multi-Scenario Deepfake Detection [49.072106087564144]
Existing face forgery detection usually follows the paradigm of training models in a single domain.
In this paper, we elaborately investigate the generalization capacity of deepfake detection models when jointly trained on multiple face forgery detection datasets.
(2024-06-28T17:42:08Z)
- BECoTTA: Input-dependent Online Blending of Experts for Continual Test-time Adaptation [59.1863462632777]
Continual Test Time Adaptation (CTTA) is required to adapt efficiently to continuous unseen domains while retaining previously learned knowledge.
This paper proposes BECoTTA, an input-dependent and efficient modular framework for CTTA.
We validate that our method outperforms existing methods in multiple CTTA scenarios, including disjoint and gradual domain shifts, while requiring 98% fewer trainable parameters.
(2024-02-13T18:37:53Z)
- Test-time Adaptation in the Dynamic World with Compound Domain Knowledge Management [75.86903206636741]
Test-time adaptation (TTA) allows the model to adapt itself to novel environments and improve its performance during test time.
Several works for TTA have shown promising adaptation performances in continuously changing environments.
This paper first presents a robust TTA framework with compound domain knowledge management.
We then devise novel regularization which modulates the adaptation rates using domain-similarity between the source and the current target domain.
(2022-12-16T09:02:01Z)
- Manifold Topology Divergence: a Framework for Comparing Data Manifolds [109.0784952256104]
We develop a framework for comparing data manifolds, aimed at the evaluation of deep generative models.
Based on the Cross-Barcode, we introduce the Manifold Topology Divergence score (MTop-Divergence).
We demonstrate that the MTop-Divergence accurately detects various degrees of mode-dropping, intra-mode collapse, mode invention, and image disturbance.
(2021-06-08T00:30:43Z)