R^2MoE: Redundancy-Removal Mixture of Experts for Lifelong Concept Learning
- URL: http://arxiv.org/abs/2507.13107v1
- Date: Thu, 17 Jul 2025 13:22:40 GMT
- Title: R^2MoE: Redundancy-Removal Mixture of Experts for Lifelong Concept Learning
- Authors: Xiaohan Guo, Yusong Cai, Zejia Liu, Zhengning Wang, Lili Pan, Hongliang Li
- Abstract summary: Redundancy-Removal Mixture of Experts (R^2MoE) is a parameter-efficient framework for lifelong visual concept learning. Our method generates images with superior conceptual fidelity compared to the state-of-the-art (SOTA) method.
- Score: 7.08366053718851
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Enabling large-scale generative models to continuously learn new visual concepts is essential for personalizing pre-trained models to meet individual user preferences. Existing approaches to continual visual concept learning are constrained by two fundamental challenges: catastrophic forgetting and parameter expansion. In this paper, we propose Redundancy-Removal Mixture of Experts (R^2MoE), a parameter-efficient framework for lifelong visual concept learning that effectively learns new concepts while incurring minimal parameter overhead. Our framework makes three key contributions: First, we propose a mixture-of-experts framework with a routing distillation mechanism that enables experts to acquire concept-specific knowledge while preserving the gating network's routing capability, thereby effectively mitigating catastrophic forgetting. Second, we propose a strategy for eliminating redundant layer-wise experts that reduces the number of expert parameters by fully utilizing previously learned experts. Third, we employ a hierarchical local attention-guided inference approach to mitigate interference between generated visual concepts. Extensive experiments demonstrate that our method generates images with superior conceptual fidelity compared to the state-of-the-art (SOTA) method, achieving an 87.8% reduction in forgetting rate and 63.3% fewer parameters on the CustomConcept101 dataset. Our code is available at https://github.com/learninginvision/R2MoE
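The routing-distillation mechanism in the first contribution can be sketched in a few lines: a gating network scores experts per input, and a KL term keeps the updated gate close to the frozen gate learned on earlier concepts. The following is a minimal pure-Python illustration under assumed shapes and names (`route`, `gate_old`, `gate_new` are all hypothetical), not the authors' implementation:

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of scores.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def route(x, gate):
    """Gating network: token features -> distribution over experts."""
    logits = [sum(xi * wij for xi, wij in zip(x, col)) for col in gate]
    return softmax(logits)

def routing_distillation_loss(p_new, p_old, eps=1e-9):
    """KL(old || new): penalizes the updated gate for drifting away from
    the routing learned on earlier concepts (illustrative, not R^2MoE's code)."""
    return sum(po * (math.log(po + eps) - math.log(pn + eps))
               for po, pn in zip(p_old, p_new))

random.seed(0)
x = [random.gauss(0, 1) for _ in range(8)]                              # one token's features
gate_old = [[random.gauss(0, 1) for _ in range(8)] for _ in range(3)]   # frozen gate, 3 experts
gate_new = [[w + 0.01 * random.gauss(0, 1) for w in col] for col in gate_old]

loss = routing_distillation_loss(route(x, gate_new), route(x, gate_old))
print(f"routing distillation loss: {loss:.6f}")
```

Minimizing this term alongside the new-concept objective is what lets the experts specialize while the router's earlier decisions stay intact.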
Related papers
- Forget Less by Learning Together through Concept Consolidation [6.121904567143191]
Custom Diffusion Models (CDMs) have gained significant attention due to their remarkable ability to personalize generative processes. Existing CDMs suffer from catastrophic forgetting when continuously learning new concepts. We propose Forget Less by Learning Together (FL2T), which enables concurrent and order-agnostic concept learning.
arXiv Detail & Related papers (2026-01-05T10:14:16Z) - CURE: Controlled Unlearning for Robust Embeddings - Mitigating Conceptual Shortcuts in Pre-Trained Language Models [23.898244353656352]
We introduce CURE, a framework that systematically disentangles and suppresses conceptual shortcuts. CURE achieves an absolute improvement of +10 points in F1 score on IMDB and +2 points on Yelp.
arXiv Detail & Related papers (2025-09-05T16:47:22Z) - Interpretable Few-Shot Image Classification via Prototypical Concept-Guided Mixture of LoRA Experts [79.18608192761512]
Self-Explainable Models (SEMs) rely on Prototypical Concept Learning (PCL) to make their visual recognition processes more interpretable. We propose a Few-Shot Prototypical Concept Classification framework that mitigates two key challenges under low-data regimes: parametric imbalance and representation misalignment. Our approach consistently outperforms existing SEMs by a notable margin, with 4.2%-8.7% relative gains in 5-way 5-shot classification.
arXiv Detail & Related papers (2025-06-05T06:39:43Z) - Towards Better Generalization and Interpretability in Unsupervised Concept-Based Models [9.340843984411137]
This paper introduces a novel unsupervised concept-based model for image classification, named Learnable Concept-Based Model (LCBM). We demonstrate that LCBM surpasses existing unsupervised concept-based models in generalization capability and nearly matches the performance of black-box models. Despite the use of concept embeddings, we maintain model interpretability by means of a local linear combination of concepts.
arXiv Detail & Related papers (2025-06-02T16:26:41Z) - Sculpting Memory: Multi-Concept Forgetting in Diffusion Models via Dynamic Mask and Concept-Aware Optimization [20.783312940122297]
Text-to-image (T2I) diffusion models have achieved remarkable success in generating high-quality images from textual prompts. However, their ability to store vast amounts of knowledge raises concerns in scenarios where selective forgetting is necessary. We propose Dynamic Mask coupled with Concept-Aware Loss, a novel unlearning framework designed for multi-concept forgetting.
arXiv Detail & Related papers (2025-04-12T01:38:58Z) - Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models [56.35484513848296]
FADE (Fine-grained Attenuation for Diffusion Erasure) is an adjacency-aware unlearning algorithm for text-to-image generative models. It removes target concepts with minimal impact on correlated concepts, achieving a 12% improvement in retention performance over state-of-the-art methods.
arXiv Detail & Related papers (2025-03-25T15:49:48Z) - Enhancing Recommendation Explanations through User-Centric Refinement [7.640281193938638]
We propose a novel paradigm that refines initial explanations generated by existing explainable recommender models. Specifically, we introduce a multi-agent collaborative refinement framework based on large language models.
arXiv Detail & Related papers (2025-02-17T12:08:18Z) - Towards Lifelong Few-Shot Customization of Text-to-Image Diffusion [50.26583654615212]
Lifelong few-shot customization for text-to-image diffusion aims to continually generalize existing models for new tasks with minimal data.
In this study, we identify and categorize the catastrophic forgetting problems into two folds: relevant concepts forgetting and previous concepts forgetting.
Unlike existing methods that rely on additional real data or offline replay of original concept data, our approach enables on-the-fly knowledge distillation to retain the previous concepts while learning new ones.
arXiv Detail & Related papers (2024-11-08T12:58:48Z) - AttenCraft: Attention-guided Disentanglement of Multiple Concepts for Text-to-Image Customization [3.5066393042242123]
We propose AttenCraft, an attention-based method for multiple-concept disentanglement. We introduce an adaptive algorithm based on attention scores to estimate sampling ratios for different concepts. Our model effectively mitigates two issues, achieving state-of-the-art image fidelity and comparable prompt fidelity to baseline models.
arXiv Detail & Related papers (2024-05-28T08:50:14Z) - Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting [51.606819347636076]
We analyze concept-agnostic overfitting, which undermines non-customized concept knowledge, and concept-specific overfitting, which arises when customization is confined to limited training modalities.
We propose Infusion, a T2I customization method that enables the learning of target concepts to avoid being constrained by limited training modalities.
arXiv Detail & Related papers (2024-04-22T09:16:25Z) - Separable Multi-Concept Erasure from Diffusion Models [52.51972530398691]
We propose a Separable Multi-concept Eraser (SepME) to eliminate unsafe concepts from large-scale diffusion models.
The latter separates optimizable model weights, making each weight increment correspond to a specific concept erasure.
Extensive experiments indicate the efficacy of our approach in eliminating concepts, preserving model performance, and offering flexibility in the erasure or recovery of various concepts.
arXiv Detail & Related papers (2024-02-03T11:10:57Z) - Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy [84.11508381847929]
Sparsely activated Mixture-of-Experts (SMoE) has shown promise to scale up the learning capacity of neural networks.
We propose M-SMoE, which leverages routing statistics to guide expert merging.
Our MC-SMoE achieves up to 80% memory and a 20% FLOPs reduction, with virtually no loss in performance.
arXiv Detail & Related papers (2023-10-02T16:51:32Z) - Be Your Own Best Competitor! Multi-Branched Adversarial Knowledge Transfer [15.499267533387039]
The proposed method applies to both lightweight image classification and encoder-decoder architectures, boosting the performance of small, compact models without incurring extra computational overhead at inference. The obtained results show that the proposed model achieves significant improvements over earlier self-distillation methods.
arXiv Detail & Related papers (2020-10-09T11:57:45Z)
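Among the related papers, M-SMoE merges redundant experts guided by routing statistics, an idea complementary to R^2MoE's elimination of redundant layer-wise experts. A minimal sketch of frequency-weighted expert merging follows; the merging rule and all names (`merge_experts`, `routing_counts`) are illustrative assumptions, not the MC-SMoE implementation:

```python
def merge_experts(expert_weights, routing_counts):
    """Merge a group of experts into a single weight matrix by averaging,
    weighted by how often the router selected each expert."""
    total = sum(routing_counts)
    coeffs = [c / total for c in routing_counts]
    rows, cols = len(expert_weights[0]), len(expert_weights[0][0])
    return [[sum(coeffs[k] * expert_weights[k][i][j] for k in range(len(coeffs)))
             for j in range(cols)] for i in range(rows)]

# Three toy 2x2 experts whose entries equal their index, plus router usage stats.
experts = [[[float(k)] * 2 for _ in range(2)] for k in range(3)]
counts = [1, 1, 2]            # expert 2 was routed to twice as often
merged = merge_experts(experts, counts)
print(merged)                 # every entry is (0*1 + 1*1 + 2*2) / 4 = 1.25
```

Collapsing rarely-distinguished experts this way is one concrete route to the parameter reductions these papers report.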
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.