LEAF: A Robust Expert-Based Framework for Few-Shot Continual Event Detection
- URL: http://arxiv.org/abs/2509.24547v1
- Date: Mon, 29 Sep 2025 10:00:25 GMT
- Title: LEAF: A Robust Expert-Based Framework for Few-Shot Continual Event Detection
- Authors: Bao-Ngoc Dao, Quang Nguyen, Luyen Ngo Dinh, Minh Le, Linh Ngo Van,
- Abstract summary: LEAF is a novel and robust expert-based framework for Few-Shot Continual Event Detection. It incorporates a specialized mixture-of-experts architecture into the base model, where each expert is parameterized with low-rank adaptation (LoRA) matrices. A semantic-aware expert selection mechanism dynamically routes instances to the most relevant experts, enabling expert specialization and reducing knowledge interference.
- Score: 7.094483187879095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot Continual Event Detection (FCED) poses the dual challenges of learning from limited data and mitigating catastrophic forgetting across sequential tasks. Existing approaches often suffer from severe forgetting due to the full fine-tuning of a shared base model, which leads to knowledge interference between tasks. Moreover, they frequently rely on data augmentation strategies that can introduce unnatural or semantically distorted inputs. To address these limitations, we propose LEAF, a novel and robust expert-based framework for FCED. LEAF integrates a specialized mixture of experts architecture into the base model, where each expert is parameterized with low-rank adaptation (LoRA) matrices. A semantic-aware expert selection mechanism dynamically routes instances to the most relevant experts, enabling expert specialization and reducing knowledge interference. To improve generalization in limited-data settings, LEAF incorporates a contrastive learning objective guided by label descriptions, which capture high-level semantic information about event types. Furthermore, to prevent overfitting on the memory buffer, our framework employs a knowledge distillation strategy that transfers knowledge from previous models to the current one. Extensive experiments on multiple FCED benchmarks demonstrate that LEAF consistently achieves state-of-the-art performance.
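The abstract pins down the backbone concretely enough to sketch: a frozen shared base model, experts parameterized as LoRA matrices, and a semantic-aware router that sends each instance to its most relevant experts. The PyTorch sketch below only illustrates how those pieces could fit together under plausible assumptions; the class names, the prototype-based cosine router, and all hyperparameters are hypothetical rather than LEAF's actual implementation.

```python
# Minimal, illustrative sketch of the ingredients the abstract describes:
# LoRA-parameterized experts over a frozen base, plus semantic-aware routing.
# All names and design details here are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAExpert(nn.Module):
    """One expert: a shared frozen base projection plus a trainable low-rank delta."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base                      # shared base weights, kept frozen
        self.base.weight.requires_grad_(False)
        self.A = nn.Linear(base.in_features, rank, bias=False)   # low-rank down-projection
        self.B = nn.Linear(rank, base.out_features, bias=False)  # low-rank up-projection
        nn.init.zeros_(self.B.weight)         # start as a zero update to the base

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.B(self.A(x))

class SemanticRouter(nn.Module):
    """Routes each instance toward the experts whose prototypes it is closest to."""
    def __init__(self, num_experts: int, dim: int):
        super().__init__()
        # One learnable semantic prototype per expert; the abstract suggests
        # label descriptions as a source of such semantic information.
        self.prototypes = nn.Parameter(torch.randn(num_experts, dim))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between each instance and each prototype,
        # turned into routing weights with a softmax.
        sims = F.cosine_similarity(h.unsqueeze(1), self.prototypes.unsqueeze(0), dim=-1)
        return sims.softmax(dim=-1)           # shape: (batch, num_experts)

class ExpertMixture(nn.Module):
    def __init__(self, base: nn.Linear, num_experts: int = 4, rank: int = 8):
        super().__init__()
        self.experts = nn.ModuleList(LoRAExpert(base, rank) for _ in range(num_experts))
        self.router = SemanticRouter(num_experts, base.in_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = self.router(x)              # (batch, num_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)
```

The abstract's remaining components, the label-description-guided contrastive objective and the distillation from previous-task models, would enter as additional loss terms on top of such a backbone; they are omitted here for brevity.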
Related papers
- MoECLIP: Patch-Specialized Experts for Zero-shot Anomaly Detection [6.6626674107399495]
MoECLIP is a Mixture-of-Experts architecture for the Zero-Shot Anomaly Detection (ZSAD) task. It achieves patch-level adaptation by dynamically routing each image patch to a specialized Low-Rank Adaptation (LoRA) expert based on its unique characteristics.
arXiv Detail & Related papers (2026-03-03T15:36:55Z) - Split-on-Share: Mixture of Sparse Experts for Task-Agnostic Continual Learning [10.01449025634975]
Continual learning in Large Language Models (LLMs) is hindered by the plasticity-stability dilemma. We introduce SETA, a framework that resolves the plasticity-stability conflict by decomposing the model into modular subspaces. We show that SETA consistently outperforms state-of-the-art parameter-efficient fine-tuning-based continual learning methods.
arXiv Detail & Related papers (2026-01-24T22:39:22Z) - Beyond Redundancy: Diverse and Specialized Multi-Expert Sparse Autoencoder [59.89996751196727]
Sparse autoencoders (SAEs) have emerged as a powerful tool for interpreting large language models. SAEs' hidden layers have high dimensionality to satisfy sparsity constraints, resulting in prohibitive training and inference costs. Recent Mixture of Experts (MoE) approaches attempt to address this by decomposing SAEs into narrower expert networks with gated activation. We propose two key innovations: (1) Multiple Expert Activation, which simultaneously engages semantically weighted expert subsets to encourage specialization, and (2) Feature Scaling, which enhances diversity through adaptive high-frequency scaling.
arXiv Detail & Related papers (2025-11-07T22:19:34Z) - One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning [52.966712416640085]
We propose SMoPE, a novel framework that integrates the benefits of both task-specific and shared prompt strategies. SMoPE consistently outperforms task-specific prompt methods and achieves performance competitive with state-of-the-art approaches.
arXiv Detail & Related papers (2025-09-29T08:54:58Z) - PL-CA: A Parametric Legal Case Augmentation Framework [10.998168534326709]
Conventional RAG only injects retrieved documents directly into the model's context. Many existing benchmarks lack expert annotation and focus solely on individual downstream tasks. We propose PL-CA, which introduces a parametric RAG framework to perform data augmentation on corpus knowledge.
arXiv Detail & Related papers (2025-09-08T06:08:06Z) - Keep the General, Inject the Specific: Structured Dialogue Fine-Tuning for Knowledge Injection without Catastrophic Forgetting [24.67373225584835]
Large Vision Language Models have demonstrated impressive versatile capabilities through extensive multimodal pre-training. These models struggle with a fundamental dilemma: direct adaptation approaches that inject domain-specific knowledge often trigger catastrophic forgetting of foundational visual-linguistic abilities. We introduce Structured Dialogue Fine-Tuning (SDFT), an approach that effectively injects domain-specific knowledge while minimizing catastrophic forgetting.
arXiv Detail & Related papers (2025-04-27T18:04:02Z) - DUKAE: DUal-level Knowledge Accumulation and Ensemble for Pre-Trained Model-Based Continual Learning [19.684132921720945]
Pre-trained model-based continual learning (PTMCL) has garnered growing attention, as it enables more rapid acquisition of new knowledge. We propose a method named DUal-level Knowledge Accumulation and Ensemble (DUKAE) that leverages both feature-level and decision-level knowledge accumulation. Experiments on the CIFAR-100, ImageNet-R, CUB-200, and Cars-196 datasets demonstrate the superior performance of our approach.
arXiv Detail & Related papers (2025-04-09T01:40:38Z) - LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models [61.96237184081951]
Low-Rank Adaptation (LoRA) is widely used to efficiently acquire specialized knowledge in Multimodal Large Language Models (MLLMs). LoRA introduces substantial harmful redundancy during visual instruction tuning, which exacerbates the forgetting of general knowledge and degrades downstream task performance. We propose LoRASculpt to eliminate harmful redundant parameters, thereby harmonizing general and specialized knowledge.
arXiv Detail & Related papers (2025-03-21T04:31:09Z) - Convergence Rates for Softmax Gating Mixture of Experts [78.3687645289918]
Mixture of experts (MoE) has emerged as an effective framework to advance the efficiency and scalability of machine learning models. Central to the success of MoE is an adaptive softmax gating mechanism that determines the relevance of each expert to a given input and dynamically assigns experts their respective weights (an illustrative gating sketch appears after this list). We perform a convergence analysis of parameter estimation and expert estimation under MoE equipped with the standard softmax gating or its variants, including a dense-to-sparse gating and a hierarchical softmax gating.
arXiv Detail & Related papers (2025-03-05T06:11:24Z) - Continuous Knowledge-Preserving Decomposition with Adaptive Layer Selection for Few-Shot Class-Incremental Learning [73.59672160329296]
CKPD-FSCIL is a unified framework that unlocks the underutilized capacity of pretrained weights.<n>Our method consistently outperforms state-of-the-art approaches in both adaptability and knowledge retention.
arXiv Detail & Related papers (2025-01-09T07:18:48Z) - LFME: A Simple Framework for Learning from Multiple Experts in Domain Generalization [61.16890890570814]
Domain generalization (DG) methods aim to maintain good performance in an unseen target domain by using training data from multiple source domains.
This work introduces a simple yet effective framework, dubbed learning from multiple experts (LFME), that aims to make the target model an expert in all source domains to improve DG.
arXiv Detail & Related papers (2024-10-22T13:44:10Z) - Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection [37.99031842449251]
Video anomaly detection under weak supervision presents significant challenges.
We present a weakly supervised anomaly detection framework that focuses on efficient context modeling and enhanced semantic discriminability.
Our approach significantly improves the detection accuracy of certain anomaly sub-classes, underscoring its practical value and efficacy.
arXiv Detail & Related papers (2023-06-26T06:45:16Z) - MoEC: Mixture of Expert Clusters [93.63738535295866]
Sparsely Mixture of Experts (MoE) has received great interest due to its promising scaling capability with affordable computational overhead.
MoE converts dense layers into sparse experts, and utilizes a gated routing network to make experts conditionally activated.
However, as the number of experts grows, MoE models with extremely large parameter counts suffer from overfitting and sparse data allocation.
arXiv Detail & Related papers (2022-07-19T06:09:55Z)
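Several entries above (the softmax-gating convergence analysis and MoEC in particular) revolve around the same routing primitive: a softmax gate over expert logits, optionally sparsified so that only a few experts are active per input. The snippet below is a minimal, illustrative top-k softmax gate in PyTorch; the function name, the learned gating matrix W_g, and the top-k sparsification scheme are assumptions for illustration, not the exact formulation of any paper listed here.

```python
# Illustrative top-k softmax gating of the kind the MoE entries above
# describe; the exact formulations in those papers may differ.
import torch

def softmax_gate(h: torch.Tensor, W_g: torch.Tensor, k: int = 2) -> torch.Tensor:
    """Return sparse routing weights: softmax over the top-k expert logits
    per input, with the remaining experts' weights zeroed out."""
    logits = h @ W_g                              # (batch, num_experts)
    topk_vals, topk_idx = logits.topk(k, dim=-1)  # keep only k experts per input
    weights = torch.zeros_like(logits)
    weights.scatter_(-1, topk_idx, topk_vals.softmax(dim=-1))
    return weights                                # rows sum to 1, at most k nonzero

# Example: 3 inputs of dimension 16 routed over 8 experts, 2 active each.
h = torch.randn(3, 16)
W_g = torch.randn(16, 8)
print(softmax_gate(h, W_g))
```

Keeping only the top-k logits before the softmax is what makes expert activation conditional: each input pays the compute cost of k experts rather than all of them, which is the scaling property the MoEC entry highlights.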
This list is automatically generated from the titles and abstracts of the papers on this site.