MergeSlide: Continual Model Merging and Task-to-Class Prompt-Aligned Inference for Lifelong Learning on Whole Slide Images
- URL: http://arxiv.org/abs/2511.13099v1
- Date: Mon, 17 Nov 2025 07:51:18 GMT
- Title: MergeSlide: Continual Model Merging and Task-to-Class Prompt-Aligned Inference for Lifelong Learning on Whole Slide Images
- Authors: Doanh C. Bui, Ba Hung Ngo, Hoai Luan Pham, Khang Nguyen, Maï K. Nguyen, Yasuhiko Nakashima
- Abstract summary: Lifelong learning on Whole Slide Images (WSIs) aims to train or fine-tune a unified model sequentially on cancer-related tasks. We introduce MergeSlide, a framework that treats lifelong learning as a model merging problem by leveraging a vision-language pathology model.
- Score: 8.327738094483207
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Lifelong learning on Whole Slide Images (WSIs) aims to train or fine-tune a unified model sequentially on cancer-related tasks, reducing the resources and effort required for data transfer and processing, especially given the gigabyte-scale size of WSIs. In this paper, we introduce MergeSlide, a simple yet effective framework that treats lifelong learning as a model merging problem by leveraging a vision-language pathology foundation model. When a new task arrives, it is: 1) defined with class-aware prompts, 2) fine-tuned for a few epochs using an MLP-free backbone, and 3) merged into a unified model using an orthogonal continual merging strategy that preserves performance and mitigates catastrophic forgetting. For inference under the class-incremental learning (CLASS-IL) setting, where task identity is unknown, we introduce Task-to-Class Prompt-aligned (TCP) inference. Specifically, TCP first identifies the most relevant task using task-level prompts and then applies the corresponding class-aware prompts to generate predictions. To evaluate MergeSlide, we conduct experiments on a stream of six TCGA datasets. The results show that MergeSlide outperforms both rehearsal-based continual learning and vision-language zero-shot baselines. Code and data are available at https://github.com/caodoanh2001/MergeSlide.
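The two-stage TCP inference described in the abstract (pick the most relevant task from task-level prompts, then classify with that task's class-aware prompts) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that slides and prompts have already been embedded into a shared space by a vision-language model and that relevance is measured by cosine similarity. The function names, the toy 3-dimensional embeddings, and the task and class labels are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def tcp_predict(slide_emb, task_prompts, class_prompts):
    """Two-stage prediction when the task identity is unknown.

    task_prompts:  {task_name: embedding}                (assumed precomputed)
    class_prompts: {task_name: {class_name: embedding}}  (assumed precomputed)
    """
    # Stage 1: choose the task whose task-level prompt best matches the slide.
    task = max(task_prompts, key=lambda t: cosine(slide_emb, task_prompts[t]))
    # Stage 2: score only the class-aware prompts that belong to that task.
    candidates = class_prompts[task]
    label = max(candidates, key=lambda c: cosine(slide_emb, candidates[c]))
    return task, label


# Toy embeddings, purely illustrative (hypothetical tasks and classes).
TASK_PROMPTS = {
    "task_a": [1.0, 0.0, 0.0],
    "task_b": [0.0, 1.0, 0.0],
}
CLASS_PROMPTS = {
    "task_a": {"a0": [1.0, 0.2, 0.0], "a1": [1.0, -0.2, 0.5]},
    "task_b": {"b0": [0.1, 1.0, 0.3], "b1": [0.1, 1.0, -0.3]},
}

if __name__ == "__main__":
    print(tcp_predict([0.1, 0.9, 0.4], TASK_PROMPTS, CLASS_PROMPTS))
```

Restricting stage 2 to the selected task's prompts keeps classes from unrelated tasks out of the comparison, which is the motivation the abstract gives for routing through task-level prompts first.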
Related papers
- Train with Perturbation, Infer after Merging: A Two-Stage Framework for Continual Learning [57.514786046966265]
We propose Perturb-and-Merge (P&M), a novel continual learning framework that integrates model merging into the CL paradigm to mitigate forgetting. Our proposed approach achieves state-of-the-art performance on several continual learning benchmark datasets.
arXiv Detail & Related papers (2025-05-28T14:14:19Z)
- Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification [10.667645628712542]
Whole Slide Image (WSI) classification has very significant applications in clinical pathology. This paper proposes the first Vision-Language-based framework with Queryable Prototype Multiple Instance Learning (QPMIL-VL) specially designed for incremental WSI classification.
arXiv Detail & Related papers (2024-10-14T14:49:34Z)
- Comprehensive Generative Replay for Task-Incremental Segmentation with Concurrent Appearance and Semantic Forgetting [49.87694319431288]
Generalist segmentation models are increasingly favored for diverse tasks involving various objects from different image sources.
We propose a Comprehensive Generative Replay (CGR) framework that restores appearance and semantic knowledge by synthesizing image-mask pairs.
Experiments on incremental tasks (cardiac, fundus and prostate segmentation) show its clear advantage for alleviating concurrent appearance and semantic forgetting.
arXiv Detail & Related papers (2024-06-28T10:05:58Z)
- One Train for Two Tasks: An Encrypted Traffic Classification Framework Using Supervised Contrastive Learning [18.63871240173137]
We propose an effective model named Contrastive Learning Enhanced Temporal Fusion (CLE-TFE).
In particular, we utilize supervised contrastive learning to enhance the packet-level and flow-level representations.
We also propose cross-level multi-task learning, which simultaneously accomplishes the packet-level and flow-level classification tasks in the same model with one training.
arXiv Detail & Related papers (2024-02-12T09:10:09Z)
- Complementary Learning Subnetworks for Parameter-Efficient Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks.
Our method achieves competitive results against state-of-the-art methods, especially in accuracy gain, memory cost, training efficiency, and task-order.
arXiv Detail & Related papers (2023-06-21T01:43:25Z)
- Enhancing Self-Supervised Learning for Remote Sensing with Elevation Data: A Case Study with Scarce And High Level Semantic Labels [1.534667887016089]
This work proposes a hybrid unsupervised and supervised learning method to pre-train models applied in Earth observation downstream tasks.
We combine a contrastive approach to pre-train models with a pixel-wise regression pre-text task to predict coarse elevation maps.
arXiv Detail & Related papers (2023-04-13T23:01:11Z)
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval [115.28586222748478]
Image-Text Retrieval (ITR) aims at searching for the target instances that are semantically relevant to the given query from the other modality.
Existing approaches typically suffer from two major limitations.
arXiv Detail & Related papers (2023-01-17T12:42:58Z)
- Task-Adaptive Saliency Guidance for Exemplar-free Class Incremental Learning [60.501201259732625]
We introduce task-adaptive saliency for exemplar-free class incremental learning (EFCIL) and propose a new framework, which we call Task-Adaptive Saliency Supervision (TASS).
Our experiments demonstrate that our method can better preserve saliency maps across tasks and achieve state-of-the-art results on the CIFAR-100, Tiny-ImageNet, and ImageNet-Subset EFCIL benchmarks.
arXiv Detail & Related papers (2022-12-16T02:43:52Z)
- vCLIMB: A Novel Video Class Incremental Learning Benchmark [53.90485760679411]
We introduce vCLIMB, a novel video continual learning benchmark.
vCLIMB is a standardized test-bed to analyze catastrophic forgetting of deep models in video continual learning.
We propose a temporal consistency regularization that can be applied on top of memory-based continual learning methods.
arXiv Detail & Related papers (2022-01-23T22:14:17Z)
- Dependent Multi-Task Learning with Causal Intervention for Image Captioning [10.6405791176668]
In this paper, we propose a dependent multi-task learning framework with causal intervention (DMTCI).
Firstly, we involve an intermediate task, bag-of-categories generation, before the final task, image captioning.
Secondly, we apply Pearl's do-calculus on the model, cutting off the link between the visual features and possible confounders.
Finally, we use a multi-agent reinforcement learning strategy to enable end-to-end training and reduce the inter-task error accumulations.
arXiv Detail & Related papers (2021-05-18T14:57:33Z)
- Adversarial Continual Learning [99.56738010842301]
We propose a hybrid continual learning framework that learns a disjoint representation for task-invariant and task-specific features.
Our model combines architecture growth to prevent forgetting of task-specific skills and an experience replay approach to preserve shared skills.
arXiv Detail & Related papers (2020-03-21T02:08:17Z)