InvCoSS: Inversion-driven Continual Self-supervised Learning in Medical Multi-modal Image Pre-training
- URL: http://arxiv.org/abs/2512.19213v1
- Date: Mon, 22 Dec 2025 09:53:38 GMT
- Title: InvCoSS: Inversion-driven Continual Self-supervised Learning in Medical Multi-modal Image Pre-training
- Authors: Zihao Luo, Shaohao Rui, Zhenyu Tang, Guotai Wang, Xiaosong Wang
- Abstract summary: Continual self-supervised learning (CSSL) in medical imaging trains a foundation model sequentially. We propose InvCoSS, an inversion-driven continual self-supervised learning framework for medical multi-modal image pre-training. InvCoSS inverts the pre-trained self-supervised model to generate synthetic images that approximate the original training distribution.
- Score: 7.7475546440997265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual self-supervised learning (CSSL) in medical imaging trains a foundation model sequentially, alleviating the need for collecting multi-modal images for joint training and offering promising improvements in downstream performance while preserving data privacy. However, most existing methods still rely on replaying data from previous stages to prevent catastrophic forgetting, which compromises privacy and limits their applicability in real-world scenarios where data transfer across sites is often restricted. In this work, we propose InvCoSS, an inversion-driven continual self-supervised learning framework for medical multi-modal image pre-training. Specifically, after training on a previous task, InvCoSS inverts the pre-trained self-supervised model to generate synthetic images that approximate the original training distribution. These synthetic images are then combined with data from the new task for joint optimization, which effectively mitigates catastrophic forgetting while strictly adhering to the constraint of no access to previous real data. Furthermore, to improve the fidelity of synthetic images, we introduce a novel InvUNet with a multi-scale fusion architecture to restore both high- and low-frequency components of the inverted images. To enhance diversity and prevent mode collapse, we design a repulsive representation-learning mechanism that encourages a diverse feature space for synthetic images without class guidance. Extensive experiments across nine downstream tasks validate the effectiveness of InvCoSS, achieving performance comparable to or even superior to prior data-replay methods while significantly reducing storage requirements and eliminating data privacy constraints.
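The core idea in the abstract, inverting a frozen pre-trained model to synthesize images whose features match statistics recorded during pre-training, can be illustrated with a toy sketch. Here a linear encoder and a stored feature mean stand in for the paper's self-supervised network and its recorded statistics; all names, shapes, and the learning rate are hypothetical, not the actual InvCoSS or InvUNet implementation.

```python
import numpy as np

# Toy statistics-matching inversion: optimize synthetic inputs x directly
# so that their features under a FROZEN encoder match a stored target mean.
# The encoder is a fixed random linear map (stand-in for a pre-trained net).

rng = np.random.default_rng(0)
d_in, d_feat, batch = 16, 8, 32

W = rng.normal(size=(d_feat, d_in)) / np.sqrt(d_in)  # frozen encoder weights
mu_target = rng.normal(size=d_feat)                  # stored feature statistics

x = rng.normal(size=(batch, d_in))                   # synthetic "images" to optimize

def loss_and_grad(x):
    feats = x @ W.T                                  # (batch, d_feat)
    diff = feats.mean(axis=0) - mu_target            # first-order statistic mismatch
    loss = 0.5 * np.sum(diff ** 2)
    grad = (diff @ W)[None, :] / batch               # analytic gradient w.r.t. each x_i
    return loss, np.broadcast_to(grad, x.shape)

lr, losses = 5.0, []
for _ in range(300):
    loss, g = loss_and_grad(x)
    losses.append(loss)
    x = x - lr * g                                   # gradient step on the inputs

print(f"statistic-matching loss: {losses[0]:.4f} -> {losses[-1]:.6f}")
```

In the full method the optimized variables would be images passed through a deep network (with richer statistics, e.g. per-layer means and variances), and the paper adds a repulsive representation term to keep the synthetic batch diverse; this sketch only shows the statistic-matching objective.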
Related papers
- Privacy-Aware Continual Self-Supervised Learning on Multi-Window Chest Computed Tomography for Domain-Shift Robustness [38.350720506451104]
We propose a novel continual self-supervised learning (CSSL) framework for simultaneously learning diverse features from chest computed tomography (CT) images. We introduce a feature distillation technique that integrates Wasserstein distance-based knowledge distillation (WKD) and batch-knowledge ensemble (BKE) to enhance the ability of the model to learn meaningful, domain-shift-robust representations.
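The Wasserstein distance-based distillation mentioned above can be illustrated with a minimal one-dimensional sketch: for two equal-size empirical distributions, the Wasserstein-1 distance reduces to the mean absolute difference of their sorted samples. The teacher/student feature values below are synthetic stand-ins, not the paper's actual WKD loss.

```python
import numpy as np

# 1-D Wasserstein-1 distance between equal-size empirical distributions:
# sort both samples and average the element-wise absolute differences.
def wasserstein1(a, b):
    a, b = np.sort(a), np.sort(b)
    return np.mean(np.abs(a - b))

rng = np.random.default_rng(1)
teacher = rng.normal(loc=0.0, scale=1.0, size=1024)  # frozen teacher features
student = rng.normal(loc=0.5, scale=1.0, size=1024)  # student features (shifted)

print(round(wasserstein1(teacher, student), 3))      # ≈ 0.5 (the mean shift)
print(round(wasserstein1(teacher, teacher), 3))      # 0.0 for identical features
```

A distillation loss built on this distance compares whole feature distributions rather than matching activations point-by-point, which is what makes it attractive for representations under domain shift.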
arXiv Detail & Related papers (2025-10-31T06:16:31Z)
- LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling [38.700993166492495]
We propose a dataset-free, unified approach through recurrent posterior sampling utilizing a pretrained latent diffusion model. Our method incorporates a multimodal understanding model to provide semantic priors for the generative model under a task-blind condition.
arXiv Detail & Related papers (2025-07-01T14:25:09Z)
- Synthetic Data is an Elegant GIFT for Continual Vision-Language Models [52.343627275005026]
GIFT is a novel continual fine-tuning approach to overcome catastrophic forgetting in Vision-Language Models. We employ a pre-trained diffusion model to recreate both pre-training and learned downstream task data. Our method consistently outperforms previous state-of-the-art approaches across various settings.
arXiv Detail & Related papers (2025-03-06T09:09:18Z)
- Re-Visible Dual-Domain Self-Supervised Deep Unfolding Network for MRI Reconstruction [48.30341580103962]
We propose a novel re-visible dual-domain self-supervised deep unfolding network to address these issues. We design a deep unfolding network based on the Chambolle-Pock Proximal Point Algorithm (DUN-CP-PPA) to achieve end-to-end reconstruction. Experiments conducted on the fastMRI and IXI datasets demonstrate that our method significantly outperforms state-of-the-art approaches in terms of reconstruction performance.
arXiv Detail & Related papers (2025-01-07T12:29:32Z)
- FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration [66.61201445650323]
Existing methods suffer from a generalization bottleneck in real-world scenarios. We contribute a million-scale dataset with two notable advantages over existing training data. We propose a robust model, FoundIR, to better address a broader range of restoration tasks in real-world scenarios.
arXiv Detail & Related papers (2024-12-02T12:08:40Z)
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been regarded as a challenging property to encode in neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
- Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability.
We propose a dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z)
- Progressive trajectory matching for medical dataset distillation [15.116863763717623]
It is essential but challenging to share medical image datasets due to privacy issues.
We propose a novel dataset distillation method to condense the original medical image datasets into a synthetic one.
arXiv Detail & Related papers (2024-03-20T10:18:20Z)
- Bridging Synthetic and Real Images: a Transferable and Multiple Consistency aided Fundus Image Enhancement Framework [61.74188977009786]
We propose an end-to-end optimized teacher-student framework to simultaneously conduct image enhancement and domain adaptation.
We also propose a novel multi-stage multi-attention guided enhancement network (MAGE-Net) as the backbones of our teacher and student network.
arXiv Detail & Related papers (2023-02-23T06:16:15Z)
- Federated Learning of Generative Image Priors for MRI Reconstruction [5.3963856146595095]
Multi-institutional efforts can facilitate training of deep MRI reconstruction models, although privacy risks arise during cross-site sharing of imaging data.
We introduce a novel method for MRI reconstruction based on Federated learning of Generative IMage Priors (FedGIMP).
FedGIMP leverages a two-stage approach: cross-site learning of a generative MRI prior, and subject-specific injection of the imaging operator.
arXiv Detail & Related papers (2022-02-08T22:17:57Z)
- Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.