Related papers: Cycle Context Verification for In-Context Medical Image Segmentation

URL: http://arxiv.org/abs/2507.08357v1
Date: Fri, 11 Jul 2025 07:18:01 GMT
Title: Cycle Context Verification for In-Context Medical Image Segmentation
Authors: Shishuai Hu, Zehui Liao, Liangli Zhen, Huazhu Fu, Yong Xia
Abstract summary: In-context learning (ICL) is emerging as a promising technique for achieving universal medical image segmentation. In a clinical scenario, the scarcity of annotated medical images makes it challenging to select optimal in-context pairs. We propose Cycle Context Verification (CCV), a novel framework that enhances ICL-based medical image segmentation.
Score: 43.416111396585165
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: In-context learning (ICL) is emerging as a promising technique for achieving universal medical image segmentation, where a variety of objects of interest across imaging modalities can be segmented using a single model. Nevertheless, its performance is highly sensitive to the alignment between the query image and in-context image-mask pairs. In a clinical scenario, the scarcity of annotated medical images makes it challenging to select optimal in-context pairs, and fine-tuning foundation ICL models on contextual data is infeasible due to computational costs and the risk of catastrophic forgetting. To address this challenge, we propose Cycle Context Verification (CCV), a novel framework that enhances ICL-based medical image segmentation by enabling self-verification of predictions and accordingly enhancing contextual alignment. Specifically, CCV employs a cyclic pipeline in which the model initially generates a segmentation mask for the query image. Subsequently, the roles of the query and an in-context pair are swapped, allowing the model to validate its prediction by predicting the mask of the original in-context image. The accuracy of this secondary prediction serves as an implicit measure of the initial query segmentation. A query-specific prompt is introduced to alter the query image and updated to improve the measure, thereby enhancing the alignment between the query and in-context pairs. We evaluated CCV on seven medical image segmentation datasets using two ICL foundation models, demonstrating its superiority over existing methods. Our results highlight CCV's ability to enhance ICL-based segmentation, making it a robust solution for universal medical image segmentation. The code will be available at https://github.com/ShishuaiHu/CCV.
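The cyclic pipeline described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: `toy_icl_segment` is a hypothetical stand-in for an ICL foundation model (it labels pixels by the nearer context mean intensity), Dice is assumed as the accuracy measure, and the prompt is updated by simple random-search hill climbing rather than whatever optimizer the paper uses. All function names and parameters here are assumptions made for illustration.

```python
import numpy as np

def toy_icl_segment(query, ctx_img, ctx_mask):
    """Hypothetical stand-in for an ICL model: label each query pixel by the
    nearer of the context foreground/background mean intensities."""
    fg = ctx_mask > 0.5
    if fg.all() or not fg.any():
        # Degenerate context mask: nothing to learn from, predict empty mask.
        return np.zeros_like(query)
    fg_mean = ctx_img[fg].mean()
    bg_mean = ctx_img[~fg].mean()
    return (np.abs(query - fg_mean) < np.abs(query - bg_mean)).astype(float)

def dice(a, b, eps=1e-6):
    """Dice overlap between two binary masks (assumed accuracy measure)."""
    inter = (a * b).sum()
    return (2 * inter + eps) / (a.sum() + b.sum() + eps)

def cycle_score(query, prompt, ctx_img, ctx_mask):
    """Forward pass on the prompted query, then swap roles and score the
    prediction for the original in-context image against its known mask."""
    prompted = query + prompt
    pred_query = toy_icl_segment(prompted, ctx_img, ctx_mask)   # query <- context
    pred_ctx = toy_icl_segment(ctx_img, prompted, pred_query)   # context <- query
    return dice(pred_ctx, ctx_mask), pred_query

def refine_prompt(query, ctx_img, ctx_mask, steps=50, sigma=0.05, seed=0):
    """Random-search hill climbing on the query-specific prompt (a swapped-in
    optimizer for this sketch): keep a perturbation only if the cycle score rises."""
    rng = np.random.default_rng(seed)
    prompt = np.zeros_like(query)
    best, _ = cycle_score(query, prompt, ctx_img, ctx_mask)
    for _ in range(steps):
        candidate = prompt + rng.normal(0.0, sigma, size=query.shape)
        score, _ = cycle_score(query, candidate, ctx_img, ctx_mask)
        if score > best:
            prompt, best = candidate, score
    return prompt, best

# Synthetic example: the query has a different intensity offset than the context.
rng = np.random.default_rng(1)
ctx_mask = np.zeros((16, 16))
ctx_mask[4:12, 4:12] = 1
ctx_img = 0.2 + 0.6 * ctx_mask + rng.normal(0.0, 0.05, (16, 16))
query_mask = np.zeros((16, 16))
query_mask[2:10, 6:14] = 1
query = 0.1 + 0.6 * query_mask + rng.normal(0.0, 0.05, (16, 16))

base, _ = cycle_score(query, np.zeros_like(query), ctx_img, ctx_mask)
prompt, best = refine_prompt(query, ctx_img, ctx_mask)
print(f"cycle score before: {base:.3f}, after: {best:.3f}")
```

Note that the query's own ground-truth mask is never used during refinement; only the in-context pair's mask supervises the prompt, which is the point of the cycle check. The model itself stays frozen, and only the additive prompt on the query changes.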

Related papers

MAMBO-NET: Multi-Causal Aware Modeling Backdoor-Intervention Optimization for Medical Image Segmentation Network [51.68708264694361]
Confusion factors can affect medical images, such as complex anatomical variations and imaging modality limitations. We propose a multi-causal aware modeling backdoor-intervention optimization network for medical image segmentation. Our method significantly reduces the influence of confusion factors, leading to enhanced segmentation accuracy.
arXiv Detail & Related papers (2025-05-28T01:40:10Z)
AutoMiSeg: Automatic Medical Image Segmentation via Test-Time Adaptation of Foundation Models [7.382887784956608]
This paper introduces a zero-shot and automatic segmentation pipeline that combines vision-language and segmentation foundation models. By proper decomposition and test-time adaptation, our fully automatic pipeline performs competitively with weakly-prompted interactive foundation models.
arXiv Detail & Related papers (2025-05-23T14:07:21Z)
CLIP-IT: CLIP-based Pairing for Histology Images Classification [6.5280377968471]
Multimodal learning has shown promise in medical image analysis, combining complementary modalities like histology images and text. We introduce CLIP-IT, a novel framework that relies on rich unpaired text reports, eliminating the paired data requirement. Experiments on histology image datasets confirm that CLIP-IT consistently improves classification accuracy over both unimodal and multimodal CLIP-based baselines.
arXiv Detail & Related papers (2025-04-22T18:14:43Z)
CausalCLIPSeg: Unlocking CLIP's Potential in Referring Medical Image Segmentation with Causal Intervention [30.501326915750898]
We propose CausalCLIPSeg, an end-to-end framework for referring medical image segmentation. Despite not being trained on medical data, we enforce CLIP's rich semantic space onto the medical domain. To mitigate confounding bias that may cause the model to learn spurious correlations, CausalCLIPSeg introduces a causal intervention module.
arXiv Detail & Related papers (2025-03-20T08:46:24Z)
FlowSDF: Flow Matching for Medical Image Segmentation Using Distance Transforms [60.195642571004804]
We introduce FlowSDF, an image-guided conditional flow matching framework, to represent an implicit distribution of segmentation masks. Our framework enables accurate sampling of segmentation masks and the computation of relevant statistical measures.
arXiv Detail & Related papers (2024-05-28T11:47:12Z)
Few-shot Medical Image Segmentation via Cross-Reference Transformer [3.2634122554914]
Few-shot segmentation (FSS) has the potential to address these challenges by learning new categories from a small number of labeled samples. We propose a novel self-supervised few-shot medical image segmentation network with a Cross-Reference Transformer. Experimental results show that the proposed model achieves good results on both CT and MRI datasets.
arXiv Detail & Related papers (2023-04-19T13:05:18Z)
Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space. We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains. Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable-sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities. We explicitly account for prior images and reports when available during both training and fine-tuning. Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
Data-Limited Tissue Segmentation using Inpainting-Based Self-Supervised Learning [3.7931881761831328]
Self-supervised learning (SSL) methods involving pretext tasks have shown promise in overcoming this requirement by first pretraining models using unlabeled data. We evaluate the efficacy of two SSL methods (inpainting-based pretext tasks of context prediction and context restoration) for CT and MRI image segmentation in label-limited scenarios. We demonstrate that optimally trained and easy-to-implement SSL segmentation models can outperform classically supervised methods for MRI and CT tissue segmentation in label-limited scenarios.
arXiv Detail & Related papers (2022-10-14T16:34:05Z)
A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model [61.58071099082296]
It is unclear how to make zero-shot recognition work well on broader vision problems, such as object detection and semantic segmentation. In this paper, we target zero-shot semantic segmentation by building on an off-the-shelf pre-trained vision-language model, i.e., CLIP. Our experimental results show that this simple framework surpasses previous state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2021-12-29T18:56:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.