Reducing Annotation Burden: Exploiting Image Knowledge for Few-Shot Medical Video Object Segmentation via Spatiotemporal Consistency Relearning
- URL: http://arxiv.org/abs/2503.14958v1
- Date: Wed, 19 Mar 2025 07:51:14 GMT
- Title: Reducing Annotation Burden: Exploiting Image Knowledge for Few-Shot Medical Video Object Segmentation via Spatiotemporal Consistency Relearning
- Authors: Zixuan Zheng, Yilei Shi, Chunlei Li, Jingliang Hu, Xiao Xiang Zhu, Lichao Mou,
- Abstract summary: We investigate an extremely low-data regime that uses annotations from only a few video frames and leverages existing labeled images to minimize costly video annotations. Our model bridges the gap between abundant annotated medical images and scarce, sparsely labeled medical videos to achieve strong video segmentation performance in this low-data regime.
- Score: 20.458912966915843
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot video object segmentation aims to reduce annotation costs; however, existing methods still require abundant dense frame annotations for training, which are scarce in the medical domain. We investigate an extremely low-data regime that utilizes annotations from only a few video frames and leverages existing labeled images to minimize costly video annotations. Specifically, we propose a two-phase framework. First, we learn a few-shot segmentation model using labeled images. Subsequently, to improve performance without full supervision, we introduce a spatiotemporal consistency relearning approach on medical videos that enforces consistency between consecutive frames. Constraints are also enforced between the image model and relearning model at both feature and prediction levels. Experiments demonstrate the superiority of our approach over state-of-the-art few-shot segmentation methods. Our model bridges the gap between abundant annotated medical images and scarce, sparsely labeled medical videos to achieve strong video segmentation performance in this low-data regime. Code is available at https://github.com/MedAITech/RAB.
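To make the two-phase idea concrete, here is a minimal PyTorch sketch of what the phase-two relearning objective could look like, assuming both models return features and logits; the names (`image_model`, `video_model`, `relearning_losses`) are hypothetical, not the authors' code (see their repository for the actual implementation).

```python
import torch
import torch.nn.functional as F

def relearning_losses(image_model, video_model, frames):
    """Hypothetical phase-two objective for one unlabeled clip.

    frames: (T, C, H, W) consecutive frames from a medical video.
    Both models are assumed to return (features, logits).
    """
    with torch.no_grad():
        t_feats, t_logits = image_model(frames)   # frozen phase-one image model
    s_feats, s_logits = video_model(frames)       # relearning (video) model

    # Spatiotemporal consistency: predictions on consecutive frames agree.
    probs = s_logits.softmax(dim=1)
    l_temporal = F.mse_loss(probs[1:], probs[:-1])

    # Feature- and prediction-level constraints against the image model,
    # so relearning does not drift from the image-learned knowledge.
    l_feat = F.mse_loss(s_feats, t_feats)
    l_pred = F.kl_div(s_logits.log_softmax(dim=1),
                      t_logits.softmax(dim=1), reduction="batchmean")
    return l_temporal + l_feat + l_pred
```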
Related papers
- Rethinking Few-Shot Medical Image Segmentation by SAM2: A Training-Free Framework with Augmentative Prompting and Dynamic Matching [4.1253497486581026]
We conceptualize 3D medical image volumes as video sequences, departing from the traditional slice-by-slice paradigm. We perform extensive data augmentation on a single labeled support image and, for each frame in the query volume, algorithmically select the most analogous augmented support image. We demonstrate state-of-the-art performance on benchmark few-shot medical image segmentation datasets, achieving significant improvements in accuracy and annotation efficiency.
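A rough sketch of the frame-wise matching step this abstract describes, under the assumption that an encoder yields one embedding per image (the function and variable names below are illustrative, not from the paper):

```python
import torch
import torch.nn.functional as F

def best_support_index(query_emb, aug_support_embs):
    """Pick the augmented support image most similar to a query frame.

    query_emb: (D,) embedding of one query slice/frame.
    aug_support_embs: (N, D) embeddings of N augmented support images.
    """
    q = F.normalize(query_emb, dim=-1)
    s = F.normalize(aug_support_embs, dim=-1)
    return int(torch.argmax(s @ q))   # index of the closest match
```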
arXiv Detail & Related papers (2025-03-05T06:12:13Z)
- Is Two-shot All You Need? A Label-efficient Approach for Video Segmentation in Breast Ultrasound [4.113689581316844]
We propose a novel two-shot training paradigm for BUS video segmentation.
It not only captures free-range space-time consistency but also utilizes a source-dependent augmentation scheme.
Results show that it achieves performance comparable to fully annotated training while using only 1.9% of the training labels.
arXiv Detail & Related papers (2024-02-07T14:47:08Z)
- DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance [69.0740091741732]
We propose a high-fidelity image-to-video generation method by devising a frame retention branch based on a pre-trained video diffusion model, named DreamVideo.
Our model has powerful image retention ability and, to the best of our knowledge, delivers the best results on UCF101 among image-to-video models.
arXiv Detail & Related papers (2023-12-05T03:16:31Z)
- Correlation-aware active learning for surgery video segmentation [13.327429312047396]
This work proposes COWAL (COrrelation-aWare Active Learning), a novel AL strategy for surgery video segmentation.
Our approach involves projecting images into a latent space that has been fine-tuned using contrastive learning and then selecting a fixed number of representative images from local clusters of video frames.
We demonstrate the effectiveness of this approach on two video datasets of surgical instruments and three real-world video datasets.
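As a hedged illustration of the selection step (cluster frame embeddings, then annotate the most central frame of each cluster; the helper below is hypothetical, not the COWAL code):

```python
import numpy as np
from sklearn.cluster import KMeans

def pick_representatives(embeddings, k):
    """Cluster frame embeddings (N, D) and return, for each cluster,
    the index of the frame nearest its centre -- k frames to label."""
    km = KMeans(n_clusters=k, n_init=10).fit(embeddings)
    dists = np.linalg.norm(
        embeddings[:, None, :] - km.cluster_centers_[None, :, :], axis=-1)
    return [int(np.argmin(dists[:, c])) for c in range(k)]
```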
arXiv Detail & Related papers (2023-11-15T09:30:52Z)
- Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations [79.87044240860466]
We propose a novel approach that incorporates temporal consistency in dense self-supervised learning.
Our approach, which we call time-tuning, starts from image-pretrained models and fine-tunes them with a novel self-supervised temporal-alignment clustering loss on unlabeled videos.
Time-tuning improves the state-of-the-art by 8-10% for unsupervised semantic segmentation on videos and matches it for images.
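One plausible form of such a temporal-alignment clustering loss, sketched with learnable prototypes (an assumption-laden simplification, not the paper's exact objective):

```python
import torch
import torch.nn.functional as F

def temporal_alignment_loss(feat_t, feat_t1, prototypes):
    """Encourage dense features of consecutive frames to receive the
    same soft cluster assignment.

    feat_t, feat_t1: (N, D) L2-normalized dense features of two frames.
    prototypes: (K, D) learnable cluster prototypes.
    """
    logits_t = feat_t @ prototypes.T
    target = (feat_t1 @ prototypes.T).softmax(dim=-1).detach()
    return F.cross_entropy(logits_t, target)   # soft-target cross-entropy
```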
arXiv Detail & Related papers (2023-08-22T21:28:58Z)
- Few Shot Medical Image Segmentation with Cross Attention Transformer [30.54965157877615]
We propose a novel framework for few-shot medical image segmentation, termed CAT-Net.
Our proposed network mines the correlations between the support and query images, restricting both to focus only on useful foreground information.
We validated the proposed method on three public datasets: Abd-CT, Abd-MRI, and Card-MRI.
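The core support-query interaction can be pictured as standard cross-attention; a toy PyTorch block under assumed token shapes (not the actual CAT-Net module):

```python
import torch.nn as nn

class SupportQueryCrossAttention(nn.Module):
    """Query-image tokens attend to support-image tokens."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_tokens, support_tokens):
        # query_tokens: (B, Nq, D); support_tokens: (B, Ns, D)
        out, _ = self.attn(query_tokens, support_tokens, support_tokens)
        return self.norm(query_tokens + out)   # residual + norm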
arXiv Detail & Related papers (2023-03-24T09:10:14Z)
- Weakly-Supervised Action Detection Guided by Audio Narration [50.4318060593995]
We propose a model to learn from the narration supervision and utilize multimodal features, including RGB, motion flow, and ambient sound.
Our experiments show that noisy audio narration suffices to learn a good action detection model, thus reducing annotation expenses.
arXiv Detail & Related papers (2022-05-12T06:33:24Z)
- Efficient Video Segmentation Models with Per-frame Inference [117.97423110566963]
We focus on improving temporal consistency without introducing inference overhead.
We propose several techniques to learn from the video sequence, including a temporal consistency loss and online/offline knowledge distillation methods.
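For flavor, a generic per-pixel knowledge-distillation term of the kind such methods use (the temperature and names are illustrative, not the paper's exact formulation):

```python
import torch.nn.functional as F

def seg_distill_loss(student_logits, teacher_logits, T=2.0):
    """Per-pixel KD: a compact per-frame student mimics a heavier
    teacher's softened class distribution. Logits: (B, C, H, W)."""
    p_teacher = (teacher_logits / T).softmax(dim=1)
    log_p_student = (student_logits / T).log_softmax(dim=1)
    return F.kl_div(log_p_student, p_teacher,
                    reduction="batchmean") * (T * T)
```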
arXiv Detail & Related papers (2022-02-24T23:51:36Z)
- Beyond Short Clips: End-to-End Video-Level Learning with Collaborative Memories [56.91664227337115]
We introduce a collaborative memory mechanism that encodes information across multiple sampled clips of a video at each training iteration.
This enables the learning of long-range dependencies beyond a single clip.
Our proposed framework is end-to-end trainable and significantly improves the accuracy of video classification at a negligible computational overhead.
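A toy rendering of the idea: a shared memory pooled over the clips sampled from one video, fed back into each clip's prediction (the module below is a guess at the spirit, not the paper's design):

```python
import torch
import torch.nn as nn

class CollaborativeMemoryHead(nn.Module):
    """Classify each clip using its own features plus a memory vector
    pooled over all clips sampled from the same video."""
    def __init__(self, dim, num_classes):
        super().__init__()
        self.head = nn.Linear(2 * dim, num_classes)

    def forward(self, clip_feats):               # (num_clips, D)
        memory = clip_feats.mean(dim=0, keepdim=True)
        memory = memory.expand_as(clip_feats)    # share across clips
        return self.head(torch.cat([clip_feats, memory], dim=-1))
```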
arXiv Detail & Related papers (2021-04-02T18:59:09Z)
- Towards Unsupervised Learning for Instrument Segmentation in Robotic Surgery with Cycle-Consistent Adversarial Networks [54.00217496410142]
We propose an unpaired image-to-image translation where the goal is to learn the mapping between an input endoscopic image and a corresponding annotation.
Our approach makes it possible to train image segmentation models without acquiring expensive annotations.
We test our proposed method on the EndoVis 2017 challenge dataset and show that it is competitive with supervised segmentation methods.
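The cycle-consistency constraint at the heart of such unpaired translation can be written in a few lines (the generator names are placeholders, not the paper's API):

```python
import torch.nn.functional as F

def cycle_consistency_loss(image, G_img2ann, G_ann2img):
    """Translating image -> annotation -> image should reconstruct
    the input; the symmetric annotation cycle is analogous."""
    reconstructed = G_ann2img(G_img2ann(image))
    return F.l1_loss(reconstructed, image)
```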
arXiv Detail & Related papers (2020-07-09T01:39:39Z)