Prompt learning with bounding box constraints for medical image segmentation
- URL: http://arxiv.org/abs/2507.02743v1
- Date: Thu, 03 Jul 2025 16:04:08 GMT
- Title: Prompt learning with bounding box constraints for medical image segmentation
- Authors: Mélanie Gaillochet, Mehrdad Noori, Sahar Dastani, Christian Desrosiers, Hervé Lombaert,
- Abstract summary: Vision foundation models have recently shown noteworthy segmentation performance when provided with prompts such as points or bounding boxes.<n>This paper proposes a novel framework that combines the representational power of foundation models with the annotation efficiency of weakly supervised segmentation.<n>Our method achieves an average Dice score of 84.90% in a limited data setting, outperforming existing fully-supervised and weakly-supervised approaches.
- Score: 9.429796437031577
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pixel-wise annotations are notoriously labourious and costly to obtain in the medical domain. To mitigate this burden, weakly supervised approaches based on bounding box annotations-much easier to acquire-offer a practical alternative. Vision foundation models have recently shown noteworthy segmentation performance when provided with prompts such as points or bounding boxes. Prompt learning exploits these models by adapting them to downstream tasks and automating segmentation, thereby reducing user intervention. However, existing prompt learning approaches depend on fully annotated segmentation masks. This paper proposes a novel framework that combines the representational power of foundation models with the annotation efficiency of weakly supervised segmentation. More specifically, our approach automates prompt generation for foundation models using only bounding box annotations. Our proposed optimization scheme integrates multiple constraints derived from box annotations with pseudo-labels generated by the prompted foundation model. Extensive experiments across multimodal datasets reveal that our weakly supervised method achieves an average Dice score of 84.90% in a limited data setting, outperforming existing fully-supervised and weakly-supervised approaches. The code is available at https://github.com/Minimel/box-prompt-learning-VFM.git
Related papers
- Embarrassingly Simple Scribble Supervision for 3D Medical Segmentation [0.8391490466934672]
Scribble-supervised learning emerges as a possible solution to this challenge, promising a reduction in annotation efforts when creating large-scale datasets.
We propose a benchmark consisting of seven datasets covering a diverse set of anatomies and pathologies imaged with varying modalities.
Our evaluation using nnU-Net reveals that while most existing methods suffer from a lack of generalization, the proposed approach consistently delivers state-of-the-art performance.
arXiv Detail & Related papers (2024-03-19T15:41:16Z) - A Fixed-Point Approach to Unified Prompt-Based Counting [51.20608895374113]
This paper aims to establish a comprehensive prompt-based counting framework capable of generating density maps for objects indicated by various prompt types, such as box, point, and text.
Our model excels in prominent class-agnostic datasets and exhibits superior performance in cross-dataset adaptation tasks.
arXiv Detail & Related papers (2024-03-15T12:05:44Z) - One-bit Supervision for Image Classification: Problem, Solution, and
Beyond [114.95815360508395]
This paper presents one-bit supervision, a novel setting of learning with fewer labels, for image classification.
We propose a multi-stage training paradigm and incorporate negative label suppression into an off-the-shelf semi-supervised learning algorithm.
In multiple benchmarks, the learning efficiency of the proposed approach surpasses that using full-bit, semi-supervised supervision.
arXiv Detail & Related papers (2023-11-26T07:39:00Z) - Learning Referring Video Object Segmentation from Weak Annotation [78.45828085350936]
Referring video object segmentation (RVOS) is a task that aims to segment the target object in all video frames based on a sentence describing the object.
We propose a new annotation scheme that reduces the annotation effort by 8 times, while providing sufficient supervision for RVOS.
Our scheme only requires a mask for the frame where the object first appears and bounding boxes for the rest of the frames.
arXiv Detail & Related papers (2023-08-04T06:50:52Z) - Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual
Mask Annotations [86.47908754383198]
Open-Vocabulary (OV) methods leverage large-scale image-caption pairs and vision-language models to learn novel categories.
Our method generates pseudo-mask annotations by leveraging the localization ability of a pre-trained vision-language model for objects present in image-caption pairs.
Our method trained with just pseudo-masks significantly improves the mAP scores on the MS-COCO dataset and OpenImages dataset.
arXiv Detail & Related papers (2023-03-29T17:58:39Z) - Point Cloud Instance Segmentation with Semi-supervised Bounding-Box
Mining [17.69745159912481]
We introduce the first semi-supervised point cloud instance segmentation framework (SPIB) using both labeled and unlabelled bounding boxes as supervision.
Our method can achieve competitive performance compared with the recent fully-supervised methods.
arXiv Detail & Related papers (2021-11-30T08:40:40Z) - Boosting Weakly Supervised Object Detection via Learning Bounding Box
Adjusters [76.36104006511684]
Weakly-supervised object detection (WSOD) has emerged as an inspiring recent topic to avoid expensive instance-level object annotations.
We defend the problem setting for improving localization performance by leveraging the bounding box regression knowledge from a well-annotated auxiliary dataset.
Our method performs favorably against state-of-the-art WSOD methods and knowledge transfer model with similar problem setting.
arXiv Detail & Related papers (2021-08-03T13:38:20Z) - Towards Using Count-level Weak Supervision for Crowd Counting [55.58468947486247]
This paper studies the problem of weakly-supervised crowd counting which learns a model from only a small amount of location-level annotations (fully-supervised) but a large amount of count-level annotations (weakly-supervised)
We devise a simple-yet-effective training strategy, namely Multiple Auxiliary Tasks Training (MATT), to construct regularizes for restricting the freedom of the generated density maps.
arXiv Detail & Related papers (2020-02-29T02:58:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.