Weakly Supervised Segmentation of Hyper-Reflective Foci with Compact Convolutional Transformers and SAM2
- URL: http://arxiv.org/abs/2501.05933v1
- Date: Fri, 10 Jan 2025 12:56:18 GMT
- Title: Weakly Supervised Segmentation of Hyper-Reflective Foci with Compact Convolutional Transformers and SAM2
- Authors: Olivier Morelle, Justus Bisten, Maximilian W. M. Wintergerst, Robert P. Finger, Thomas Schultz,
- Abstract summary: We propose a novel framework that increases spatial resolution of a traditional attention-based Multiple Instance Learning (MIL) approach.
We demonstrate that replacing MIL with a Compact Convolutional Transformer (CCT) leads to a substantial increase in segmentation accuracy.
- Score: 0.7340017786387767
- License:
- Abstract: Weakly supervised segmentation has the potential to greatly reduce the annotation effort for training segmentation models for small structures such as hyper-reflective foci (HRF) in optical coherence tomography (OCT). However, most weakly supervised methods either involve a strong downsampling of input images, or only achieve localization at a coarse resolution, both of which are unsatisfactory for small structures. We propose a novel framework that increases the spatial resolution of a traditional attention-based Multiple Instance Learning (MIL) approach by using Layer-wise Relevance Propagation (LRP) to prompt the Segment Anything Model (SAM~2), and increases recall with iterative inference. Moreover, we demonstrate that replacing MIL with a Compact Convolutional Transformer (CCT), which adds a positional encoding, and permits an exchange of information between different regions of the OCT image, leads to a further and substantial increase in segmentation accuracy.
Related papers
- Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image (DEC-Seg)
arXiv Detail & Related papers (2023-12-26T12:56:31Z) - Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z) - Diffusion Models Without Attention [110.5623058129782]
Diffusion State Space Model (DiffuSSM) is an architecture that supplants attention mechanisms with a more scalable state space model backbone.
Our focus on FLOP-efficient architectures in diffusion training marks a significant step forward.
arXiv Detail & Related papers (2023-11-30T05:15:35Z) - CCMR: High Resolution Optical Flow Estimation via Coarse-to-Fine
Context-Guided Motion Reasoning [1.0855602842179624]
We propose CCMR: a high-resolution coarse-to-fine approach that leverages attention-based motion grouping concepts to multi-scale optical flow estimation.
CCMR relies on a hierarchical two-step attention-based context-motion grouping strategy.
Experiments and ablations demonstrate that our efforts of combining multi-scale and attention-based concepts pay off.
arXiv Detail & Related papers (2023-11-05T14:14:24Z) - Low-Resolution Self-Attention for Semantic Segmentation [93.30597515880079]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.
Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.
We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
arXiv Detail & Related papers (2023-10-08T06:10:09Z) - Improving Misaligned Multi-modality Image Fusion with One-stage
Progressive Dense Registration [67.23451452670282]
Misalignments between multi-modality images pose challenges in image fusion.
We propose a Cross-modality Multi-scale Progressive Dense Registration scheme.
This scheme accomplishes the coarse-to-fine registration exclusively using a one-stage optimization.
arXiv Detail & Related papers (2023-08-22T03:46:24Z) - ESSAformer: Efficient Transformer for Hyperspectral Image
Super-resolution [76.7408734079706]
Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation.
We propose ESSAformer, an ESSA attention-embedded Transformer network for single-HSI-SR with an iterative refining structure.
arXiv Detail & Related papers (2023-07-26T07:45:14Z) - LLIC: Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression [27.02281402358164]
We propose Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression.
We introduce a few large kernelbased depth-wise convolutions to reduce more redundancy while maintaining modest complexity.
Our LLIC models achieve state-of-the-art performances and better trade-offs between performance and complexity.
arXiv Detail & Related papers (2023-04-19T11:19:10Z) - IDEAL: Improved DEnse locAL Contrastive Learning for Semi-Supervised
Medical Image Segmentation [3.6748639131154315]
We extend the concept of metric learning to the segmentation task.
We propose a simple convolutional projection head for obtaining dense pixel-level features.
A bidirectional regularization mechanism involving two-stream regularization training is devised for the downstream task.
arXiv Detail & Related papers (2022-10-26T23:11:02Z) - Deformer: Towards Displacement Field Learning for Unsupervised Medical
Image Registration [28.358693013757865]
We propose a novel Deformer module along with a multi-scale framework for the deformable image registration task.
The Deformer module is designed to facilitate the mapping from image representation to spatial transformation.
With the multi-scale framework to predict the displacement fields in a coarse-to-fine manner, superior performance can be achieved.
arXiv Detail & Related papers (2022-07-07T09:14:40Z) - Activation Modulation and Recalibration Scheme for Weakly Supervised
Semantic Segmentation [24.08326440298189]
We propose a novel activation modulation and recalibration scheme for weakly supervised semantic segmentation.
We show that AMR establishes a new state-of-the-art performance on the PASCAL VOC 2012 dataset.
Experiments also reveal that our scheme is plug-and-play and can be incorporated with other approaches to boost their performance.
arXiv Detail & Related papers (2021-12-16T16:26:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.