Related papers: Weakly Supervised Segmentation of Hyper-Reflective Foci with Compact Convolutional Transformers and SAM2

URL: http://arxiv.org/abs/2501.05933v1
Date: Fri, 10 Jan 2025 12:56:18 GMT
Title: Weakly Supervised Segmentation of Hyper-Reflective Foci with Compact Convolutional Transformers and SAM2
Authors: Olivier Morelle, Justus Bisten, Maximilian W. M. Wintergerst, Robert P. Finger, Thomas Schultz
Abstract summary: We propose a novel framework that increases spatial resolution of a traditional attention-based Multiple Instance Learning (MIL) approach. We demonstrate that replacing MIL with a Compact Convolutional Transformer (CCT) leads to a substantial increase in segmentation accuracy.
Score: 0.7340017786387767
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Weakly supervised segmentation has the potential to greatly reduce the annotation effort for training segmentation models for small structures such as hyper-reflective foci (HRF) in optical coherence tomography (OCT). However, most weakly supervised methods either involve a strong downsampling of input images, or only achieve localization at a coarse resolution, both of which are unsatisfactory for small structures. We propose a novel framework that increases the spatial resolution of a traditional attention-based Multiple Instance Learning (MIL) approach by using Layer-wise Relevance Propagation (LRP) to prompt the Segment Anything Model (SAM 2), and increases recall with iterative inference. Moreover, we demonstrate that replacing MIL with a Compact Convolutional Transformer (CCT), which adds a positional encoding, and permits an exchange of information between different regions of the OCT image, leads to a further and substantial increase in segmentation accuracy.
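Illustrative sketch: the abstract builds on attention-based MIL, where an image is treated as a bag of patch embeddings and a learned attention weight per patch decides how much it contributes to the image-level prediction. The snippet below shows that pooling step in its commonly used tanh-attention form. It is a generic sketch, not the authors' implementation; the function name `attention_mil_pool`, the parameter names `V` and `w`, and all shapes are assumptions made for illustration. The LRP, SAM 2 prompting, and CCT stages are not covered.

```python
import numpy as np

def attention_mil_pool(H, V, w):
    """Pool a bag of instance embeddings with learned attention.

    H: (K, D) array, one embedding per patch (instance).
    V: (L, D) array and w: (L,) array, hypothetical attention parameters.
    Returns the bag embedding z of shape (D,) and weights a of shape (K,).
    """
    scores = np.tanh(H @ V.T) @ w      # one unnormalized score per patch
    scores = scores - scores.max()     # shift for numerical stability
    a = np.exp(scores)
    a = a / a.sum()                    # softmax over the K patches
    z = a @ H                          # attention-weighted sum of embeddings
    return z, a

# Toy bag: 6 patches with 4-dimensional embeddings, random parameters.
rng = np.random.default_rng(0)
H = rng.normal(size=(6, 4))
V = rng.normal(size=(3, 4))
w = rng.normal(size=3)
z, a = attention_mil_pool(H, V, w)
```

The weights `a` give one value per patch, which is why localization from such a model is only as fine as the patch grid; the abstract's use of LRP and SAM 2 is aimed at that limitation.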

Related papers

Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion [12.839049648094893]
Coronary artery segmentation is critical for computer-aided diagnosis of coronary artery disease (CAD). We propose a novel framework that leverages the power of vision foundation models (VFMs) through a parallel encoding architecture. The proposed framework significantly outperforms state-of-the-art methods, achieving superior performance in accurate coronary artery segmentation.
arXiv Detail & Related papers (2025-07-17T09:25:00Z)
Cross Paradigm Representation and Alignment Transformer for Image Deraining [40.66823807648992]
We propose a novel Cross Paradigm Representation and Alignment Transformer (CPRAformer). Its core idea is hierarchical representation and alignment, leveraging the strengths of both paradigms to aid image reconstruction. We use two types of self-attention in the Transformer blocks: sparse prompt channel self-attention (SPC-SA) and spatial pixel refinement self-attention (SPR-SA).
arXiv Detail & Related papers (2025-04-23T06:44:46Z)
Enhanced High-Dimensional Data Visualization through Adaptive Multi-Scale Manifold Embedding [0.7705234721762716]
We propose an Adaptive Multi-Scale Manifold Embedding (AMSME) algorithm. We introduce an ordinal distance and demonstrate that it overcomes the constraints of the curse of dimensionality in high-dimensional spaces. Experimental results demonstrate that AMSME significantly preserves intra-cluster topological structures and improves inter-cluster separation on real-world datasets.
arXiv Detail & Related papers (2025-03-18T06:46:53Z)
Cross-Scan Mamba with Masked Training for Robust Spectral Imaging [51.557804095896174]
We propose the Cross-Scanning Mamba, named CS-Mamba, that employs a Spatial-Spectral SSM for global-local balanced context encoding. Experimental results show that our CS-Mamba achieves state-of-the-art performance and that the masked training method can better reconstruct smooth features to improve the visual quality.
arXiv Detail & Related papers (2024-08-01T15:14:10Z)
Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis. We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
arXiv Detail & Related papers (2023-12-26T12:56:31Z)
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing. Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery. We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
Diffusion Models Without Attention [110.5623058129782]
Diffusion State Space Model (DiffuSSM) is an architecture that supplants attention mechanisms with a more scalable state space model backbone. Our focus on FLOP-efficient architectures in diffusion training marks a significant step forward.
arXiv Detail & Related papers (2023-11-30T05:15:35Z)
Low-Resolution Self-Attention for Semantic Segmentation [93.30597515880079]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost. Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution. We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
arXiv Detail & Related papers (2023-10-08T06:10:09Z)
Improving Misaligned Multi-modality Image Fusion with One-stage Progressive Dense Registration [67.23451452670282]
Misalignments between multi-modality images pose challenges in image fusion. We propose a Cross-modality Multi-scale Progressive Dense Registration scheme. This scheme accomplishes the coarse-to-fine registration exclusively using a one-stage optimization.
arXiv Detail & Related papers (2023-08-22T03:46:24Z)
ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution [76.7408734079706]
Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation. We propose ESSAformer, an ESSA attention-embedded Transformer network for single-HSI-SR with an iterative refining structure.
arXiv Detail & Related papers (2023-07-26T07:45:14Z)
LLIC: Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression [27.02281402358164]
We propose Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression. We introduce a few large kernel-based depth-wise convolutions to reduce more redundancy while maintaining modest complexity. Our LLIC models achieve state-of-the-art performance and better trade-offs between performance and complexity.
arXiv Detail & Related papers (2023-04-19T11:19:10Z)
IDEAL: Improved DEnse locAL Contrastive Learning for Semi-Supervised Medical Image Segmentation [3.6748639131154315]
We extend the concept of metric learning to the segmentation task. We propose a simple convolutional projection head for obtaining dense pixel-level features. A bidirectional regularization mechanism involving two-stream regularization training is devised for the downstream task.
arXiv Detail & Related papers (2022-10-26T23:11:02Z)
Deformer: Towards Displacement Field Learning for Unsupervised Medical Image Registration [28.358693013757865]
We propose a novel Deformer module along with a multi-scale framework for the deformable image registration task. The Deformer module is designed to facilitate the mapping from image representation to spatial transformation. With the multi-scale framework to predict the displacement fields in a coarse-to-fine manner, superior performance can be achieved.
arXiv Detail & Related papers (2022-07-07T09:14:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.