Keypoint-Augmented Self-Supervised Learning for Medical Image
Segmentation with Limited Annotation
- URL: http://arxiv.org/abs/2310.01680v2
- Date: Wed, 18 Oct 2023 18:53:11 GMT
- Title: Keypoint-Augmented Self-Supervised Learning for Medical Image
Segmentation with Limited Annotation
- Authors: Zhangsihao Yang, Mengwei Ren, Kaize Ding, Guido Gerig, Yalin Wang
- Abstract summary: We present a keypointaugmented fusion layer that extracts representations preserving both short- and long-range self-attention.
In particular, we augment the CNN feature map at multiple scales by incorporating an additional input that learns long-range spatial selfattention.
Our method further outperforms existing SSL methods by producing more robust self-attention.
- Score: 21.203307064937142
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretraining CNN models (i.e., UNet) through self-supervision has become a
powerful approach to facilitate medical image segmentation under low annotation
regimes. Recent contrastive learning methods encourage similar global
representations when the same image undergoes different transformations, or
enforce invariance across different image/patch features that are intrinsically
correlated. However, CNN-extracted global and local features are limited in
capturing long-range spatial dependencies that are essential in biological
anatomy. To this end, we present a keypoint-augmented fusion layer that
extracts representations preserving both short- and long-range self-attention.
In particular, we augment the CNN feature map at multiple scales by
incorporating an additional input that learns long-range spatial self-attention
among localized keypoint features. Further, we introduce both global and local
self-supervised pretraining for the framework. At the global scale, we obtain
global representations from both the bottleneck of the UNet, and by aggregating
multiscale keypoint features. These global features are subsequently
regularized through image-level contrastive objectives. At the local scale, we
define a distance-based criterion to first establish correspondences among
keypoints and encourage similarity between their features. Through extensive
experiments on both MRI and CT segmentation tasks, we demonstrate the
architectural advantages of our proposed method in comparison to both CNN and
Transformer-based UNets, when all architectures are trained with randomly
initialized weights. With our proposed pretraining strategy, our method further
outperforms existing SSL methods by producing more robust self-attention and
achieving state-of-the-art segmentation results. The code is available at
https://github.com/zshyang/kaf.git.
Related papers
- Siamese Transformer Networks for Few-shot Image Classification [9.55588609556447]
Humans exhibit remarkable proficiency in visual classification tasks, accurately recognizing and classifying new images with minimal examples.
Existing few-shot image classification methods often emphasize either global features or local features, with few studies considering the integration of both.
We propose a novel approach based on the Siamese Transformer Network (STN)
Our strategy effectively harnesses the potential of global and local features in few-shot image classification, circumventing the need for complex feature adaptation modules.
arXiv Detail & Related papers (2024-07-16T14:27:23Z) - High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation [17.804090651425955]
Image-level weakly-supervised segmentation (WSSS) reduces the usually vast data annotation cost by surrogate segmentation masks during training.
Our work is based on two techniques for improving CAMs; importance sampling, which is a substitute for GAP, and the feature similarity loss.
We reformulate both techniques based on binomial posteriors of multiple independent binary problems.
This has two benefits; their performance is improved and they become more general, resulting in an add-on method that can boost virtually any WSSS method.
arXiv Detail & Related papers (2023-04-05T17:43:57Z) - Discriminative Region-based Multi-Label Zero-Shot Learning [145.0952336375342]
Multi-label zero-shot learning (ZSL) is a more realistic counter-part of standard single-label ZSL.
We propose an alternate approach towards region-based discriminability-preserving ZSL.
arXiv Detail & Related papers (2021-08-20T17:56:47Z) - Remote Sensing Images Semantic Segmentation with General Remote Sensing
Vision Model via a Self-Supervised Contrastive Learning Method [13.479068312825781]
We propose Global style and Local matching Contrastive Learning Network (GLCNet) for remote sensing semantic segmentation.
Specifically, the global style contrastive module is used to learn an image-level representation better.
The local features matching contrastive module is designed to learn representations of local regions which is beneficial for semantic segmentation.
arXiv Detail & Related papers (2021-06-20T03:03:40Z) - PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image
Segmentation [87.50205728818601]
We propose a PriorGuided Local (PGL) self-supervised model that learns the region-wise local consistency in the latent feature space.
Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
arXiv Detail & Related papers (2020-11-25T11:03:11Z) - IAUnet: Global Context-Aware Feature Learning for Person
Re-Identification [106.50534744965955]
IAU block enables the feature to incorporate the globally spatial, temporal, and channel context.
It is lightweight, end-to-end trainable, and can be easily plugged into existing CNNs to form IAUnet.
Experiments show that IAUnet performs favorably against state-of-the-art on both image and video reID tasks.
arXiv Detail & Related papers (2020-09-02T13:07:10Z) - Inter-Image Communication for Weakly Supervised Localization [77.2171924626778]
Weakly supervised localization aims at finding target object regions using only image-level supervision.
We propose to leverage pixel-level similarities across different objects for learning more accurate object locations.
Our method achieves the Top-1 localization error rate of 45.17% on the ILSVRC validation set.
arXiv Detail & Related papers (2020-08-12T04:14:11Z) - Attentive CutMix: An Enhanced Data Augmentation Approach for Deep
Learning Based Image Classification [58.20132466198622]
We propose Attentive CutMix, a naturally enhanced augmentation strategy based on CutMix.
In each training iteration, we choose the most descriptive regions based on the intermediate attention maps from a feature extractor.
Our proposed method is simple yet effective, easy to implement and can boost the baseline significantly.
arXiv Detail & Related papers (2020-03-29T15:01:05Z) - Improving Few-shot Learning by Spatially-aware Matching and
CrossTransformer [116.46533207849619]
We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.