Learning to Segment from Scribbles using Multi-scale Adversarial
Attention Gates
- URL: http://arxiv.org/abs/2007.01152v3
- Date: Thu, 25 Mar 2021 15:54:11 GMT
- Title: Learning to Segment from Scribbles using Multi-scale Adversarial
Attention Gates
- Authors: Gabriele Valvano, Andrea Leo, Sotirios A. Tsaftaris
- Abstract summary: Weakly-supervised learning can train models by relying on weaker forms of annotation, such as scribbles.
We train a multi-scale GAN to generate realistic segmentation masks at multiple resolutions, while we use scribbles to learn their correct position in the image.
Central to the model's success is a novel attention gating mechanism, which we condition with adversarial signals to act as a shape prior.
- Score: 16.28285034098361
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large, fine-grained image segmentation datasets, annotated at pixel-level,
are difficult to obtain, particularly in medical imaging, where annotations
also require expert knowledge. Weakly-supervised learning can train models by
relying on weaker forms of annotation, such as scribbles. Here, we learn to
segment using scribble annotations in an adversarial game. With unpaired
segmentation masks, we train a multi-scale GAN to generate realistic
segmentation masks at multiple resolutions, while we use scribbles to learn
their correct position in the image. Central to the model's success is a novel
attention gating mechanism, which we condition with adversarial signals to act
as a shape prior, resulting in better object localization at multiple scales.
Subject to adversarial conditioning, the segmentor learns attention maps that
are semantic, suppress the noisy activations outside the objects, and reduce
the vanishing gradient problem in the deeper layers of the segmentor. We
evaluated our model on several medical (ACDC, LVSC, CHAOS) and non-medical
(PPSS) datasets, and we report performance levels matching those achieved by
models trained with fully annotated segmentation masks. We also demonstrate
extensions in a variety of settings: semi-supervised learning; combining
multiple scribble sources (a crowdsourcing scenario) and multi-task learning
(combining scribble and mask supervision). We release expert-made scribble
annotations for the ACDC dataset, and the code used for the experiments, at
https://vios-s.github.io/multiscale-adversarial-attention-gates
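As a rough illustration of the training signals described in the abstract, the following PyTorch-style sketch combines a partial cross-entropy loss restricted to scribbled pixels with an adversarial term on the mask predicted at every decoder scale, the same prediction that gates the decoder features. All names (partial_cross_entropy, attention_gate, adv_weight), the loss weighting, and the choice of channel 0 as background are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def partial_cross_entropy(logits, scribbles, ignore_index=-1):
    # Scribble supervision: only annotated pixels carry a class label;
    # unannotated pixels hold ignore_index and contribute no gradient.
    return F.cross_entropy(logits, scribbles, ignore_index=ignore_index)

def attention_gate(features, mask_logits):
    # Gate decoder features with the soft foreground map predicted at this
    # scale (channel 0 assumed to be background). Because the same prediction
    # is scored by the discriminator, adversarial gradients push the attention
    # map towards plausible object shapes.
    foreground = 1.0 - torch.softmax(mask_logits, dim=1)[:, :1]
    return features * foreground

def segmentor_loss(multi_scale_logits, scribbles, discriminator, adv_weight=0.1):
    # Scribble loss on the full-resolution prediction (assumed to be the last
    # element) plus an adversarial loss at every scale of the multi-scale GAN.
    loss = partial_cross_entropy(multi_scale_logits[-1], scribbles)
    for logits in multi_scale_logits:
        score = discriminator(torch.softmax(logits, dim=1))
        loss = loss + adv_weight * F.binary_cross_entropy_with_logits(
            score, torch.ones_like(score))
    return loss
```

In the full method, the discriminator itself is trained in the usual adversarial fashion, contrasting the segmentor's multi-scale predictions with unpaired real masks.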
Related papers
- FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [56.71672127740099]
We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets.
We leverage several relatively small, open-source foundation models for zero-shot open-vocabulary segmentation.
Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both Pascal VOC and COCO datasets.
arXiv Detail & Related papers (2024-03-29T10:38:25Z)
- SLiMe: Segment Like Me [24.254744102347413]
We propose SLiMe to segment images at any desired granularity using as few as one annotated sample.
We carried out an extensive set of experiments examining various design factors and showed that SLiMe outperforms existing one-shot and few-shot segmentation methods.
arXiv Detail & Related papers (2023-09-06T17:39:05Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used to mitigate the need of Vision Transformer networks for very large annotated datasets.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
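Reading the MOCA title literally, a minimal sketch of predicting masked online codebook assignments could look as follows; the teacher/student split, the tensor shapes, and every function name are assumptions made for illustration, not the paper's actual method or code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def codebook_assignments(teacher, codebook, images):
    # Teacher sees the full image; each patch feature is assigned to its
    # nearest codebook entry, giving one discrete target per patch.
    feats = F.normalize(teacher(images), dim=-1)   # (B, N, D) patch features
    codes = F.normalize(codebook, dim=-1)          # (K, D) codebook entries
    return (feats @ codes.T).argmax(dim=-1)        # (B, N) code indices

def masked_assignment_loss(student, teacher, codebook, images, mask):
    # Student sees a masked view and predicts, at masked positions only,
    # which codebook entry the teacher assigned to the hidden patch.
    targets = codebook_assignments(teacher, codebook, images)
    logits = student(images, mask)                 # (B, N, K) scores over codes
    masked = mask.bool()                           # (B, N) True where hidden
    return F.cross_entropy(logits[masked], targets[masked])
```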
- A Self-Training Framework Based on Multi-Scale Attention Fusion for Weakly Supervised Semantic Segmentation [7.36778096476552]
We propose a self-training method that utilizes fused multi-scale class-aware attention maps.
We collect attention maps computed at different input scales and fuse them into multi-scale attention maps.
We then apply denoising and reactivation strategies to enhance the potential regions and reduce noisy areas.
arXiv Detail & Related papers (2023-05-10T02:16:12Z)
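To make the fusion step described in the entry above concrete, here is a small sketch of averaging class-aware attention maps computed at several input scales and thresholding them into pseudo-labels; the scales, the threshold, and the hypothetical model.class_attention call are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def fused_multi_scale_attention(model, image, scales=(0.5, 1.0, 1.5)):
    # Run the classifier at several input scales, resize each class-aware
    # attention map back to the original resolution, and average them.
    _, _, H, W = image.shape
    maps = []
    for s in scales:
        scaled = F.interpolate(image, scale_factor=s, mode="bilinear",
                               align_corners=False)
        cam = model.class_attention(scaled)        # (B, C, h, w), hypothetical API
        maps.append(F.interpolate(cam, size=(H, W), mode="bilinear",
                                  align_corners=False))
    fused = torch.stack(maps).mean(dim=0)
    return fused / fused.amax(dim=(2, 3), keepdim=True).clamp(min=1e-6)

def pseudo_labels(fused_cam, threshold=0.3, ignore_index=255):
    # Denoising step (illustrative): keep confident activations as labels and
    # mark uncertain pixels as "ignore" so they do not supervise training.
    confidence, labels = fused_cam.max(dim=1)
    labels[confidence < threshold] = ignore_index
    return labels
```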
- Self-attention on Multi-Shifted Windows for Scene Segmentation [14.47974086177051]
We explore the effective use of self-attention within multi-scale image windows to learn descriptive visual features.
We propose three different strategies to aggregate these feature maps to decode the feature representation for dense prediction.
Our models achieve very promising performance on four public scene segmentation datasets.
arXiv Detail & Related papers (2022-07-10T07:36:36Z)
- Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization [98.46318529630109]
We take inspiration from traditional spectral segmentation methods by reframing image decomposition as a graph partitioning problem.
We find that these eigenvectors already decompose an image into meaningful segments, and can be readily used to localize objects in a scene.
By clustering the features associated with these segments across a dataset, we can obtain well-delineated, nameable regions.
arXiv Detail & Related papers (2022-05-16T17:47:44Z)
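A bare-bones version of the graph-partitioning idea above, assuming per-patch features from any self-supervised backbone (all names and the clustering choice are illustrative): build a feature affinity matrix, take the Laplacian eigenvectors, and cluster the resulting spectral embedding into segments.

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def spectral_segments(patch_feats, num_segments=4):
    # patch_feats: (N, D) features for the N patches of one image,
    # e.g. extracted from a self-supervised ViT backbone.
    f = F.normalize(patch_feats, dim=1)
    affinity = (f @ f.T).clamp(min=0)      # keep only non-negative affinities
    degree = torch.diag(affinity.sum(dim=1))
    laplacian = degree - affinity
    # Eigenvectors with the smallest eigenvalues encode the coarse partition
    # of the affinity graph; the first (constant) one is skipped.
    _, eigvecs = torch.linalg.eigh(laplacian)
    embedding = eigvecs[:, 1:num_segments].cpu().numpy()
    # Cluster patches in the spectral embedding to obtain segment ids.
    return KMeans(n_clusters=num_segments, n_init=10).fit_predict(embedding)
```

Grouping such segments across a dataset by their features, as the entry notes, is what yields well-delineated, nameable regions.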
- Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z)
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
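One way to picture "input images are represented as graph nodes" is the toy message-passing step below; the similarity-based adjacency, the number of rounds, and the function name are assumptions for illustration only, not the paper's GNN.

```python
import torch
import torch.nn.functional as F

def group_wise_message_passing(image_feats, num_rounds=2):
    # image_feats: (N, D) pooled features, one node per image in a group of
    # images that share the same image-level class label.
    normed = F.normalize(image_feats, dim=1)
    adjacency = torch.softmax(normed @ normed.T, dim=1)  # soft edges by similarity
    h = image_feats
    for _ in range(num_rounds):
        # Each image aggregates evidence from related images in the group,
        # which can then support more reliable pseudo ground-truth estimation.
        h = F.relu(adjacency @ h)
    return h
```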
- Deep Active Learning for Joint Classification & Segmentation with Weak Annotator [22.271760669551817]
CNN visualization and interpretation methods, like class-activation maps (CAMs), are typically used to highlight the image regions linked to class predictions.
We propose an active learning framework, which progressively integrates pixel-level annotations during training.
Our results indicate that, even with simple random sample selection, the proposed approach can significantly outperform state-of-the-art CAM-based and active learning (AL) methods.
arXiv Detail & Related papers (2020-10-10T03:25:54Z)
- Semi-supervised few-shot learning for medical image segmentation [21.349705243254423]
Recent attempts to alleviate the need for large annotated datasets have developed training strategies under the few-shot learning paradigm.
We propose a novel few-shot learning framework for semantic segmentation, where unlabeled images are also made available at each episode.
We show that including unlabeled surrogate tasks in the episodic training leads to more powerful feature representations.
arXiv Detail & Related papers (2020-03-18T20:37:18Z)
- Improving Few-shot Learning by Spatially-aware Matching and CrossTransformer [116.46533207849619]
We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.