SSG2: A new modelling paradigm for semantic segmentation
- URL: http://arxiv.org/abs/2310.08671v1
- Date: Thu, 12 Oct 2023 19:08:03 GMT
- Title: SSG2: A new modelling paradigm for semantic segmentation
- Authors: Foivos I. Diakogiannis, Suzanne Furby, Peter Caccetta, Xiaoliang Wu,
Rodrigo Ibata, Ondrej Hlinka, John Taylor
- Abstract summary: State-of-the-art models in semantic segmentation operate on single, static images, generating corresponding segmentation masks.
Inspired by work on semantic change detection, we introduce a methodology that leverages a sequence of observables generated for each static input image.
By adding this "temporal" dimension, we exploit strong signal correlations between successive observations in the sequence to reduce error rates.
We evaluate SSG2 across three diverse datasets: UrbanMonitor, featuring orthoimage tiles from Darwin, Australia with five spectral bands and 0.2m spatial resolution; ISPRS Potsdam, which includes true orthophoto images with multiple spectral bands and a 5cm ground sampling distance; and ISIC2018, a medical dataset for skin lesion segmentation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art models in semantic segmentation primarily operate on single,
static images, generating corresponding segmentation masks. This one-shot
approach leaves little room for error correction, as the models lack the
capability to integrate multiple observations for enhanced accuracy. Inspired
by work on semantic change detection, we address this limitation by introducing
a methodology that leverages a sequence of observables generated for each
static input image. By adding this "temporal" dimension, we exploit strong
signal correlations between successive observations in the sequence to reduce
error rates. Our framework, dubbed SSG2 (Semantic Segmentation Generation 2),
employs a dual-encoder, single-decoder base network augmented with a sequence
model. The base model learns to predict the set intersection, union, and
difference of labels from dual-input images. Given a fixed target input image
and a set of support images, the sequence model builds the predicted mask of
the target by synthesizing the partial views from each sequence step and
filtering out noise. We evaluate SSG2 across three diverse datasets:
UrbanMonitor, featuring orthoimage tiles from Darwin, Australia with five
spectral bands and 0.2m spatial resolution; ISPRS Potsdam, which includes true
orthophoto images with multiple spectral bands and a 5cm ground sampling
distance; and ISIC2018, a medical dataset focused on skin lesion segmentation,
particularly melanoma. The SSG2 model demonstrates rapid convergence within the
first few tens of epochs and significantly outperforms UNet-like baseline
models with the same number of gradient updates. However, the addition of the
temporal dimension results in an increased memory footprint. While this could
be a limitation, it is offset by the advent of higher-memory GPUs and coding
optimizations.
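The set relations the base network learns to predict can be illustrated on binary label masks. The following is a minimal NumPy sketch (the function names are ours, not the authors'): for a single class, intersection, union, and difference reduce to elementwise Boolean operations, and the target mask is recoverable as the union of the predicted intersection and difference.

```python
import numpy as np

def set_relations(target_mask: np.ndarray, support_mask: np.ndarray):
    """Intersection, union, and difference of two binary label masks."""
    inter = np.logical_and(target_mask, support_mask)
    union = np.logical_or(target_mask, support_mask)
    diff = np.logical_and(target_mask, np.logical_not(support_mask))
    return inter, union, diff

def recover_target(inter: np.ndarray, diff: np.ndarray) -> np.ndarray:
    """The target mask equals (target AND support) OR (target AND NOT support)."""
    return np.logical_or(inter, diff)

# Toy 2x2 masks for a single class.
target = np.array([[1, 1], [0, 0]], dtype=bool)
support = np.array([[1, 0], [1, 0]], dtype=bool)
inter, union, diff = set_relations(target, support)
# recover_target(inter, diff) reproduces the original target mask.
```

In SSG2 these relations are predicted by the network rather than computed from known masks; the sequence model then aggregates such partial views over all support images to filter out per-step noise.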
Related papers
- SIGMA: Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-guided Masked Video Modeling (SIGMA) is a novel video pretraining method.
We distribute features of space-time tubes evenly across a limited number of learnable clusters.
Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
arXiv Detail & Related papers (2024-07-22T08:04:09Z)
- Comprehensive Generative Replay for Task-Incremental Segmentation with Concurrent Appearance and Semantic Forgetting [49.87694319431288]
Generalist segmentation models are increasingly favored for diverse tasks involving various objects from different image sources.
We propose a Comprehensive Generative Replay (CGR) framework that restores appearance and semantic knowledge by synthesizing image-mask pairs.
Experiments on incremental tasks (cardiac, fundus and prostate segmentation) show its clear advantage for alleviating concurrent appearance and semantic forgetting.
arXiv Detail & Related papers (2024-06-28T10:05:58Z)
- Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
arXiv Detail & Related papers (2023-12-26T12:56:31Z)
- Domain Adaptive Synapse Detection with Weak Point Annotations [63.97144211520869]
We present AdaSyn, a framework for domain adaptive synapse detection with weak point annotations.
In the WASPSYN challenge at ISBI 2023, our method ranked first.
arXiv Detail & Related papers (2023-08-31T05:05:53Z)
- High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation [17.804090651425955]
Image-level weakly-supervised segmentation (WSSS) reduces the usually vast data annotation cost by using surrogate segmentation masks during training.
Our work is based on two techniques for improving CAMs: importance sampling, which is a substitute for global average pooling (GAP), and the feature similarity loss.
We reformulate both techniques based on binomial posteriors of multiple independent binary problems.
This has two benefits: their performance is improved, and they become more general, resulting in an add-on method that can boost virtually any WSSS method.
arXiv Detail & Related papers (2023-04-05T17:43:57Z)
- Robust One-shot Segmentation of Brain Tissues via Image-aligned Style Transformation [13.430851964063534]
We propose a novel image-aligned style transformation to reinforce the dual-model iterative learning for one-shot segmentation of brain tissues.
Experimental results on two public datasets demonstrate 1) segmentation performance competitive with the fully-supervised method, and 2) superior performance over other state-of-the-art methods, with an increase in average Dice of up to 4.67%.
arXiv Detail & Related papers (2022-11-26T09:14:01Z)
- Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z)
- Decoupled Multi-task Learning with Cyclical Self-Regulation for Face Parsing [71.19528222206088]
We propose a novel Decoupled Multi-task Learning with Cyclical Self-Regulation for face parsing.
Specifically, DML-CSR designs a multi-task model which comprises face parsing, binary edge, and category edge detection.
Our method achieves the new state-of-the-art performance on the Helen, CelebA-HQ, and LapaMask datasets.
arXiv Detail & Related papers (2022-03-28T02:12:30Z)
- On the Texture Bias for Few-Shot CNN Segmentation [21.349705243254423]
Convolutional Neural Networks (CNNs) have long been believed to be driven by shapes when performing visual recognition tasks.
Recent evidence suggests, however, that texture bias in CNNs provides higher-performing models when learning on large labeled training datasets.
We propose a novel architecture that integrates a set of Difference of Gaussians (DoG) to attenuate high-frequency local components in the feature space.
arXiv Detail & Related papers (2020-03-09T11:55:47Z)
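The Difference of Gaussians used in the last entry above is a standard band-pass operation; the following is an illustrative NumPy sketch (not the paper's feature-space implementation) showing how subtracting a wide Gaussian blur from a narrow one suppresses smooth, low-variation regions while retaining mid-frequency structure.

```python
import numpy as np

def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 1D Gaussian kernel truncated at 3 sigma."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur: convolve rows, then columns."""
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def difference_of_gaussians(img, sigma_small=1.0, sigma_large=2.0):
    """Band-pass filter: narrow blur minus wide blur."""
    return blur(img, sigma_small) - blur(img, sigma_large)

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))
dog = difference_of_gaussians(img)  # same shape as the input
```

In the interior of a constant image both blurs reproduce the constant, so the DoG response there is (near) zero; only regions with local variation survive the filter.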
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.