Transformer-CNN Cohort: Semi-supervised Semantic Segmentation by the Best of Both Students
- URL: http://arxiv.org/abs/2209.02178v2
- Date: Sun, 17 Dec 2023 01:31:57 GMT
- Title: Transformer-CNN Cohort: Semi-supervised Semantic Segmentation by the Best of Both Students
- Authors: Xu Zheng, Yunhao Luo, Chong Fu, Kangcheng Liu and Lin Wang
- Abstract summary: We propose a novel Semi-supervised Learning (SSL) approach that consists of two students, one based on the vision transformer (ViT) and the other based on the convolutional neural network (CNN).
Our method incorporates multi-level consistency regularization on both the predictions and the heterogeneous feature spaces via pseudo-labeling of the unlabeled data.
We validate the TCC framework on the Cityscapes and Pascal VOC 2012 datasets, where it outperforms existing SSL methods by a large margin.
- Score: 18.860732413631887
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The popular methods for semi-supervised semantic segmentation mostly adopt a
single network model using convolutional neural networks (CNNs) and enforce
consistency of the model's predictions over perturbations applied to the inputs
or model. However, such a learning paradigm suffers from two critical
limitations: difficulty in a) learning discriminative features for the unlabeled
data and b) learning both global and local information from the whole image. In this paper,
we propose a novel Semi-supervised Learning (SSL) approach, called
Transformer-CNN Cohort (TCC), that consists of two students, one based on
the vision transformer (ViT) and the other based on the CNN. Our method
incorporates multi-level consistency regularization on both the predictions and
the heterogeneous feature spaces via pseudo-labeling of the unlabeled data.
First, as the inputs of the ViT student are image patches, the extracted feature
maps encode crucial class-wise statistics. To this end, we propose
class-aware feature consistency distillation (CFCD), which first uses the
outputs of each student as pseudo labels and generates class-aware feature
(CF) maps for knowledge transfer between the two students. Second, as the ViT
student has more uniform representations across all layers, we propose
consistency-aware cross distillation (CCD) to transfer knowledge between the
pixel-wise predictions of the cohort. We validate the TCC framework on the
Cityscapes and Pascal VOC 2012 datasets, where it outperforms existing SSL methods
by a large margin.
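The abstract describes CFCD only at a high level: each student's output serves as a pseudo label, and class-aware feature (CF) maps built from those labels carry knowledge between the two students. The NumPy sketch below is one illustrative reading, not the authors' implementation. It substitutes masked average pooling (one prototype vector per pseudo-labelled class) for the CF maps and uses a mean-squared error between the two students' prototypes. The function names `class_prototypes` and `cfcd_loss`, the use of both students' pseudo-label maps in turn, and the assumption that both feature maps were already projected to a shared shape (D, H, W) matching the logits are all hypothetical.

```python
import numpy as np


def class_prototypes(features, labels, num_classes):
    """Masked average pooling (a stand-in for the paper's CF maps).

    features: (D, H, W) feature map; labels: (H, W) integer pseudo-label map.
    Returns (num_classes, D) class-wise features and a boolean vector that
    marks which classes actually occur in the label map.
    """
    d = features.shape[0]
    flat = features.reshape(d, -1)   # (D, H*W)
    lab = labels.reshape(-1)         # (H*W,)
    protos = np.zeros((num_classes, d))
    present = np.zeros(num_classes, dtype=bool)
    for c in range(num_classes):
        mask = lab == c
        if mask.any():
            protos[c] = flat[:, mask].mean(axis=1)
            present[c] = True
    return protos, present


def cfcd_loss(feat_vit, logits_vit, feat_cnn, logits_cnn):
    """Hypothetical class-aware feature consistency loss between two students.

    Assumes feat_* are already projected to the same (D, H, W) shape and that
    logits_* are (C, H, W) predictions for the same unlabeled image.
    """
    num_classes = logits_vit.shape[0]
    # Each student's hard prediction acts as a pseudo-label map.
    y_vit = logits_vit.argmax(axis=0)
    y_cnn = logits_cnn.argmax(axis=0)
    losses = []
    # Assumption: pool BOTH students' features with the same label map,
    # once with the ViT's pseudo labels and once with the CNN's.
    for y in (y_vit, y_cnn):
        pv, present = class_prototypes(feat_vit, y, num_classes)
        pc, _ = class_prototypes(feat_cnn, y, num_classes)
        losses.append(((pv[present] - pc[present]) ** 2).mean())
    return float(np.mean(losses))
```

Pooling both students' features with the same label map keeps the compared class-wise features aligned class by class, so the loss measures disagreement in the feature spaces rather than disagreement in the pseudo labels themselves.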
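CCD is likewise only named in the abstract: it transfers knowledge between the pixel-wise predictions of the ViT and CNN students. The following is a minimal sketch under two stated assumptions: "consistency-aware" is read as distilling only on pixels where the two students' hard predictions agree, and "cross distillation" is read as a symmetric per-pixel KL divergence between their temperature-scaled softmax outputs. The function names, the temperature `t`, and the agreement mask are illustrative choices; the paper may weight or select pixels differently.

```python
import numpy as np


def softmax(logits, t=1.0):
    """Temperature-scaled softmax over the class axis (axis 0) of a (C, H, W) map."""
    z = logits / t
    z = z - z.max(axis=0, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)


def pixel_kl(p_target, p_pred, eps=1e-8):
    """Per-pixel KL(p_target || p_pred), summed over classes; returns an (H, W) map."""
    return (p_target * (np.log(p_target + eps) - np.log(p_pred + eps))).sum(axis=0)


def ccd_loss(logits_vit, logits_cnn, t=1.0):
    """Hypothetical consistency-aware cross distillation between two students.

    logits_vit, logits_cnn: (C, H, W) logits for the same unlabeled image.
    """
    p_vit = softmax(logits_vit, t)
    p_cnn = softmax(logits_cnn, t)
    # Assumed consistency mask: pixels where both students predict the same class.
    agree = p_vit.argmax(axis=0) == p_cnn.argmax(axis=0)
    if not agree.any():
        return 0.0
    # Cross distillation: the ViT is pulled toward the CNN's soft output and vice versa.
    kl = pixel_kl(p_cnn, p_vit) + pixel_kl(p_vit, p_cnn)
    return float(kl[agree].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 19 classes, as in the Cityscapes evaluation protocol; 8x8 toy resolution.
    print(ccd_loss(rng.normal(size=(19, 8, 8)), rng.normal(size=(19, 8, 8)), t=2.0))
```

In an autodiff framework, the target distribution in each KL term would typically be detached from the gradient, so each student is pulled toward the other's prediction without also pushing the other student toward itself through that term.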
Related papers
- Intrapartum Ultrasound Image Segmentation of Pubic Symphysis and Fetal Head Using Dual Student-Teacher Framework with CNN-ViT Collaborative Learning [1.5233179662962222]
The segmentation of the pubic symphysis and fetal head (PSFH) constitutes a pivotal step in monitoring labor progression and identifying potential delivery complications.
Traditional semi-supervised learning approaches primarily utilize a unified network model based on Convolutional Neural Networks (CNNs).
We introduce a novel framework, the Dual-Student and Teacher Combining CNN and Transformer (DSTCT).
arXiv (2024-09-11T00:57:31Z)
- Dcl-Net: Dual Contrastive Learning Network for Semi-Supervised Multi-Organ Segmentation [12.798684146496754]
We propose a two-stage Dual Contrastive Learning Network for semi-supervised multi-organ segmentation (MoS).
In Stage 1, we develop a similarity-guided global contrastive learning to explore the implicit continuity and similarity among images.
In Stage 2, we present an organ-aware local contrastive learning to further attract the class representations.
arXiv (2024-03-06T07:39:33Z)
- Dual-Augmented Transformer Network for Weakly Supervised Semantic Segmentation [4.02487511510606]
Weakly supervised semantic segmentation (WSSS) is a fundamental computer vision task that aims to segment objects using only class-level labels.
Traditional methods adopt the CNN-based network and utilize the class activation map (CAM) strategy to discover the object regions.
An alternative is to use vision transformers (ViT) to encode the image and capture global semantic information.
We propose a dual network with both CNN-based and transformer networks for mutually complementary learning.
arXiv (2023-09-30T08:41:11Z)
- Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv (2022-04-19T10:41:09Z)
- Multi-scale and Cross-scale Contrastive Learning for Semantic Segmentation [5.281694565226513]
We apply contrastive learning to enhance the discriminative power of the multi-scale features extracted by semantic segmentation networks.
By first mapping the encoder's multi-scale representations to a common feature space, we instantiate a novel form of supervised local-global constraint.
arXiv (2022-03-25T01:24:24Z)
- Adversarial Dual-Student with Differentiable Spatial Warping for Semi-Supervised Semantic Segmentation [70.2166826794421]
We propose a differentiable geometric warping to conduct unsupervised data augmentation.
We also propose a novel adversarial dual-student framework to improve the Mean-Teacher.
Our solution significantly improves the performance and state-of-the-art results are achieved on both datasets.
arXiv (2022-03-05T17:36:17Z)
- Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv (2021-12-12T06:11:16Z)
- Calibrating Class Activation Maps for Long-Tailed Visual Recognition [60.77124328049557]
We present two effective modifications of CNNs to improve network learning from long-tailed distribution.
First, we present a Class Activation Map Calibration (CAMC) module to improve the learning and prediction of network classifiers.
Second, we investigate the use of normalized classifiers for representation learning in long-tailed problems.
arXiv (2021-08-29T05:45:03Z)
- A Transductive Multi-Head Model for Cross-Domain Few-Shot Learning [72.30054522048553]
We present a new method, Transductive Multi-Head Few-Shot learning (TMHFS), to address the Cross-Domain Few-Shot Learning challenge.
The proposed methods greatly outperform the strong baseline, fine-tuning, on four different target domains.
arXiv (2020-06-08T02:39:59Z)
- MetricUNet: Synergistic Image- and Voxel-Level Learning for Precise CT Prostate Segmentation via Online Sampling [66.01558025094333]
We propose a two-stage framework, with the first stage to quickly localize the prostate region and the second stage to precisely segment the prostate.
We introduce a novel online metric learning module through voxel-wise sampling in the multi-task network.
Our method can effectively learn more representative voxel-level features compared with the conventional learning methods with cross-entropy or Dice loss.
arXiv (2020-05-15T10:37:02Z)