UniMatch V2: Pushing the Limit of Semi-Supervised Semantic Segmentation
- URL: http://arxiv.org/abs/2410.10777v1
- Date: Mon, 14 Oct 2024 17:49:27 GMT
- Title: UniMatch V2: Pushing the Limit of Semi-Supervised Semantic Segmentation
- Authors: Lihe Yang, Zhen Zhao, Hengshuang Zhao,
- Abstract summary: Semi-supervised semantic segmentation (SSS) aims at learning rich visual knowledge from cheap unlabeled images.
We present our upgraded and simplified UniMatch V2, inheriting the core spirit of weak-to-strong consistency from V1.
- Score: 26.91063423376469
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semi-supervised semantic segmentation (SSS) aims at learning rich visual knowledge from cheap unlabeled images to enhance semantic segmentation capability. Among recent works, UniMatch improves its precedents tremendously by amplifying the practice of weak-to-strong consistency regularization. Subsequent works typically follow similar pipelines and propose various delicate designs. Despite the achieved progress, strangely, even in this flourishing era of numerous powerful vision models, almost all SSS works are still sticking to 1) using outdated ResNet encoders with small-scale ImageNet-1K pre-training, and 2) evaluation on simple Pascal and Cityscapes datasets. In this work, we argue that, it is necessary to switch the baseline of SSS from ResNet-based encoders to more capable ViT-based encoders (e.g., DINOv2) that are pre-trained on massive data. A simple update on the encoder (even using 2x fewer parameters) can bring more significant improvement than careful method designs. Built on this competitive baseline, we present our upgraded and simplified UniMatch V2, inheriting the core spirit of weak-to-strong consistency from V1, but requiring less training cost and providing consistently better results. Additionally, witnessing the gradually saturated performance on Pascal and Cityscapes, we appeal that we should focus on more challenging benchmarks with complex taxonomy, such as ADE20K and COCO datasets. Code, models, and logs of all reported values, are available at https://github.com/LiheYoung/UniMatch-V2.
Related papers
- SegViTv2: Exploring Efficient and Continual Semantic Segmentation with
Plain Vision Transformers [76.13755422671822]
This paper investigates the capability of plain Vision Transformers (ViTs) for semantic segmentation using the encoder-decoder framework.
We introduce a novel Attention-to-Mask (atm) module to design a lightweight decoder effective for plain ViT.
Our decoder outperforms the popular decoder UPerNet using various ViT backbones while consuming only about $5%$ of the computational cost.
arXiv Detail & Related papers (2023-06-09T22:29:56Z) - TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at
Scale [59.01246141215051]
We analyze the factor that leads to degradation from the perspective of language supervision.
We propose a tunable-free pre-training strategy to retain the generalization ability of the text encoder.
We produce a series of models, dubbed TVTSv2, with up to one billion parameters.
arXiv Detail & Related papers (2023-05-23T15:44:56Z) - SegNeXt: Rethinking Convolutional Attention Design for Semantic
Segmentation [100.89770978711464]
We present SegNeXt, a simple convolutional network architecture for semantic segmentation.
We show that convolutional attention is a more efficient and effective way to encode contextual information than the self-attention mechanism in transformers.
arXiv Detail & Related papers (2022-09-18T14:33:49Z) - Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic
Segmentation [27.831267434546024]
We revisit the weak-to-strong consistency framework popularized by FixMatch from semi-supervised classification.
We propose an auxiliary feature perturbation stream as a supplement, leading to an expanded perturbation space.
Our overall Unified Dual-Stream Perturbations approach (UniMatch) surpasses all existing methods significantly across all evaluation protocols.
arXiv Detail & Related papers (2022-08-21T15:32:43Z) - PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers [102.7922200135147]
This paper explores a better codebook for BERT pre-training of vision transformers.
By contrast, the discrete tokens in NLP field are naturally highly semantic.
We demonstrate that the visual tokens generated by the proposed perceptual codebook do exhibit better semantic meanings.
arXiv Detail & Related papers (2021-11-24T18:59:58Z) - Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive
Benchmark Study [100.27567794045045]
Training deep graph neural networks (GNNs) is notoriously hard.
We present the first fair and reproducible benchmark dedicated to assessing the "tricks" of training deep GNNs.
arXiv Detail & Related papers (2021-08-24T05:00:37Z) - So-ViT: Mind Visual Tokens for Vision Transformer [27.243241133304785]
We propose a new classification paradigm, where the second-order, cross-covariance pooling of visual tokens is combined with class token for final classification.
We develop a light-weight, hierarchical module based on off-the-shelf convolutions for visual token embedding.
The results show our models, when trained from scratch, outperform the competing ViT variants, while being on par with or better than state-of-the-art CNN models.
arXiv Detail & Related papers (2021-04-22T09:05:09Z) - Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective
with Transformers [149.78470371525754]
We treat semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer to encode an image as a sequence of patches.
With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR)
SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes.
arXiv Detail & Related papers (2020-12-31T18:55:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.