Triple-View Knowledge Distillation for Semi-Supervised Semantic
Segmentation
- URL: http://arxiv.org/abs/2309.12557v1
- Date: Fri, 22 Sep 2023 01:02:21 GMT
- Title: Triple-View Knowledge Distillation for Semi-Supervised Semantic
Segmentation
- Authors: Ping Li and Junjie Chen and Li Yuan and Xianghua Xu and Mingli Song
- Abstract summary: We propose a Triple-view Knowledge Distillation framework, termed TriKD, for semi-supervised semantic segmentation.
The framework includes the triple-view encoder and the dual-frequency decoder.
- Score: 54.23510028456082
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: To alleviate expensive human labeling, semi-supervised semantic
segmentation employs a few labeled images and an abundance of unlabeled images
to predict a pixel-level label map of the same size. Previous methods often
adopt co-training with two convolutional networks of the same architecture
but different initialization, which fails to capture sufficiently diverse
features. This motivates us to use tri-training and develop the triple-view
encoder to utilize the encoders with different architectures to derive diverse
features, and exploit knowledge distillation to learn the complementary
semantics among these encoders. Moreover, existing methods simply concatenate
the features from the encoder and decoder, resulting in redundant features
that incur a large memory cost. This inspires us to devise a
dual-frequency decoder that selects those important features by projecting the
features from the spatial domain to the frequency domain, where the
dual-frequency channel attention mechanism is introduced to model the feature
importance. Therefore, we propose a Triple-view Knowledge Distillation
framework, termed TriKD, for semi-supervised semantic segmentation, including
the triple-view encoder and the dual-frequency decoder. Extensive experiments
were conducted on two benchmarks, i.e., Pascal VOC 2012 and Cityscapes, and
the results verify the superiority of the proposed method, which achieves a
good tradeoff between precision and inference speed.
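The abstract describes the dual-frequency channel attention only at a high level: features are projected into the frequency domain and channel importance is modeled per frequency band. The sketch below is a minimal, hypothetical interpretation of that idea (the band split, module name, and MLP gating are assumptions, not the authors' exact design):

```python
import torch
import torch.nn as nn


class DualFrequencyChannelAttention(nn.Module):
    """Hypothetical sketch: map features to the frequency domain with a 2-D
    FFT, summarize low- and high-frequency energy per channel, and predict
    channel weights from both bands. Not the authors' exact module."""

    def __init__(self, channels, reduction=4, low_ratio=0.25):
        super().__init__()
        self.low_ratio = low_ratio
        # A small MLP maps the two per-channel band energies to channel gates.
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                          # x: (B, C, H, W)
        spec = torch.fft.fft2(x).abs()             # magnitude spectrum
        spec = torch.fft.fftshift(spec, dim=(-2, -1))
        B, C, H, W = spec.shape
        h, w = int(H * self.low_ratio), int(W * self.low_ratio)
        # Central region of the shifted spectrum holds low frequencies.
        mask = torch.zeros(1, 1, H, W, device=x.device)
        mask[..., H // 2 - h:H // 2 + h, W // 2 - w:W // 2 + w] = 1.0
        low = (spec * mask).mean(dim=(-2, -1))          # (B, C) low-band energy
        high = (spec * (1 - mask)).mean(dim=(-2, -1))   # (B, C) high-band energy
        weights = self.mlp(torch.cat([low, high], dim=1))  # (B, C) in (0, 1)
        return x * weights[:, :, None, None]        # re-weight channels
```

In this reading, channels whose low- or high-frequency energy is uninformative receive gates near zero, which is one plausible way to "select important features" instead of concatenating everything.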
Related papers
- A Simple Baseline with Single-encoder for Referring Image Segmentation [14.461024566536478]
We present a novel RIS method with a single encoder, i.e., BEiT-3, maximizing the potential of shared self-attention.
Our simple baseline with a single encoder achieves outstanding performance on the RIS benchmark datasets.
arXiv Detail & Related papers (2024-08-28T04:14:01Z)
- 4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders [53.297697898510194]
We propose a joint modeling scheme where four decoders share the same encoder -- we refer to this as 4D modeling.
To efficiently train the 4D model, we introduce a two-stage training strategy that stabilizes multitask learning.
In addition, we propose three novel one-pass beam search algorithms by combining three decoders.
arXiv Detail & Related papers (2024-06-05T05:18:20Z)
- Scribble-based 3D Multiple Abdominal Organ Segmentation via Triple-branch Multi-dilated Network with Pixel- and Class-wise Consistency [20.371144313009122]
We propose a novel 3D framework with two consistency constraints for scribble-supervised multiple abdominal organ segmentation from CT.
For more stable unsupervised learning, we use voxel-wise uncertainty to rectify the soft pseudo labels and then supervise the outputs of each decoder.
Experiments on the public WORD dataset show that our method outperforms five existing scribble-supervised methods.
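The summary above mentions using voxel-wise uncertainty to rectify soft pseudo labels before supervising each decoder. One common reading of this idea, sketched here under assumptions (the averaging and entropy-based weighting are illustrative, not that paper's exact formulation):

```python
import torch


def rectify_pseudo_labels(probs_a, probs_b):
    """Hedged sketch: average two decoders' softmax outputs into a soft
    pseudo label, estimate voxel-wise uncertainty as normalized entropy,
    and return a confidence weight that down-weights uncertain voxels.
    Inputs are probability maps of shape (B, C, *spatial)."""
    mean_p = (probs_a + probs_b) / 2                     # soft pseudo label
    entropy = -(mean_p * mean_p.clamp_min(1e-8).log()).sum(dim=1)
    max_ent = torch.log(torch.tensor(float(mean_p.shape[1])))
    weight = 1.0 - entropy / max_ent                     # confident voxels ~ 1
    return mean_p, weight
```

The returned weight could then multiply a per-voxel consistency loss so that only confident pseudo labels drive the unsupervised signal.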
arXiv Detail & Related papers (2023-09-18T12:50:58Z)
- Towards Complex Backgrounds: A Unified Difference-Aware Decoder for Binary Segmentation [4.6932442139663015]
A new unified dual-branch decoder paradigm named the difference-aware decoder is proposed in this paper.
The difference-aware decoder imitates the human eye in three stages using the multi-level features output by the encoder.
The results demonstrate that the difference-aware decoder can achieve a higher accuracy than the other state-of-the-art binary segmentation methods.
arXiv Detail & Related papers (2022-10-27T03:45:29Z)
- LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval [117.15862403330121]
We propose LoopITR, which combines dual encoders and cross encoders in the same network for joint learning.
Specifically, we let the dual encoder provide hard negatives to the cross encoder, and use the more discriminative cross encoder to distill its predictions back to the dual encoder.
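The distillation direction described here, where the more discriminative cross encoder teaches the dual encoder, is commonly implemented with a temperature-softened KL divergence. A minimal sketch under that assumption (the loss form and `tau` are illustrative, not LoopITR's exact objective):

```python
import torch
import torch.nn.functional as F


def distill_scores(cross_logits, dual_logits, tau=2.0):
    """Hedged sketch: the cross encoder's similarity scores (detached, so
    gradients flow only to the student) supervise the dual encoder via KL
    divergence over temperature-softened distributions."""
    teacher = F.softmax(cross_logits.detach() / tau, dim=-1)
    student = F.log_softmax(dual_logits / tau, dim=-1)
    # The tau**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student, teacher, reduction="batchmean") * tau * tau
```

In a retrieval setting, each row of the logits would hold one query's scores against a batch of candidates, including the hard negatives the dual encoder mined for the cross encoder.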
arXiv Detail & Related papers (2022-03-10T16:41:12Z)
- Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation [98.05643473345474]
We propose a novel decoder, termed the dynamic neural representational decoder (NRD).
As each location on the encoder's output corresponds to a local patch of the semantic labels, in this work, we represent these local patches of labels with compact neural networks.
This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.
arXiv Detail & Related papers (2021-07-30T04:50:56Z)
- Crosslink-Net: Double-branch Encoder Segmentation Network via Fusing Vertical and Horizontal Convolutions [58.71117402626524]
We present a novel double-branch encoder architecture for medical image segmentation.
Our architecture is inspired by two observations: 1) Since the discrimination of features learned via square convolutional kernels needs to be further improved, we propose to utilize non-square vertical and horizontal convolutional kernels.
The experiments validate the effectiveness of our model on four datasets.
arXiv Detail & Related papers (2021-07-24T02:58:32Z)
- Beyond Single Stage Encoder-Decoder Networks: Deep Decoders for Semantic Image Segmentation [56.44853893149365]
Single encoder-decoder methodologies for semantic segmentation are reaching their peak in terms of segmentation quality and efficiency per number of layers.
We propose a new architecture based on a decoder which uses a set of shallow networks for capturing more information content.
In order to further improve the architecture we introduce a weight function which aims to re-balance classes to increase the attention of the networks to under-represented objects.
arXiv Detail & Related papers (2020-07-19T18:44:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.