High-level Feature Guided Decoding for Semantic Segmentation
- URL: http://arxiv.org/abs/2303.08646v3
- Date: Mon, 27 Nov 2023 21:58:41 GMT
- Title: High-level Feature Guided Decoding for Semantic Segmentation
- Authors: Ye Huang, Di Kang, Shenghua Gao, Wen Li, Lixin Duan
- Abstract summary: We propose to use powerful pre-trained high-level features as guidance (HFG) for the upsampler to produce robust results.
Specifically, the high-level features from the backbone are used to train the class tokens, which are then reused by the upsampler for classification.
To push the upper limit of HFG, we introduce a context augmentation encoder (CAE) that can efficiently and effectively operate on the low-resolution high-level feature.
- Score: 54.424062794490254
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing pyramid-based upsamplers (e.g. SemanticFPN), although efficient,
usually produce less accurate results compared to dilation-based models when
using the same backbone. This is partially caused by the contaminated
high-level features since they are fused and fine-tuned with noisy low-level
features on limited data. To address this issue, we propose to use powerful
pre-trained high-level features as guidance (HFG) so that the upsampler can
produce robust results. Specifically, only the high-level features from
the backbone are used to train the class tokens, which are then reused by the
upsampler for classification, guiding the upsampler features to more
discriminative backbone features. One crucial design of the HFG is to protect
the high-level features from being contaminated by using proper stop-gradient
operations so that the backbone does not update according to the noisy gradient
from the upsampler. To push the upper limit of HFG, we introduce a context
augmentation encoder (CAE) that can efficiently and effectively operate on the
low-resolution high-level feature, resulting in improved representation and
thus better guidance. We name our complete solution the High-Level Features
Guided Decoder (HFGD). We evaluate the proposed HFGD on three benchmarks:
Pascal Context, COCOStuff164k, and Cityscapes. HFGD achieves state-of-the-art
results among methods that do not use extra training data, demonstrating its
effectiveness and generalization ability.
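The gradient routing described above can be illustrated with a toy sketch in plain Python. It uses scalar "features" and a hand-rolled reverse-mode autodiff, so no deep-learning framework is needed. This is not the authors' implementation. The following are all illustrative assumptions:

- the scalar backbone `wb * x`;
- the upsampler fusion `wu * (h + low)`;
- the squared-error losses;
- detaching the class token `t` in the upsampler branch.

The sketch only shows the idea from the abstract. The class token is trained from the high-level feature alone, the upsampler reuses it, and a stop-gradient (`detach()`) keeps the upsampler's loss from updating the backbone.

```python
class Value:
    """Scalar with toy reverse-mode autodiff (micrograd-style)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents          # upstream Values
        self._local_grads = local_grads  # d(self)/d(parent), one per parent

    @staticmethod
    def _wrap(other):
        return other if isinstance(other, Value) else Value(other)

    def __add__(self, other):
        other = Value._wrap(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __sub__(self, other):
        other = Value._wrap(other)
        return Value(self.data - other.data, (self, other), (1.0, -1.0))

    def __mul__(self, other):
        other = Value._wrap(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def detach(self):
        # Stop-gradient: same forward value but no parents, so backward()
        # never propagates through this node to whatever produced it.
        return Value(self.data)

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for p, local in zip(v._parents, v._local_grads):
                p.grad += local * v.grad


def hfg_toy_grads(use_upsampler_loss=True, stop_grad=True):
    """Return (wb.grad, t.grad, wu.grad) for one hypothetical training step."""
    wb = Value(0.5)   # hypothetical backbone weight
    t = Value(1.5)    # hypothetical class token
    wu = Value(-0.3)  # hypothetical upsampler weight
    x, low, y = Value(2.0), Value(0.7), 1.0  # input, low-level feature, target

    h = wb * x        # high-level feature from the toy "backbone"
    d_h = h * t - y   # classify h with the class token
    loss = d_h * d_h  # only this term trains t (and wb)

    if use_upsampler_loss:
        # With stop_grad=True the upsampler sees detached copies of h and t,
        # so its (noisy) gradient cannot reach wb or t.
        h_in = h.detach() if stop_grad else h
        t_in = t.detach() if stop_grad else t
        u = wu * (h_in + low)  # toy upsampler fusing high- and low-level features
        d_u = u * t_in - y     # upsampler reuses the class token
        loss = loss + d_u * d_u

    loss.backward()
    return wb.grad, t.grad, wu.grad


print(hfg_toy_grads(use_upsampler_loss=False))
print(hfg_toy_grads())
print(hfg_toy_grads(stop_grad=False))
```

With `stop_grad=True`, `wb.grad` and `t.grad` are identical whether or not the upsampler loss is added. Only the upsampler weight `wu` picks up the extra gradient. Setting `stop_grad=False` lets that gradient leak into `wb` and `t`, which is the kind of contamination the abstract says HFG avoids. In a real framework the same role is played by PyTorch's `Tensor.detach()` or JAX's `jax.lax.stop_gradient`.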
Related papers
- CPDR: Towards Highly-Efficient Salient Object Detection via Crossed Post-decoder Refinement [3.5321836333805425]
We introduce the Attention Down Sample Fusion (ADF), which employs channel attention mechanisms with attention maps generated by high-level representations to refine the low-level features.
We also propose the Dual Attention Cross Fusion (DACF), built upon ADFs and AUFs, which reduces the number of parameters while maintaining performance.
Experiments on five benchmark datasets demonstrate that our method outperforms previous state-of-the-art approaches.
Date: 2025-01-11T05:41:05Z
- ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by the Kronecker product to aggregate low-rank experts.
Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
Date: 2024-12-11T12:31:30Z
- LOBG: Less Overfitting for Better Generalization in Vision-Language Model [19.890629892640206]
We propose a framework named LOBG for vision-language models.
We use CLIP to filter out fine-grained foreground information that might cause overfitting, thereby guiding prompts with basic visual concepts.
Our method significantly improves generalization capability and alleviates overfitting compared to state-of-the-art approaches.
Date: 2024-10-14T08:06:21Z
- Revisiting Cephalometric Landmark Detection from the view of Human Pose Estimation with Lightweight Super-Resolution Head [11.40242574405714]
We develop a benchmark based on MMPose, a well-established human pose estimation (HPE) toolbox.
We introduce an upscaling design within the framework to further enhance performance.
In the MICCAI CLDetection2023 challenge, our method ranks 1st on three metrics and 3rd on the remaining one.
Date: 2023-09-29T11:15:39Z
- Cross-layer Navigation Convolutional Neural Network for Fine-grained Visual Classification [21.223130735592516]
Fine-grained visual classification (FGVC) aims to classify sub-classes of objects in the same super-class.
For FGVC tasks, the key is to find subtle discriminative information about the target in local regions.
We propose a cross-layer navigation convolutional neural network for feature fusion.
Date: 2021-06-21T08:38:27Z
- Channel DropBlock: An Improved Regularization Method for Fine-Grained Visual Classification [58.07257910065007]
Existing approaches mainly tackle fine-grained visual classification by introducing attention mechanisms to locate the discriminative parts, or feature encoding approaches to extract highly parameterized features in a weakly supervised fashion.
In this work, we propose a lightweight yet effective regularization method named Channel DropBlock (CDB) in combination with two alternative correlation metrics, to address this problem.
Date: 2021-06-07T09:03:02Z
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
Date: 2021-03-22T20:36:34Z
- Prior Guided Feature Enrichment Network for Few-Shot Segmentation [64.91560451900125]
State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results.
Few-shot segmentation is proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples.
These frameworks still suffer reduced generalization to unseen classes due to inappropriate use of high-level semantic information.
Date: 2020-08-04T10:41:32Z
- Weakly Supervised Attention Pyramid Convolutional Neural Network for Fine-Grained Visual Classification [71.96618723152487]
We introduce the Attention Pyramid Convolutional Neural Network (AP-CNN).
AP-CNN learns both high-level semantic and low-level detailed feature representation.
It can be trained end-to-end, without the need of additional bounding box/part annotations.
Date: 2020-02-09T12:33:23Z
This list is automatically generated from the titles and abstracts of the papers on this site.