High-level Feature Guided Decoding for Semantic Segmentation
- URL: http://arxiv.org/abs/2303.08646v3
- Date: Mon, 27 Nov 2023 21:58:41 GMT
- Title: High-level Feature Guided Decoding for Semantic Segmentation
- Authors: Ye Huang, Di Kang, Shenghua Gao, Wen Li, Lixin Duan
- Abstract summary: We propose to use powerful pre-trained high-level features as guidance (HFG) for the upsampler to produce robust results.
Specifically, the high-level features from the backbone are used to train the class tokens, which are then reused by the upsampler for classification.
To push the upper limit of HFG, we introduce a context augmentation encoder (CAE) that can efficiently and effectively operate on the low-resolution high-level feature.
- Score: 54.424062794490254
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing pyramid-based upsamplers (e.g. SemanticFPN), although efficient,
usually produce less accurate results compared to dilation-based models when
using the same backbone. This is partially caused by the contaminated
high-level features since they are fused and fine-tuned with noisy low-level
features on limited data. To address this issue, we propose to use powerful
pre-trained high-level features as guidance (HFG) so that the upsampler can
produce robust results. Specifically, only the high-level features from
the backbone are used to train the class tokens, which are then reused by the
upsampler for classification, guiding the upsampler features toward the more
discriminative backbone features. One crucial design of HFG is to protect the
high-level features from contamination by applying proper stop-gradient
operations, so that the backbone is not updated by the noisy gradients from
the upsampler. To push the upper limit of HFG, we introduce a context
augmentation encoder (CAE) that can efficiently and effectively operate on the
low-resolution high-level feature, resulting in improved representation and
thus better guidance. We name our complete solution the High-Level Features
Guided Decoder (HFGD). We evaluate the proposed HFGD on three benchmarks:
Pascal Context, COCOStuff164k, and Cityscapes. HFGD achieves state-of-the-art
results among methods that do not use extra training data, demonstrating its
effectiveness and generalization ability.
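The core HFG mechanism, training class tokens on the protected high-level feature and reusing them to classify the decoder output, can be sketched in a few lines. The PyTorch snippet below is a minimal illustration of that idea, not the authors' implementation; the module name, tensor shapes, and the exact placement of the stop-gradient operations are assumptions.

```python
import torch
import torch.nn as nn


class HFGHead(nn.Module):
    """Minimal sketch of high-level feature guidance (HFG).

    Learnable class tokens are trained against the high-level backbone
    feature and then reused to classify the upsampler output, while
    stop-gradients keep the decoder's noisy gradients away from the
    backbone. Names, shapes, and details are illustrative assumptions.
    """

    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        # One learnable token (class prototype) per class.
        self.class_tokens = nn.Parameter(torch.randn(num_classes, channels))

    def forward(self, high_level_feat: torch.Tensor, upsampler: nn.Module):
        # high_level_feat: (B, C, h, w) low-resolution backbone feature.
        # Clean path: the class tokens (and, through this loss, the
        # backbone) are supervised directly on the high-level feature.
        logits_lowres = torch.einsum(
            "kc,bchw->bkhw", self.class_tokens, high_level_feat)

        # Stop-gradient before the feature enters the upsampler, so the
        # backbone is never updated by the decoder's noisy gradients.
        decoder_feat = upsampler(high_level_feat.detach())  # (B, C, H, W)

        # Reuse the class tokens on the high-resolution decoder feature,
        # pulling it toward the discriminative backbone feature. The
        # tokens are detached here because, per the abstract, only the
        # high-level features are used to train them.
        logits_highres = torch.einsum(
            "kc,bchw->bkhw", self.class_tokens.detach(), decoder_feat)
        return logits_lowres, logits_highres
```

In such a setup both sets of logits could be supervised with the segmentation labels, but only the low-resolution path back-propagates into the backbone and class tokens.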
Related papers
- LOBG:Less Overfitting for Better Generalization in Vision-Language Model [19.890629892640206]
We propose a framework named LOBG for vision-language models.
We use CLIP to filter out fine-grained foreground information that might cause overfitting, thereby guiding prompts with basic visual concepts.
Our method significantly improves generalization capability and alleviates overfitting compared to state-of-the-art approaches.
arXiv Detail & Related papers (2024-10-14T08:06:21Z)
- Revisiting Cephalometric Landmark Detection from the view of Human Pose Estimation with Lightweight Super-Resolution Head [11.40242574405714]
We develop a benchmark based on the well-established human pose estimation (HPE) framework MMPose.
We introduce an upscaling design within the framework to further enhance performance.
In the MICCAI CLDetection2023 challenge, our method achieves 1st place ranking on three metrics and 3rd place on the remaining one.
arXiv Detail & Related papers (2023-09-29T11:15:39Z)
- Improving Point Cloud Based Place Recognition with Ranking-based Loss and Large Batch Training [1.116812194101501]
The paper presents a simple and effective learning-based method for computing a discriminative 3D point cloud descriptor.
We employ recent advances in image retrieval and propose a modified version of a loss function based on a differentiable average precision approximation.
arXiv Detail & Related papers (2022-03-02T09:29:28Z)
- Cross-layer Navigation Convolutional Neural Network for Fine-grained Visual Classification [21.223130735592516]
Fine-grained visual classification (FGVC) aims to classify sub-classes of objects in the same super-class.
For the FGVC tasks, the essential solution is to find discriminative subtle information of the target from local regions.
We propose a cross-layer navigation convolutional neural network for feature fusion.
arXiv Detail & Related papers (2021-06-21T08:38:27Z)
- Channel DropBlock: An Improved Regularization Method for Fine-Grained Visual Classification [58.07257910065007]
Existing approaches mainly tackle this problem by introducing attention mechanisms to locate the discriminative parts or feature encoding approaches to extract the highly parameterized features in a weakly-supervised fashion.
In this work, we propose a lightweight yet effective regularization method named Channel DropBlock (CDB) in combination with two alternative correlation metrics, to address this problem.
arXiv Detail & Related papers (2021-06-07T09:03:02Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences.
Our method is applicable to both supervised and semi-supervised settings.
arXiv Detail & Related papers (2020-11-03T07:49:15Z)
- Prior Guided Feature Enrichment Network for Few-Shot Segmentation [64.91560451900125]
State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results.
Few-shot segmentation is proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples.
These frameworks still face the challenge of reduced generalization ability on unseen classes due to inappropriate use of high-level semantic information.
arXiv Detail & Related papers (2020-08-04T10:41:32Z)
- Weakly Supervised Attention Pyramid Convolutional Neural Network for Fine-Grained Visual Classification [71.96618723152487]
We introduce the Attention Pyramid Convolutional Neural Network (AP-CNN).
AP-CNN learns both high-level semantic and low-level detailed feature representation.
It can be trained end-to-end, without the need of additional bounding box/part annotations.
arXiv Detail & Related papers (2020-02-09T12:33:23Z)