Related papers: High-level Feature Guided Decoding for Semantic Segmentation

URL: http://arxiv.org/abs/2303.08646v3
Date: Mon, 27 Nov 2023 21:58:41 GMT
Title: High-level Feature Guided Decoding for Semantic Segmentation
Authors: Ye Huang, Di Kang, Shenghua Gao, Wen Li, Lixin Duan
Abstract summary: We propose to use powerful pre-trained high-level features as guidance (HFG) for the upsampler to produce robust results. Specifically, the high-level features from the backbone are used to train the class tokens, which are then reused by the upsampler for classification. To push the upper limit of HFG, we introduce a context augmentation encoder (CAE) that can efficiently and effectively operate on the low-resolution high-level feature.
Score: 54.424062794490254
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Existing pyramid-based upsamplers (e.g., SemanticFPN), although efficient, usually produce less accurate results than dilation-based models when using the same backbone. This is partially caused by contaminated high-level features, since they are fused and fine-tuned with noisy low-level features on limited data. To address this issue, we propose to use powerful pre-trained high-level features as guidance (HFG) so that the upsampler can produce robust results. Specifically, only the high-level features from the backbone are used to train the class tokens, which are then reused by the upsampler for classification, guiding the upsampler features toward the more discriminative backbone features. One crucial design of HFG is to protect the high-level features from contamination by using proper stop-gradient operations, so that the backbone does not update according to the noisy gradient from the upsampler. To push the upper limit of HFG, we introduce a context augmentation encoder (CAE) that can efficiently and effectively operate on the low-resolution high-level feature, resulting in improved representation and thus better guidance. We name our complete solution the High-Level Features Guided Decoder (HFGD). We evaluate the proposed HFGD on three benchmarks: Pascal Context, COCOStuff164k, and Cityscapes. HFGD achieves state-of-the-art results among methods that do not use extra training data, demonstrating its effectiveness and generalization ability.
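
The stop-gradient routing described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the scalar autograd class `Value`, the elementwise "backbone" and "upsampler", the helper `hfg_losses`, and all numbers are hypothetical stand-ins chosen only to show which parameters receive gradient from which loss. It assumes one reading of the abstract, namely that the upsampler branch sees detached high-level features and detached class tokens.

```python
import math


class Value:
    """Tiny scalar autograd node, just enough to show gradient routing."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents
        self.local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __sub__(self, other):
        return Value(self.data - other.data, (self, other), (1.0, -1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def exp(self):
        e = math.exp(self.data)
        return Value(e, (self,), (e,))

    def log(self):
        return Value(math.log(self.data), (self,), (1.0 / self.data,))

    def detach(self):
        # Stop-gradient: same value, but no edge back into the graph.
        return Value(self.data)

    def backward(self):
        # Reverse-mode accumulation over a topological ordering of the graph.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad += local * node.grad


def dot(a, b):
    total = a[0] * b[0]
    for x, y in zip(a[1:], b[1:]):
        total = total + x * y
    return total


def cross_entropy(logits, target):
    total = logits[0].exp()
    for z in logits[1:]:
        total = total + z.exp()
    return total.log() - logits[target]


def hfg_losses(backbone_w, tokens, upsampler_w, x_high, x_low, target):
    # Toy "backbone": elementwise scaling of the high-level input.
    high = [w * Value(x) for w, x in zip(backbone_w, x_high)]
    # Guidance branch: class tokens are trained on high-level features only.
    loss_high = cross_entropy([dot(t, high) for t in tokens], target)
    # Toy "upsampler": detached high-level feature plus a low-level term.
    up = [h.detach() + w * Value(x) for h, w, x in zip(high, upsampler_w, x_low)]
    # Tokens are reused for classification but detached (assumed reading).
    frozen = [[v.detach() for v in t] for t in tokens]
    loss_up = cross_entropy([dot(t, up) for t in frozen], target)
    return loss_high, loss_up


def snapshot(backbone_w, tokens, upsampler_w):
    return {
        "backbone": [w.grad for w in backbone_w],
        "tokens": [v.grad for t in tokens for v in t],
        "upsampler": [w.grad for w in upsampler_w],
    }


backbone_w = [Value(0.5), Value(-0.3)]
tokens = [[Value(0.2), Value(0.1)], [Value(-0.4), Value(0.3)]]
upsampler_w = [Value(0.1), Value(0.2)]
loss_high, loss_up = hfg_losses(
    backbone_w, tokens, upsampler_w, [1.0, 2.0], [0.5, -1.0], target=1
)

loss_up.backward()    # upsampler loss: should reach upsampler_w only
up_only = snapshot(backbone_w, tokens, upsampler_w)
loss_high.backward()  # guidance loss: should reach backbone_w and tokens
both = snapshot(backbone_w, tokens, upsampler_w)
```

In this sketch, `up_only` shows zero gradient on the backbone weights and class tokens, while `both` shows that the guidance loss updates them without changing the upsampler gradients. In a real framework the same routing would typically be expressed with that framework's own stop-gradient primitive rather than a hand-written `detach`.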

Related papers

CPDR: Towards Highly-Efficient Salient Object Detection via Crossed Post-decoder Refinement [3.5321836333805425]
We introduce the Attention Down Sample Fusion (ADF), which employs channel attention mechanisms with attention maps generated by high-level representation to refine the low-level features. We also propose the Dual Attention Cross Fusion (DACF) upon ADFs and AUFs, which reduces the number of parameters while maintaining the performance. Experiments on five benchmark datasets demonstrate that our method outperforms previous state-of-the-art approaches.
arXiv Detail & Related papers (2025-01-11T05:41:05Z)
ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts. Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
LOBG: Less Overfitting for Better Generalization in Vision-Language Model [19.890629892640206]
We propose a framework named LOBG for vision-language models. We use CLIP to filter out fine-grained foreground information that might cause overfitting, thereby guiding prompts with basic visual concepts. Our method significantly improves generalization capability and alleviates overfitting compared to state-of-the-art approaches.
arXiv Detail & Related papers (2024-10-14T08:06:21Z)
Revisiting Cephalometric Landmark Detection from the view of Human Pose Estimation with Lightweight Super-Resolution Head [11.40242574405714]
We develop a benchmark based on the well-established human pose estimation (HPE) codebase known as MMPose. We introduce an upscaling design within the framework to further enhance performance. In the MICCAI CLDetection2023 challenge, our method achieves 1st place ranking on three metrics and 3rd place on the remaining one.
arXiv Detail & Related papers (2023-09-29T11:15:39Z)
Improving Point Cloud Based Place Recognition with Ranking-based Loss and Large Batch Training [1.116812194101501]
The paper presents a simple and effective learning-based method for computing a discriminative 3D point cloud descriptor. We employ recent advances in image retrieval and propose a modified version of a loss function based on a differentiable average precision approximation.
arXiv Detail & Related papers (2022-03-02T09:29:28Z)
Cross-layer Navigation Convolutional Neural Network for Fine-grained Visual Classification [21.223130735592516]
Fine-grained visual classification (FGVC) aims to classify sub-classes of objects in the same super-class. For the FGVC tasks, the essential solution is to find discriminative subtle information of the target from local regions. We propose cross-layer navigation convolutional neural network for feature fusion.
arXiv Detail & Related papers (2021-06-21T08:38:27Z)
Channel DropBlock: An Improved Regularization Method for Fine-Grained Visual Classification [58.07257910065007]
Existing approaches mainly tackle this problem by introducing attention mechanisms to locate the discriminative parts or feature encoding approaches to extract the highly parameterized features in a weakly-supervised fashion. In this work, we propose a lightweight yet effective regularization method named Channel DropBlock (CDB) in combination with two alternative correlation metrics, to address this problem.
arXiv Detail & Related papers (2021-06-07T09:03:02Z)
Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models. Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings. We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences. Our method is applicable to both supervised and semi-supervised settings.
arXiv Detail & Related papers (2020-11-03T07:49:15Z)
Prior Guided Feature Enrichment Network for Few-Shot Segmentation [64.91560451900125]
State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results. Few-shot segmentation is proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples. These frameworks still face the challenge of reduced generalization ability on unseen classes due to inappropriate use of high-level semantic information.
arXiv Detail & Related papers (2020-08-04T10:41:32Z)
Weakly Supervised Attention Pyramid Convolutional Neural Network for Fine-Grained Visual Classification [71.96618723152487]
We introduce Attention Pyramid Convolutional Neural Network (AP-CNN). AP-CNN learns both high-level semantic and low-level detailed feature representation. It can be trained end-to-end, without the need for additional bounding box/part annotations.
arXiv Detail & Related papers (2020-02-09T12:33:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.