Eliminating Feature Ambiguity for Few-Shot Segmentation
- URL: http://arxiv.org/abs/2407.09842v1
- Date: Sat, 13 Jul 2024 10:33:03 GMT
- Title: Eliminating Feature Ambiguity for Few-Shot Segmentation
- Authors: Qianxiong Xu, Guosheng Lin, Chen Change Loy, Cheng Long, Ziyue Li, Rui Zhao
- Abstract summary: Recent advancements in few-shot segmentation (FSS) have exploited pixel-by-pixel matching between query and support features.
This paper presents a novel plug-in termed ambiguity elimination network (AENet), which can be plugged into any existing cross attention-based FSS method.
- Score: 95.9916573435427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in few-shot segmentation (FSS) have exploited pixel-by-pixel matching between query and support features, typically based on cross attention, which selectively activates query foreground (FG) features that correspond to the same-class support FG features. However, due to the large receptive fields in deep layers of the backbone, the extracted query and support FG features are inevitably mingled with background (BG) features, impeding the FG-FG matching in cross attention. Hence, the query FG features are fused with fewer support FG features, i.e., the support information is not well utilized. This paper presents a novel plug-in termed ambiguity elimination network (AENet), which can be plugged into any existing cross attention-based FSS method. The main idea is to mine discriminative query FG regions to rectify the ambiguous FG features, increasing the proportion of FG information, so as to suppress the negative impacts of the doped BG features. In this way, the FG-FG matching is naturally enhanced. We plug AENet into three baselines, CyCTR, SCCAN and HDMNet, for evaluation, and their scores are improved by large margins, e.g., the 1-shot performance of SCCAN can be improved by 3.0%+ on both PASCAL-5$^i$ and COCO-20$^i$. The code is available at https://github.com/Sam1224/AENet.
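The pixel-by-pixel matching mentioned in the abstract can be illustrated with a small sketch. This is a generic masked scaled dot-product cross attention written in plain Python, not AENet or any of the listed baselines; the function name `masked_cross_attention`, its arguments, and the toy features are assumptions made for illustration only. Each query pixel is compared with every support pixel, support BG pixels are excluded by the mask, and the query pixel is fused with a weighted sum of the remaining support FG features.

```python
import math


def masked_cross_attention(query, support, support_fg):
    """Fuse each query pixel with support FG pixels (hypothetical helper).

    query:      list of Nq feature vectors (each a list of C floats)
    support:    list of Ns feature vectors (each a list of C floats)
    support_fg: list of Ns booleans, True where the support pixel is FG
    Returns (fused, weights): fused is Nq x C, weights is Nq x Ns.
    """
    if not any(support_fg):
        raise ValueError("support mask contains no foreground pixel")
    scale = math.sqrt(len(query[0]))
    fused, weights = [], []
    for q in query:
        # Similarity to every support pixel; BG pixels are set to -inf.
        scores = [
            sum(a * b for a, b in zip(q, s)) / scale if fg else float("-inf")
            for s, fg in zip(support, support_fg)
        ]
        # Numerically stable softmax over the support pixels.
        top = max(scores)
        exps = [math.exp(x - top) for x in scores]
        total = sum(exps)
        attn = [e / total for e in exps]
        weights.append(attn)
        # Weighted sum of support features for this query pixel.
        fused.append(
            [sum(w * s[c] for w, s in zip(attn, support)) for c in range(len(q))]
        )
    return fused, weights


# Toy example: two query pixels, three support pixels (the last one is BG).
query = [[1.0, 0.0], [0.0, 1.0]]
support = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
support_fg = [True, True, False]
fused, weights = masked_cross_attention(query, support, support_fg)
```

In this toy setting the masked support pixel receives zero attention weight, and each query pixel attends most strongly to the support FG pixel it resembles. When FG features are mingled with BG information, as the abstract describes, these similarity scores become less discriminative, which is the ambiguity the paper targets.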
Related papers
- Recurrent Feature Mining and Keypoint Mixup Padding for Category-Agnostic Pose Estimation [33.204232825380394]
Category-agnostic pose estimation aims to locate keypoints on query images according to a few annotated support images for arbitrary novel classes.
We propose a novel yet concise framework, which recurrently mines FGSA features from both support and query images.
arXiv Detail & Related papers (2025-03-27T04:09:13Z) - Segment to Recognize Robustly -- Enhancing Recognition by Image Decomposition [21.917582794820095]
"Segment to Recognize Robustly" (S2R2) is a novel recognition approach which decouples the FG and BG modelling and combines them in a simple, robust, and interpretable manner.
S2R2 achieves state-of-the-art results on in-domain data while maintaining robustness to BG shifts.
arXiv Detail & Related papers (2024-11-24T17:39:39Z) - Fine-grained Background Representation for Weakly Supervised Semantic Segmentation [35.346567242839065]
This paper proposes a simple fine-grained background representation (FBR) method to discover and represent diverse BG semantics.
We present an active sampling strategy to mine the FG negatives on-the-fly, enabling efficient pixel-to-pixel intra-foreground contrastive learning.
Our method achieves 73.2 mIoU and 45.6 mIoU segmentation results on the PASCAL VOC and MS COCO test sets, respectively.
arXiv Detail & Related papers (2024-06-22T06:45:25Z) - FG-Net: Facial Action Unit Detection with Generalizable Pyramidal Features [13.176011491885664]
Previous AU detection methods tend to overfit the dataset, resulting in a significant performance loss when evaluated across corpora.
We propose FG-Net for generalizable facial action unit detection.
Specifically, FG-Net extracts feature maps from a StyleGAN2 model pre-trained on a large and diverse face image dataset.
arXiv Detail & Related papers (2023-08-23T18:51:11Z) - Self-Calibrated Cross Attention Network for Few-Shot Segmentation [65.20559109791756]
We design a self-calibrated cross attention (SCCA) block for efficient patch-based attention.
SCCA groups the patches from the same query image and the aligned patches from the support image as K&V.
In this way, the query BG features are fused with matched BG features in support FG, and thus the aforementioned issues will be mitigated.
arXiv Detail & Related papers (2023-08-18T04:41:50Z) - Pair then Relation: Pair-Net for Panoptic Scene Graph Generation [54.92476119356985]
Panoptic Scene Graph (PSG) aims to create a more comprehensive scene graph representation using panoptic segmentation instead of boxes.
Current PSG methods have limited performance, which hinders downstream tasks or applications.
We present a novel framework: Pair then Relation (Pair-Net), which uses a Pair Proposal Network (PPN) to learn and filter sparse pair-wise relationships between subjects and objects.
arXiv Detail & Related papers (2023-07-17T17:58:37Z) - Global Hierarchical Attention for 3D Point Cloud Analysis [88.56041763189162]
We propose a new attention mechanism, called Global Hierarchical Attention (GHA) for 3D point cloud analysis.
For the task of semantic segmentation, GHA gives a +1.7% mIoU increase to the MinkowskiEngine baseline on ScanNet.
For the 3D object detection task, GHA improves the CenterPoint baseline by +0.5% mAP on the nuScenes dataset.
arXiv Detail & Related papers (2022-08-07T19:16:30Z) - Kernelized Multiplicative Weights for 0/1-Polyhedral Games: Bridging the Gap Between Learning in Extensive-Form and Normal-Form Games [76.21916750766277]
We show that the Optimistic Multiplicative Weights Update (OMWU) algorithm can be simulated on the normal-form equivalent of an EFG in linear time per iteration in the game tree size using a kernel trick.
Specifically, KOMWU gives the first algorithm that guarantees at the same time last-iterate convergence.
arXiv Detail & Related papers (2022-02-01T06:28:51Z) - Channel DropBlock: An Improved Regularization Method for Fine-Grained Visual Classification [58.07257910065007]
Existing approaches mainly tackle this problem by introducing attention mechanisms to locate the discriminative parts or feature encoding approaches to extract the highly parameterized features in a weakly-supervised fashion.
In this work, we propose a lightweight yet effective regularization method named Channel DropBlock (CDB) in combination with two alternative correlation metrics, to address this problem.
arXiv Detail & Related papers (2021-06-07T09:03:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.