Related papers: Target-aware Bi-Transformer for Few-shot Segmentation

Target-aware Bi-Transformer for Few-shot Segmentation

URL: http://arxiv.org/abs/2309.09492v1
Date: Mon, 18 Sep 2023 05:28:51 GMT
Title: Target-aware Bi-Transformer for Few-shot Segmentation
Authors: Xianglin Wang, Xiaoliu Luo, Taiping Zhang
Abstract summary: Few-shot semantic segmentation (FSS) aims to use limited labeled support images to identify the segmentation of new classes of objects. In this paper, we propose the Target-aware Bi-Transformer Network (TBTNet) to equivalent treat of support images and query image. A vigorous Target-aware Transformer Layer (TTL) also be designed to distill correlations and force the model to focus on foreground information.
Score: 4.3753381458828695
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Traditional semantic segmentation tasks require a large number of labels and are difficult to identify unlearned categories. Few-shot semantic segmentation (FSS) aims to use limited labeled support images to identify the segmentation of new classes of objects, which is very practical in the real world. Previous researches were primarily based on prototypes or correlations. Due to colors, textures, and styles are similar in the same image, we argue that the query image can be regarded as its own support image. In this paper, we proposed the Target-aware Bi-Transformer Network (TBTNet) to equivalent treat of support images and query image. A vigorous Target-aware Transformer Layer (TTL) also be designed to distill correlations and force the model to focus on foreground information. It treats the hypercorrelation as a feature, resulting a significant reduction in the number of feature channels. Benefit from this characteristic, our model is the lightest up to now with only 0.4M learnable parameters. Futhermore, TBTNet converges in only 10% to 25% of the training epochs compared to traditional methods. The excellent performance on standard FSS benchmarks of PASCAL-5i and COCO-20i proves the efficiency of our method. Extensive ablation studies were also carried out to evaluate the effectiveness of Bi-Transformer architecture and TTL.

Related papers

No time to train! Training-Free Reference-Based Instance Segmentation [15.061599989448867]
This work investigates the task of object segmentation when provided with only a small set of reference images.<n>Our key insight is to leverage strong semantic priors, as learned by foundation models, to identify corresponding regions between a reference and a target image.<n>We find that correspondences enable automatic generation of instance-level segmentation masks for downstream tasks and instantiate our ideas via a multi-stage, training-free method.
arXiv Detail & Related papers (2025-07-03T16:59:01Z)
FCC: Fully Connected Correlation for Few-Shot Segmentation [11.277022867553658]
Few-shot segmentation (FSS) aims to segment the target object in a query image using only a small set of support images and masks. Previous methods have tried to obtain prior information by creating correlation maps from pixel-level correlation on final-layer or same-layer features. We introduce FCC (Fully Connected Correlation) to integrate pixel-level correlations between support and query features.
arXiv Detail & Related papers (2024-11-18T03:32:02Z)
Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess remarkable ability to accurately classify new, unseen images after being exposed to only a few examples. For artificial neural network models, determining the most relevant features for distinguishing between two images with limited samples presents a challenge. We propose an intra-task mutual attention method for few-shot learning, that involves splitting the support and query samples into patches.
arXiv Detail & Related papers (2024-05-06T02:02:57Z)
Boosting Few-Shot Segmentation via Instance-Aware Data Augmentation and Local Consensus Guided Cross Attention [7.939095881813804]
Few-shot segmentation aims to train a segmentation model that can fast adapt to a novel task for which only a few annotated images are provided. We introduce an instance-aware data augmentation (IDA) strategy that augments the support images based on the relative sizes of the target objects. The proposed IDA effectively increases the support set's diversity and promotes the distribution consistency between support and query images.
arXiv Detail & Related papers (2024-01-18T10:29:10Z)
Fully Attentional Networks with Self-emerging Token Labeling [108.53230681047617]
We train a FAN token labeler (FAN-TL) to generate semantically meaningful patch token labels, followed by a FAN student model training stage that uses both the token labels and the original class label. With the proposed STL framework, our best model achieves 84.8% Top-1 accuracy and 42.1% mCE on ImageNet-1K and ImageNet-C, and sets a new state-of-the-art for ImageNet-A (46.1%) and ImageNet-R (56.6%) without using extra data.
arXiv Detail & Related papers (2024-01-08T12:14:15Z)
ClusterFormer: Clustering As A Universal Visual Learner [80.79669078819562]
CLUSTERFORMER is a universal vision model based on the CLUSTERing paradigm with TransFORMER. It is capable of tackling heterogeneous vision tasks with varying levels of clustering granularity. For its efficacy, we hope our work can catalyze a paradigm shift in universal models in computer vision.
arXiv Detail & Related papers (2023-09-22T22:12:30Z)
Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts. We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query. Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
Enhancing Few-shot Image Classification with Cosine Transformer [4.511561231517167]
Few-shot Cosine Transformer (FS-CT) is a relational map between supports and queries. Our method performs competitive results in mini-ImageNet, CUB-200, and CIFAR-FS on 1-shot learning and 5-shot learning tasks. Our FS-CT with cosine attention is a lightweight, simple few-shot algorithm that can be applied for a wide range of applications.
arXiv Detail & Related papers (2022-11-13T06:03:28Z)
Boosting Few-shot Semantic Segmentation with Transformers [81.43459055197435]
TRansformer-based Few-shot Semantic segmentation method (TRFS) Our model consists of two modules: Global Enhancement Module (GEM) and Local Enhancement Module (LEM)
arXiv Detail & Related papers (2021-08-04T20:09:21Z)
Few-Shot Segmentation via Cycle-Consistent Transformer [74.49307213431952]
We focus on utilizing pixel-wise relationships between support and target images to facilitate the few-shot semantic segmentation task. We propose using a novel cycle-consistent attention mechanism to filter out possible harmful support features. Our proposed CyCTR leads to remarkable improvement compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-06-04T07:57:48Z)
SCNet: Enhancing Few-Shot Semantic Segmentation by Self-Contrastive Background Prototypes [56.387647750094466]
Few-shot semantic segmentation aims to segment novel-class objects in a query image with only a few annotated examples. Most of advanced solutions exploit a metric learning framework that performs segmentation through matching each pixel to a learned foreground prototype. This framework suffers from biased classification due to incomplete construction of sample pairs with the foreground prototype only.
arXiv Detail & Related papers (2021-04-19T11:21:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.