Target-aware Bi-Transformer for Few-shot Segmentation
- URL: http://arxiv.org/abs/2309.09492v1
- Date: Mon, 18 Sep 2023 05:28:51 GMT
- Title: Target-aware Bi-Transformer for Few-shot Segmentation
- Authors: Xianglin Wang, Xiaoliu Luo, Taiping Zhang
- Abstract summary: Few-shot semantic segmentation (FSS) aims to use limited labeled support images to identify the segmentation of new classes of objects.
In this paper, we propose the Target-aware Bi-Transformer Network (TBTNet) to treat support images and the query image equivalently.
A robust Target-aware Transformer Layer (TTL) is also designed to distill correlations and force the model to focus on foreground information.
- Score: 4.3753381458828695
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional semantic segmentation methods require a large number of labels and
struggle to identify unlearned categories. Few-shot semantic segmentation
(FSS) aims to use limited labeled support images to identify the segmentation
of new classes of objects, which is very practical in the real world. Previous
research has been primarily based on prototypes or correlations. Because colors,
textures, and styles are similar within the same image, we argue that the query
image can be regarded as its own support image. In this paper, we propose the
Target-aware Bi-Transformer Network (TBTNet) to treat support images and the
query image equivalently. A robust Target-aware Transformer Layer (TTL) is also
designed to distill correlations and force the model to focus on foreground
information. It treats the hypercorrelation as a feature, resulting in a
significant reduction in the number of feature channels. Benefiting from this
characteristic, our model is the lightest to date, with only 0.4M learnable
parameters. Furthermore, TBTNet converges in only 10% to 25% of the training
epochs compared to traditional methods. The excellent performance on the standard
FSS benchmarks PASCAL-5i and COCO-20i demonstrates the efficiency of our method.
Extensive ablation studies were also carried out to evaluate the effectiveness
of the Bi-Transformer architecture and TTL.
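The abstract does not spell out how correlations are computed, so the following is only a minimal sketch of the general idea behind a dense query-support correlation, and of treating the query image as its own support. It assumes clamped cosine similarity between per-pixel feature vectors and a binary support foreground mask; the function names `cosine` and `correlation` and the toy features are hypothetical and are not taken from TBTNet.

```python
import math


def cosine(u, v, eps=1e-8):
    """Cosine similarity between two feature vectors (eps avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def correlation(query_feats, support_feats, support_mask=None):
    """Return a (query pixels x support pixels) matrix of non-negative similarities.

    query_feats / support_feats: lists of per-pixel feature vectors.
    support_mask: optional 0/1 foreground flag per support pixel (assumed format).
    """
    if support_mask is None:
        support_mask = [1] * len(support_feats)
    return [
        # Negative similarities are clamped to zero; background support pixels are zeroed.
        [max(0.0, cosine(q, s)) * m for s, m in zip(support_feats, support_mask)]
        for q in query_feats
    ]


# Toy data: 3 query pixels and 2 support pixels with 2-channel features.
query = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
support = [[1.0, 0.0], [0.0, 2.0]]
mask = [1, 0]  # only the first support pixel is foreground

cross_corr = correlation(query, support, mask)  # query vs. masked support
self_corr = correlation(query, query)           # query used as its own support
```

Each entry is a single scalar per pixel pair, which illustrates why feeding correlations (rather than raw backbone features) to later layers can keep the channel count small.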
Related papers
- FCC: Fully Connected Correlation for Few-Shot Segmentation [11.277022867553658]
Few-shot segmentation (FSS) aims to segment the target object in a query image using only a small set of support images and masks.
Previous methods have tried to obtain prior information by creating correlation maps from pixel-level correlation on final-layer or same-layer features.
We introduce FCC (Fully Connected Correlation) to integrate pixel-level correlations between support and query features.
arXiv Detail & Related papers (2024-11-18T03:32:02Z)
- Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess a remarkable ability to accurately classify new, unseen images after being exposed to only a few examples.
For artificial neural network models, determining the most relevant features for distinguishing between two images with limited samples presents a challenge.
We propose an intra-task mutual attention method for few-shot learning that involves splitting the support and query samples into patches.
arXiv Detail & Related papers (2024-05-06T02:02:57Z)
- Boosting Few-Shot Segmentation via Instance-Aware Data Augmentation and Local Consensus Guided Cross Attention [7.939095881813804]
Few-shot segmentation aims to train a segmentation model that can quickly adapt to a novel task for which only a few annotated images are provided.
We introduce an instance-aware data augmentation (IDA) strategy that augments the support images based on the relative sizes of the target objects.
The proposed IDA effectively increases the support set's diversity and promotes the distribution consistency between support and query images.
arXiv Detail & Related papers (2024-01-18T10:29:10Z)
- Fully Attentional Networks with Self-emerging Token Labeling [108.53230681047617]
We train a FAN token labeler (FAN-TL) to generate semantically meaningful patch token labels, followed by a FAN student model training stage that uses both the token labels and the original class label.
With the proposed STL framework, our best model achieves 84.8% Top-1 accuracy and 42.1% mCE on ImageNet-1K and ImageNet-C, and sets a new state-of-the-art for ImageNet-A (46.1%) and ImageNet-R (56.6%) without using extra data.
arXiv Detail & Related papers (2024-01-08T12:14:15Z)
- ClusterFormer: Clustering As A Universal Visual Learner [80.79669078819562]
CLUSTERFORMER is a universal vision model based on the CLUSTERing paradigm with TransFORMER.
It is capable of tackling heterogeneous vision tasks with varying levels of clustering granularity.
For its efficacy, we hope our work can catalyze a paradigm shift in universal models in computer vision.
arXiv Detail & Related papers (2023-09-22T22:12:30Z)
- Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
- Enhancing Few-shot Image Classification with Cosine Transformer [4.511561231517167]
Few-shot Cosine Transformer (FS-CT) is a relational map between supports and queries.
Our method achieves competitive results on mini-ImageNet, CUB-200, and CIFAR-FS in 1-shot and 5-shot learning tasks.
Our FS-CT with cosine attention is a lightweight, simple few-shot algorithm that can be applied for a wide range of applications.
arXiv Detail & Related papers (2022-11-13T06:03:28Z)
- Boosting Few-shot Semantic Segmentation with Transformers [81.43459055197435]
We propose a TRansformer-based Few-shot Semantic segmentation method (TRFS).
Our model consists of two modules: the Global Enhancement Module (GEM) and the Local Enhancement Module (LEM).
arXiv Detail & Related papers (2021-08-04T20:09:21Z)
- Few-Shot Segmentation via Cycle-Consistent Transformer [74.49307213431952]
We focus on utilizing pixel-wise relationships between support and target images to facilitate the few-shot semantic segmentation task.
We propose using a novel cycle-consistent attention mechanism to filter out possible harmful support features.
Our proposed CyCTR leads to remarkable improvement compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-06-04T07:57:48Z)
- SCNet: Enhancing Few-Shot Semantic Segmentation by Self-Contrastive Background Prototypes [56.387647750094466]
Few-shot semantic segmentation aims to segment novel-class objects in a query image with only a few annotated examples.
Most advanced solutions exploit a metric learning framework that performs segmentation by matching each pixel to a learned foreground prototype.
This framework suffers from biased classification due to incomplete construction of sample pairs with the foreground prototype only.
arXiv Detail & Related papers (2021-04-19T11:21:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.