Cost Aggregation Is All You Need for Few-Shot Segmentation
- URL: http://arxiv.org/abs/2112.11685v1
- Date: Wed, 22 Dec 2021 06:18:51 GMT
- Title: Cost Aggregation Is All You Need for Few-Shot Segmentation
- Authors: Sunghwan Hong, Seokju Cho, Jisu Nam, Seungryong Kim
- Abstract summary: We introduce Volumetric Aggregation with Transformers (VAT) to tackle the few-shot segmentation task.
VAT uses both convolutions and transformers to efficiently handle high-dimensional correlation maps between the query and support.
We find that the proposed method attains state-of-the-art performance even on the standard benchmarks for the semantic correspondence task.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a novel cost aggregation network, dubbed Volumetric Aggregation with Transformers (VAT), to tackle the few-shot segmentation task by using both convolutions and transformers to efficiently handle high-dimensional correlation maps between the query and support. Specifically, we propose an encoder consisting of a volume embedding module, which not only transforms the correlation maps into a more tractable size but also injects a convolutional inductive bias, and a volumetric transformer module for cost aggregation. Our encoder has a pyramidal structure that lets the coarser-level aggregation guide the finer level and enforces the learning of complementary matching scores. We then feed the output into our affinity-aware decoder, along with the projected feature maps, to guide the segmentation process. Combining these components, we conduct experiments to demonstrate the effectiveness of the proposed method; our method sets a new state of the art on all the standard benchmarks for the few-shot segmentation task. Furthermore, we find that the proposed method attains state-of-the-art performance on the standard benchmarks for the semantic correspondence task as well, although it was not specifically designed for that task. We also provide an extensive ablation study to validate our architectural choices. The trained weights and code are available at: https://seokju-cho.github.io/VAT/.
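For concreteness, the following is a minimal PyTorch sketch of how a query-support correlation volume of this kind is typically built from backbone features and a support mask. The function name, the single-level features, and the ReLU clamp are illustrative assumptions, not the paper's exact multi-level construction.

```python
import torch
import torch.nn.functional as F

def correlation_volume(query_feat, support_feat, support_mask):
    """Build a 4D query-support correlation (cost) volume. Illustrative sketch.

    query_feat:   (B, C, Hq, Wq) backbone features of the query image
    support_feat: (B, C, Hs, Ws) backbone features of the support image
    support_mask: (B, 1, Hs, Ws) binary mask of the annotated support object
    returns:      (B, Hq, Wq, Hs, Ws) cosine-similarity volume
    """
    B, C, Hq, Wq = query_feat.shape
    _, _, Hs, Ws = support_feat.shape

    # Restrict support features to the annotated object region.
    support_feat = support_feat * support_mask

    # L2-normalize channels so dot products become cosine similarities.
    q = F.normalize(query_feat.flatten(2), dim=1)    # (B, C, Hq*Wq)
    s = F.normalize(support_feat.flatten(2), dim=1)  # (B, C, Hs*Ws)

    corr = torch.einsum('bcq,bcs->bqs', q, s)        # (B, Hq*Wq, Hs*Ws)
    corr = corr.clamp(min=0)                         # suppress negative matches
    return corr.reshape(B, Hq, Wq, Hs, Ws)
```

A volume of this shape is what the encoder's volume embedding and volumetric transformer modules subsequently aggregate.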
Related papers
- Hierarchical Dense Correlation Distillation for Few-Shot Segmentation - Extended Abstract
Few-shot semantic segmentation (FSS) aims to build class-agnostic models that segment unseen classes from only a handful of annotations.
We design the Hierarchically Decoupled Matching Network (HDMNet), which mines pixel-level support correlations with a transformer architecture.
We propose a matching module that reduces train-set overfitting, and introduce correlation distillation that leverages semantic correspondence from the coarse resolution to boost fine-grained segmentation; a generic sketch of such a distillation loss follows this entry.
arXiv Detail & Related papers (2023-06-27T08:10:20Z)
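HDMNet's exact objective is given in the paper above; as a generic illustration of coarse-to-fine correlation distillation, the snippet below treats each query location's matching scores as a distribution over support locations and matches the fine (student) level to the detached coarse (teacher) level with a temperature-scaled KL divergence. The function name and loss form are assumptions, not necessarily HDMNet's formulation.

```python
import torch.nn.functional as F

def correlation_distillation_loss(fine_corr, coarse_corr, tau=1.0):
    """Generic coarse-to-fine distillation on correlation maps (illustrative).

    fine_corr:   (B, Nq, Ns) matching scores at the finer level (student)
    coarse_corr: (B, Nq, Ns) coarser-level scores, upsampled/aligned to the
                 same grid (teacher)
    """
    student = F.log_softmax(fine_corr / tau, dim=-1)
    teacher = F.softmax(coarse_corr.detach() / tau, dim=-1)  # stop-gradient teacher
    return F.kl_div(student, teacher, reduction='batchmean') * (tau * tau)
```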
- Adaptive Spot-Guided Transformer for Consistent Local Feature Matching
We propose the Adaptive Spot-Guided Transformer (ASTR) for local feature matching.
ASTR models the local consistency and scale variations in a unified coarse-to-fine architecture.
arXiv Detail & Related papers (2023-03-29T12:28:01Z)
- Hierarchical Dense Correlation Distillation for Few-Shot Segmentation
Few-shot semantic segmentation (FSS) aims to build class-agnostic models that segment unseen classes from only a handful of annotations.
We design the Hierarchically Decoupled Matching Network (HDMNet), which mines pixel-level support correlations with a transformer architecture.
We propose a matching module that reduces train-set overfitting, and introduce correlation distillation that leverages semantic correspondence from the coarse resolution to boost fine-grained segmentation.
arXiv Detail & Related papers (2023-03-26T08:13:12Z)
- Integrative Feature and Cost Aggregation with Transformers for Dense Correspondence
The current state of the art comprises Transformer-based approaches that focus on either feature descriptors or cost volume aggregation.
We propose a novel Transformer-based network that interleaves both forms of aggregation in a way that exploits their complementary information; a toy sketch of this interleaving idea follows this entry.
We evaluate the effectiveness of the proposed method on dense matching tasks and achieve state-of-the-art performance on all the major benchmarks.
arXiv Detail & Related papers (2022-09-19T03:33:35Z)
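The paper's architecture is more involved than this; as a toy sketch of the interleaving idea referenced above, the block below alternates self-attention over query descriptors with a cost-aggregation step that re-derives matching scores from the refined descriptors. The module, shapes, and attention layout are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterleavedAggregationBlock(nn.Module):
    """Toy interleaving of feature and cost aggregation (illustrative only)."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.feat_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cost_proj = nn.Linear(dim, dim)

    def forward(self, query_feat, support_feat):
        # query_feat: (B, Nq, C), support_feat: (B, Ns, C)
        # (1) Feature aggregation: self-attention refines query descriptors.
        refined, _ = self.feat_attn(query_feat, query_feat, query_feat)
        query_feat = query_feat + refined
        # (2) Cost aggregation: recompute matching scores from the refined
        # descriptors, then use them to pool support features back in.
        corr = torch.einsum('bqc,bsc->bqs',
                            F.normalize(query_feat, dim=-1),
                            F.normalize(support_feat, dim=-1))
        pooled = torch.einsum('bqs,bsc->bqc', corr.softmax(dim=-1), support_feat)
        query_feat = query_feat + self.cost_proj(pooled)
        return query_feat, corr
```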
- Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation
Volumetric Aggregation with Transformers (VAT) is a cost aggregation network for few-shot segmentation.
VAT attains state-of-the-art performance for semantic correspondence as well, where cost aggregation also plays a central role; a generic sketch of a factorized 4D convolution over a cost volume follows this entry.
arXiv Detail & Related papers (2022-07-22T04:10:30Z)
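A dense 4D kernel over (Hq, Wq, Hs, Ws) is expensive, so cost-aggregation networks in this line of work often factorize it. The sketch below follows the well-known center-pivot-style factorization (two 2D convolutions, one over the query dims and one over the support dims, summed); it is a generic approximation for illustration, not necessarily this paper's exact 4D convolutional Swin layer.

```python
import torch
import torch.nn as nn

class FactorizedConv4d(nn.Module):
    """Approximate 4D convolution on a cost volume via two 2D convolutions."""

    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv_q = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.conv_s = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)

    def forward(self, corr):
        # corr: (B, C, Hq, Wq, Hs, Ws)
        B, C, Hq, Wq, Hs, Ws = corr.shape
        # Convolve over the query dims, one support location at a time.
        x = corr.permute(0, 4, 5, 1, 2, 3).reshape(B * Hs * Ws, C, Hq, Wq)
        out_q = (self.conv_q(x)
                 .view(B, Hs, Ws, -1, Hq, Wq)
                 .permute(0, 3, 4, 5, 1, 2))
        # Convolve over the support dims, one query location at a time.
        y = corr.permute(0, 2, 3, 1, 4, 5).reshape(B * Hq * Wq, C, Hs, Ws)
        out_s = (self.conv_s(y)
                 .view(B, Hq, Wq, -1, Hs, Ws)
                 .permute(0, 3, 1, 2, 4, 5))
        return out_q + out_s  # (B, out_ch, Hq, Wq, Hs, Ws)
```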
- Augmented Parallel-Pyramid Net for Attention Guided Pose-Estimation
This paper proposes an augmented parallel-pyramid net with an attention partial module and differentiable auto-data augmentation.
We define a new pose search space where the sequences of data augmentations are formulated as a trainable and operational CNN component.
Notably, our method achieves top-1 accuracy on the challenging COCO keypoint benchmark and state-of-the-art results on the MPII dataset.
arXiv Detail & Related papers (2020-03-17T03:52:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.