Related papers: Semantic Correspondence with Transformers

URL: http://arxiv.org/abs/2106.02520v1
Date: Fri, 4 Jun 2021 14:39:03 GMT
Title: Semantic Correspondence with Transformers
Authors: Seokju Cho, Sunghwan Hong, Sangryul Jeon, Yunsung Lee, Kwanghoon Sohn and Seungryong Kim
Abstract summary: We propose Cost Aggregation with Transformers (CATs) to find dense correspondences between semantically similar images. We include appearance affinity modelling to disambiguate the initial correlation maps, and multi-level aggregation to benefit from hierarchical feature representations. We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies.
Score: 68.37049687360705
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We propose a novel cost aggregation network, called Cost Aggregation with Transformers (CATs), to find dense correspondences between semantically similar images, with additional challenges posed by large intra-class appearance and geometric variations. Compared to previous hand-crafted or CNN-based methods addressing the cost aggregation stage, which either lack robustness to severe deformations or inherit the limitation of CNNs, which fail to discriminate incorrect matches due to their limited receptive fields, CATs explore global consensus among the initial correlation maps with the help of architectural designs that allow us to exploit the full potential of the self-attention mechanism. Specifically, we include appearance affinity modelling to disambiguate the initial correlation maps, and multi-level aggregation to benefit from hierarchical feature representations within a Transformer-based aggregator, and combine these with swapping self-attention and residual connections, not only to enforce consistent matching but also to ease the learning process. We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies. Code and trained models will be made available at https://github.com/SunghwanHong/CATs.
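The aggregation pipeline the abstract describes (an initial correlation map, appearance features used to disambiguate it, self-attention with residual connections, and swapping of the source and target axes) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: cosine similarity is assumed for the correlation map, plain concatenation of raw features stands in for appearance affinity modelling, and a parameter-free single-head attention (queries, keys and values are the tokens themselves) stands in for the learned Transformer aggregator. All function names are hypothetical, and multi-level aggregation across feature hierarchies is not shown.

```python
import math
from typing import List

Matrix = List[List[float]]


def l2_normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid division by zero
    return [x / norm for x in v]


def correlation_map(src_feats: Matrix, tgt_feats: Matrix) -> Matrix:
    """Assumed cosine-similarity cost: C[i][j] compares source position i with target position j."""
    src = [l2_normalize(f) for f in src_feats]
    tgt = [l2_normalize(f) for f in tgt_feats]
    return [[sum(a * b for a, b in zip(s, t)) for t in tgt] for s in src]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(xs: List[float]) -> List[float]:
    mx = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: Matrix) -> Matrix:
    """Parameter-free scaled dot-product self-attention (Q = K = V = tokens); a stand-in for a learned layer."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def aggregate(corr: Matrix, feats: Matrix) -> Matrix:
    """Hypothetical aggregation step: append appearance features to each row, attend, add a residual."""
    n_cols = len(corr[0])
    tokens = [row + list(f) for row, f in zip(corr, feats)]  # assumption: affinity = concatenation
    attended = self_attention(tokens)
    refined = [t[:n_cols] for t in attended]  # keep only the correlation channels
    return add(corr, refined)  # residual connection


def swapping_aggregation(corr: Matrix, src_feats: Matrix, tgt_feats: Matrix) -> Matrix:
    """Assumed reading of swapping: aggregate over source rows, swap axes, aggregate over target rows, swap back."""
    step1 = aggregate(corr, src_feats)
    step2 = aggregate(transpose(step1), tgt_feats)
    return transpose(step2)
```

For example, with three source positions and two target positions described by 2-D features, `correlation_map(src, tgt)` returns a 3 x 2 matrix, and `swapping_aggregation` returns a refined matrix of the same shape. The attention is deliberately weight-free so the sketch runs without trained parameters; a practical aggregator would add learned projections, multiple heads, and stacked layers.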

Related papers

Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction [55.914891182214475]
We introduce neural network reprogrammability as a unifying framework for model adaptation. We present a taxonomy that categorizes such information manipulation approaches across four key dimensions. We also analyze remaining technical challenges and ethical considerations.
arXiv Detail & Related papers (2025-06-05T05:42:27Z)
Empowering Vision Transformers with Multi-Scale Causal Intervention for Long-Tailed Image Classification [12.122203089278738]
Causal inference has emerged as a promising approach to long-tailed classification, handling the biases introduced by class imbalance. This paper investigates the influence of existing causal models on CNNs and ViT variants. It proposes TSCNet, a two-stage causal modeling method to discover fine-grained causal associations.
arXiv Detail & Related papers (2025-05-13T02:23:55Z)
ConsistentFeature: A Plug-and-Play Component for Neural Network Regularization [0.32885740436059047]
Over-parameterized neural network models often lead to significant performance discrepancies between training and test sets. We introduce a simple perspective on overfitting: models learn different representations in different i.i.d. datasets. We propose an adaptive method, ConsistentFeature, that regularizes the model by constraining feature differences across random subsets of the same training set.
arXiv Detail & Related papers (2024-12-02T13:21:31Z)
PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings [55.55445978692678]
PseudoNeg-MAE enhances the global feature representation of point cloud masked autoencoders by making them both discriminative and sensitive to transformations. We propose a novel loss that explicitly penalizes invariant collapse, enabling the network to capture richer transformation cues while preserving discriminative representations.
arXiv Detail & Related papers (2024-09-24T07:57:21Z)
On Layer-wise Representation Similarity: Application for Multi-Exit Models with a Single Classifier [20.17288970927518]
We study the similarity of representations between the hidden layers of individual transformers. We propose an aligned training approach to enhance the similarity between internal representations.
arXiv Detail & Related papers (2024-06-20T16:41:09Z)
Prototype-based Embedding Network for Scene Graph Generation [105.97836135784794]
Current Scene Graph Generation (SGG) methods explore contextual information to predict relationships among entity pairs. Due to the diverse visual appearance of numerous possible subject-object combinations, there is a large intra-class variation within each predicate category. Prototype-based Embedding Network (PE-Net) models entities/predicates with prototype-aligned compact and distinctive representations. PL is introduced to help PE-Net efficiently learn such entity-predicate matching, and Prototype Regularization (PR) is devised to relieve ambiguous entity-predicate matching.
arXiv Detail & Related papers (2023-03-13T13:30:59Z)
FECANet: Boosting Few-Shot Semantic Segmentation with Feature-Enhanced Context-Aware Network [48.912196729711624]
Few-shot semantic segmentation is the task of learning to locate each pixel of a novel class in a query image with only a few annotated support images. We propose a Feature-Enhanced Context-Aware Network (FECANet) to suppress the matching noise caused by inter-class local similarity. In addition, we propose a novel correlation reconstruction module that encodes extra correspondence relations between foreground and background and multi-scale context semantic features.
arXiv Detail & Related papers (2023-01-19T16:31:13Z)
Switchable Representation Learning Framework with Self-compatibility [50.48336074436792]
We propose a Switchable representation learning Framework with Self-Compatibility (SFSC). SFSC generates a series of compatible sub-models with different capacities through one training process. SFSC achieves state-of-the-art performance on the evaluated datasets.
arXiv Detail & Related papers (2022-06-16T16:46:32Z)
Slimmable Domain Adaptation [112.19652651687402]
We introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank. Our framework surpasses other competing approaches by a very large margin on multiple benchmarks.
arXiv Detail & Related papers (2022-06-14T06:28:04Z)
CATs++: Boosting Cost Aggregation with Convolutions and Transformers [31.22435282922934]
We introduce Cost Aggregation with Transformers (CATs) to tackle the challenges of cost aggregation by exploring global consensus among initial correlation maps. To alleviate some of the limitations that CATs may face, i.e., the high computational costs induced by the use of a standard transformer, we also propose CATs++. Our proposed methods outperform the previous state-of-the-art methods by large margins, setting a new state of the art on all the benchmarks.
arXiv Detail & Related papers (2022-02-14T15:54:58Z)
Weakly supervised segmentation with cross-modality equivariant constraints [7.757293476741071]
Weakly supervised learning has emerged as an appealing alternative to alleviate the need for large labeled datasets in semantic segmentation. We present a novel learning strategy that leverages self-supervision in a multi-modal image scenario to significantly enhance the original class activation maps (CAMs). Our approach outperforms relevant recent literature under the same learning conditions.
arXiv Detail & Related papers (2021-04-06T13:14:20Z)
Context Decoupling Augmentation for Weakly Supervised Semantic Segmentation [53.49821324597837]
Weakly supervised semantic segmentation is a challenging problem that has been deeply studied in recent years. We present a Context Decoupling Augmentation (CDA) method to change the inherent context in which objects appear. Extensive experiments on the PASCAL VOC 2012 dataset with several alternative network architectures validate the proposed method, showing that CDA can boost various popular WSSS methods to new state-of-the-art results by a large margin.
arXiv Detail & Related papers (2021-03-02T15:05:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.