Joint Learning of Feature Extraction and Cost Aggregation for Semantic
Correspondence
- URL: http://arxiv.org/abs/2204.02164v1
- Date: Tue, 5 Apr 2022 12:45:49 GMT
- Authors: Jiwon Kim, Youngjo Min, Mira Kim, and Seungryong Kim
- Abstract summary: We propose a novel framework for jointly learning feature extraction and cost aggregation for semantic correspondence.
We present a confidence-aware contrastive loss function for learning the networks in a weakly-supervised manner.
- Score: 30.488870941738636
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Establishing dense correspondences across semantically similar images is a
challenging task due to significant intra-class variations and
background clutter. To address these problems, numerous methods have been
proposed that focus on learning the feature extractor or the cost aggregation
independently, which yields sub-optimal performance. In this paper, we propose
a novel framework for jointly learning feature extraction and cost aggregation
for semantic correspondence. By exploiting the pseudo labels from each module,
the networks consisting of feature extraction and cost aggregation modules are
simultaneously learned in a boosting fashion. Moreover, to ignore unreliable
pseudo labels, we present a confidence-aware contrastive loss function for
learning the networks in a weakly-supervised manner. We demonstrate our
competitive results on standard benchmarks for semantic correspondence.
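The idea of down-weighting unreliable pseudo labels can be illustrated with a minimal sketch. It is not the authors' loss: the cosine correlation, the softmax-based confidence, the temperature, the hard threshold, and all function names below are assumptions chosen for illustration. Pseudo labels are taken from one score matrix (for example, the aggregated cost) and used to supervise another (for example, the raw feature correlation).

```python
import math


def cosine_correlation(src, tgt):
    """Correlation matrix: cosine similarity of every source/target feature pair."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    src_n = [normalize(v) for v in src]
    tgt_n = [normalize(v) for v in tgt]
    return [[sum(a * b for a, b in zip(s, t)) for t in tgt_n] for s in src_n]


def log_softmax(row, temperature):
    """Numerically stable log-softmax over one row of matching scores."""
    scaled = [x / temperature for x in row]
    peak = max(scaled)
    log_total = math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - peak - log_total for x in scaled]


def pseudo_labels(scores, temperature=0.1):
    """Per source position: best target index and its softmax probability (confidence)."""
    labels, confidences = [], []
    for row in scores:
        log_probs = log_softmax(row, temperature)
        best = max(range(len(log_probs)), key=log_probs.__getitem__)
        labels.append(best)
        confidences.append(math.exp(log_probs[best]))
    return labels, confidences


def confidence_aware_contrastive_loss(scores, labels, confidences,
                                      temperature=0.1, threshold=0.5):
    """Confidence-weighted cross-entropy toward pseudo-labelled matches.

    Positions whose confidence is below `threshold` (an assumed
    hyperparameter) are skipped, so unreliable pseudo labels are ignored.
    """
    total, kept = 0.0, 0
    for row, label, conf in zip(scores, labels, confidences):
        if conf < threshold:
            continue  # unreliable pseudo label: contributes nothing
        total += -conf * log_softmax(row, temperature)[label]
        kept += 1
    return total / kept if kept else 0.0


# Toy usage: labels come from one module's scores, the loss is applied to the other's.
source_feats = [[1.0, 0.0], [0.0, 1.0]]
target_feats = [[0.9, 0.1], [0.2, 0.8]]
feature_corr = cosine_correlation(source_feats, target_feats)
aggregated_cost = [[0.9, 0.1], [0.5, 0.5]]  # stand-in for a cost-aggregation output
labels, confidences = pseudo_labels(aggregated_cost)
loss = confidence_aware_contrastive_loss(feature_corr, labels, confidences, threshold=0.6)
```

In this toy run the second source position has an ambiguous aggregated cost (confidence 0.5), so with the threshold at 0.6 it is excluded and only the first position contributes to the loss.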
Related papers
- Feature-based Federated Transfer Learning: Communication Efficiency, Robustness and Privacy [11.308544280789016]
We propose feature-based federated transfer learning as a novel approach to improve communication efficiency.
Specifically, in the proposed feature-based federated learning, we design the extracted features and outputs to be uploaded instead of parameter updates.
We evaluate the performance of the proposed learning scheme via experiments on an image classification task and a natural language processing task to demonstrate its effectiveness.
arXiv Detail & Related papers (2024-05-15T00:43:19Z)
- Unifying Feature and Cost Aggregation with Transformers for Semantic and Visual Correspondence [51.54175067684008]
This paper introduces a Transformer-based integrative feature and cost aggregation network designed for dense matching tasks.
We first show that feature aggregation and cost aggregation exhibit distinct characteristics and reveal the potential for substantial benefits stemming from the judicious use of both aggregation processes.
Our framework is evaluated on standard benchmarks for semantic matching, and also applied to geometric matching, where we show that our approach achieves significant improvements compared to existing methods.
arXiv Detail & Related papers (2024-03-17T07:02:55Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Semantic Contrastive Bootstrapping for Single-positive Multi-label Recognition [36.3636416735057]
We present a semantic contrastive bootstrapping (Scob) approach to gradually recover the cross-object relationships.
We then propose a recurrent semantic masked transformer to extract iconic object-level representations.
Extensive experimental results demonstrate that the proposed joint learning framework surpasses the state-of-the-art models.
arXiv Detail & Related papers (2023-07-15T01:59:53Z)
- FECANet: Boosting Few-Shot Semantic Segmentation with Feature-Enhanced Context-Aware Network [48.912196729711624]
Few-shot semantic segmentation is the task of learning to locate each pixel of a novel class in a query image with only a few annotated support images.
We propose a Feature-Enhanced Context-Aware Network (FECANet) to suppress the matching noise caused by inter-class local similarity.
In addition, we propose a novel correlation reconstruction module that encodes extra correspondence relations between foreground and background and multi-scale context semantic features.
arXiv Detail & Related papers (2023-01-19T16:31:13Z)
- Semi-Supervised Learning of Semantic Correspondence with Pseudo-Labels [26.542718087103665]
SemiMatch is a semi-supervised solution for establishing dense correspondences across semantically similar images.
Our framework generates pseudo-labels from the model's own prediction between the source and a weakly-augmented target, and uses these pseudo-labels to train the model again between the source and a strongly-augmented target.
In experiments, SemiMatch achieves state-of-the-art performance on various benchmarks, especially on PF-Willow by a large margin.
arXiv Detail & Related papers (2022-03-30T03:52:50Z)
- Leveraging Ensembles and Self-Supervised Learning for Fully-Unsupervised Person Re-Identification and Text Authorship Attribution [77.85461690214551]
Learning from fully-unlabeled data is challenging in Multimedia Forensics problems, such as Person Re-Identification and Text Authorship Attribution.
Recent self-supervised learning methods have been shown to be effective when dealing with fully-unlabeled data in cases where the underlying classes have significant semantic differences.
We propose a strategy to tackle Person Re-Identification and Text Authorship Attribution by enabling learning from unlabeled data even when samples from different classes are not prominently diverse.
arXiv Detail & Related papers (2022-02-07T13:08:11Z)
- Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z)
- Dynamic Semantic Matching and Aggregation Network for Few-shot Intent Detection [69.2370349274216]
Few-shot Intent Detection is challenging due to the scarcity of available annotated utterances.
Semantic components are distilled from utterances via multi-head self-attention.
Our method provides a comprehensive matching measure to enhance representations of both labeled and unlabeled instances.
arXiv Detail & Related papers (2020-10-06T05:16:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.