Match me if you can: Semantic Correspondence Learning with Unpaired Images
- URL: http://arxiv.org/abs/2311.18540v1
- Date: Thu, 30 Nov 2023 13:22:15 GMT
- Title: Match me if you can: Semantic Correspondence Learning with Unpaired Images
- Authors: Jiwon Kim, Byeongho Heo, Sangdoo Yun, Seungryong Kim, Dongyoon Han
- Abstract summary: We propose a simple yet effective method that performs training with unlabeled pairs to complement both limited image pairs and sparse point pairs.
Using a simple teacher-student framework, we offer reliable pseudo correspondences to the student network via machine supervision.
Our models outperform the milestone baselines, including state-of-the-art methods on semantic correspondence benchmarks.
- Score: 82.05105090432025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent approaches for semantic correspondence have focused on obtaining
high-quality correspondences using a complicated network, refining the
ambiguous or noisy matching points. Despite their performance improvements,
they remain constrained by the limited training pairs due to costly point-level
annotations. This paper proposes a simple yet effective method that performs
training with unlabeled pairs to complement both limited image pairs and sparse
point pairs, requiring neither extra labeled keypoints nor trainable modules.
We fundamentally extend the data quantity and variety by augmenting new
unannotated pairs that are not originally provided as training pairs in the benchmarks.
Using a simple teacher-student framework, we offer reliable pseudo
correspondences to the student network via machine supervision. Finally, the
performance of our network is steadily improved by the proposed iterative
training, putting back the student as a teacher to generate refined labels and
train a new student repeatedly. Our models outperform the milestone baselines,
including state-of-the-art methods on semantic correspondence benchmarks.
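In pseudocode, the proposed pipeline amounts to an iterative pseudo-labeling loop. The following is a minimal sketch, assuming a generic matching network that outputs a source-to-target correlation volume; the threshold, optimizer, and data handling are illustrative assumptions, not the authors' released implementation:

```python
# Minimal sketch of the iterative teacher-student loop described in the
# abstract. `model` is any matching network mapping an image pair to a
# correlation volume; threshold and optimizer settings are assumptions.
import copy
import torch
import torch.nn.functional as F

def pseudo_label(teacher, src, tgt, tau=0.9):
    """Machine supervision: keep only the teacher's confident matches."""
    with torch.no_grad():
        prob = teacher(src, tgt).softmax(dim=-1)   # (B, HW_src, HW_tgt)
        conf, idx = prob.max(dim=-1)               # best target cell per source cell
    return idx, conf > tau

def train_student(student, teacher, unlabeled_pairs, lr=1e-4):
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for src, tgt in unlabeled_pairs:               # augmented, unannotated pairs
        idx, mask = pseudo_label(teacher, src, tgt)
        if not mask.any():
            continue                               # no confident matches this batch
        corr = student(src, tgt)
        loss = F.cross_entropy(corr[mask], idx[mask])
        opt.zero_grad(); loss.backward(); opt.step()
    return student

def iterative_training(model, unlabeled_pairs, rounds=3):
    teacher = copy.deepcopy(model).eval()
    for _ in range(rounds):
        student = train_student(copy.deepcopy(model), teacher, unlabeled_pairs)
        teacher = copy.deepcopy(student).eval()    # student becomes the new teacher
    return teacher
```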
Related papers
- Pre-Trained Vision-Language Models as Partial Annotators [40.89255396643592]
Pre-trained vision-language models learn from massive data to build unified representations of images and natural language.
In this paper, we investigate a novel "pre-trained annotating - weakly-supervised learning" paradigm for applying pre-trained models, with experiments on image classification tasks.
arXiv Detail & Related papers (2024-05-23T17:17:27Z)
- Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification [2.1223532600703385]
This paper presents an innovative disjoint sampling approach for training SOTA models on Hyperspectral image classification (HSIC) tasks.
By separating training, validation, and test data without overlap, the proposed method facilitates a fairer evaluation of how well a model can classify pixels it was not exposed to during training or validation.
This rigorous methodology is critical for advancing SOTA models and their real-world application to large-scale land mapping with Hyperspectral sensors.
arXiv Detail & Related papers (2024-04-23T11:40:52Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - iMatching: Imperative Correspondence Learning [5.568520539073218]
We introduce a new self-supervised scheme, imperative learning (IL), for training feature correspondence.
It enables correspondence learning on arbitrary uninterrupted videos without any camera pose or depth labels.
We demonstrate superior performance on tasks including feature matching and pose estimation.
arXiv Detail & Related papers (2023-12-04T18:58:20Z) - With a Little Help from your own Past: Prototypical Memory Networks for
Image Captioning [47.96387857237473]
We devise a network which can perform attention over activations obtained while processing other training samples.
Our memory models the distribution of past keys and values through the definition of prototype vectors.
We demonstrate that our proposal can increase the performance of an encoder-decoder Transformer by 3.7 CIDEr points both when training in cross-entropy only and when fine-tuning with self-critical sequence training.
arXiv Detail & Related papers (2023-08-23T18:53:00Z) - Retrieval-Enhanced Contrastive Vision-Text Models [61.783728119255365]
- Retrieval-Enhanced Contrastive Vision-Text Models [61.783728119255365]
We propose to equip vision-text models with the ability to refine their embedding with cross-modal retrieved information from a memory at inference time.
Remarkably, we show that this can be done with a light-weight, single-layer, fusion transformer on top of a frozen CLIP.
Our experiments validate that our retrieval-enhanced contrastive (RECO) training improves CLIP performance substantially on several challenging fine-grained tasks.
arXiv Detail & Related papers (2023-06-12T15:52:02Z)
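A minimal sketch of the RECO-style design described above, assuming frozen CLIP embeddings and a memory bank of retrieved neighbors; names and dimensions are illustrative assumptions:

```python
# Hypothetical sketch: the only trainable part is one fusion layer on top of
# frozen CLIP embeddings, mixing a query with its retrieved neighbors.
import torch
import torch.nn as nn

class RetrievalFusion(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.fuse = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)

    def forward(self, query_emb, retrieved_embs):
        # query_emb: (B, D) frozen CLIP embedding of the image or text
        # retrieved_embs: (B, K, D) cross-modal neighbors from a memory bank
        tokens = torch.cat([query_emb.unsqueeze(1), retrieved_embs], dim=1)
        fused = self.fuse(tokens)              # light-weight single-layer fusion
        return fused[:, 0]                     # refined query embedding

# usage: refine a CLIP image embedding with its K retrieved neighbors
fusion = RetrievalFusion()
refined = fusion(torch.randn(4, 512), torch.randn(4, 10, 512))  # (4, 512)
```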
- Semi-Supervised Learning of Semantic Correspondence with Pseudo-Labels [26.542718087103665]
SemiMatch is a semi-supervised solution for establishing dense correspondences across semantically similar images.
Our framework generates pseudo-labels from the model's own predictions between the source and a weakly-augmented target, then retrains the model with these pseudo-labels between the source and a strongly-augmented target.
In experiments, SemiMatch achieves state-of-the-art performance on various benchmarks, especially on PF-Willow by a large margin.
arXiv Detail & Related papers (2022-03-30T03:52:50Z)
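A minimal sketch of this weak-to-strong pseudo-labeling step, assuming the same correlation-volume interface as in the sketch above; the augmentations and confidence threshold are illustrative assumptions:

```python
# Hypothetical sketch of one SemiMatch-style training step: pseudo-labels from
# the weakly-augmented target supervise matching on the strongly-augmented one.
import torch
import torch.nn.functional as F

def semimatch_step(model, src, tgt_weak, tgt_strong, opt, tau=0.5):
    # 1) pseudo-labels: the model's own prediction on the weak view
    with torch.no_grad():
        prob = model(src, tgt_weak).softmax(dim=-1)  # (B, HW_src, HW_tgt)
        conf, pseudo = prob.max(dim=-1)
        mask = conf > tau                            # keep confident matches only
    if not mask.any():
        return 0.0
    # 2) relearn: match the source against the strong view under those labels
    corr = model(src, tgt_strong)
    loss = F.cross_entropy(corr[mask], pseudo[mask])
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```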
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
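The image-level objective referred to above is, in spirit, the standard InfoNCE contrastive loss; a generic sketch, not the paper's exact multi-level formulation:

```python
# Hypothetical sketch of image-level InfoNCE: pull embeddings of two views of
# the same image together, push other images in the batch apart.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.07):
    # z1, z2: (B, D) embeddings of two augmented views of the same images
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / temperature           # (B, B) cosine similarities
    targets = torch.arange(z1.shape[0])        # positives lie on the diagonal
    return F.cross_entropy(logits, targets)
```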
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.