Learning to Compose Hypercolumns for Visual Correspondence
- URL: http://arxiv.org/abs/2007.10587v1
- Date: Tue, 21 Jul 2020 04:03:22 GMT
- Title: Learning to Compose Hypercolumns for Visual Correspondence
- Authors: Juhong Min, Jongmin Lee, Jean Ponce, Minsu Cho
- Abstract summary: We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
- Score: 57.93635236871264
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature representation plays a crucial role in visual correspondence, and
recent methods for image matching resort to deeply stacked convolutional
layers. These models, however, are both monolithic and static in the sense that
they typically use a specific level of features, e.g., the output of the last
layer, and adhere to it regardless of the images to match. In this work, we
introduce a novel approach to visual correspondence that dynamically composes
effective features by leveraging relevant layers conditioned on the images to
match. Inspired by both multi-layer feature composition in object detection and
adaptive inference architectures in classification, the proposed method, dubbed
Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by
selecting a small number of relevant layers from a deep convolutional neural
network. We demonstrate its effectiveness on the task of semantic
correspondence, i.e., establishing correspondences between images depicting
different instances of the same object or scene category. Experiments on
standard benchmarks show that the proposed method greatly improves matching
performance over the state of the art in an adaptive and efficient manner.
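The core idea — composing a hypercolumn by selecting a small number of relevant CNN layers and concatenating their upsampled feature maps — can be illustrated with a minimal NumPy sketch. The function names, the fixed top-k selection rule, and the given relevance scores are simplifications for illustration; the actual method learns the layer selection dynamically per image pair inside the network.

```python
import numpy as np

def upsample_nn(feat, size):
    """Nearest-neighbor upsampling of a (C, H, W) feature map to (C, size, size)."""
    c, h, w = feat.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return feat[:, rows][:, :, cols]

def compose_hypercolumn(layer_feats, relevance, k=2, size=8):
    """Concatenate the k most relevant layers' maps into one hypercolumn.

    layer_feats: list of (C_i, H_i, W_i) feature maps from different CNN layers.
    relevance:   per-layer scores (given here; learned in the actual method).
    """
    top = sorted(np.argsort(relevance)[::-1][:k])  # keep original layer order
    maps = [upsample_nn(layer_feats[i], size) for i in top]
    return np.concatenate(maps, axis=0)  # shape: (sum of selected C_i, size, size)
```

Each spatial position of the resulting tensor is a per-pixel descriptor that can be matched across two images; selecting the layers per image pair is what makes the composition dynamic rather than static.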
Related papers
- Cross-Image Attention for Zero-Shot Appearance Transfer [68.43651329067393]
We introduce a cross-image attention mechanism that implicitly establishes semantic correspondences across images.
We harness three mechanisms that either manipulate the noisy latent codes or the model's internal representations throughout the denoising process.
Experiments show that our method is effective across a wide range of object categories and is robust to variations in shape, size, and viewpoint.
arXiv Detail & Related papers (2023-11-06T18:33:24Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Learning-based Relational Object Matching Across Views [63.63338392484501]
We propose a learning-based approach which combines local keypoints with novel object-level features for matching object detections between RGB images.
We train our object-level matching features based on appearance and inter-frame and cross-frame spatial relations between objects in an associative graph neural network.
arXiv Detail & Related papers (2023-05-03T19:36:51Z)
- Learning to Model Multimodal Semantic Alignment for Story Visualization [58.16484259508973]
Story visualization aims to generate a sequence of images to narrate each sentence in a multi-sentence story.
Current works face the problem of semantic misalignment because of their fixed architecture and diversity of input modalities.
We explore the semantic alignment between text and image representations by learning to match their semantic levels in the GAN-based generative model.
arXiv Detail & Related papers (2022-11-14T11:41:44Z)
- Cross-View-Prediction: Exploring Contrastive Feature for Hyperspectral Image Classification [9.131465469247608]
This paper presents a self-supervised feature learning method for hyperspectral image classification.
Our method constructs two different views of the raw hyperspectral image through a cross-representation learning method.
It then learns semantically consistent representations over the created views via contrastive learning.
arXiv Detail & Related papers (2022-03-14T11:07:33Z)
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
- Contextual Encoder-Decoder Network for Visual Saliency Prediction [42.047816176307066]
We propose an approach based on a convolutional neural network pre-trained on a large-scale image classification task.
We combine the resulting representations with global scene information for accurately predicting visual saliency.
Compared to state-of-the-art approaches, the network is based on a lightweight image classification backbone.
arXiv Detail & Related papers (2019-02-18T16:15:25Z)
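Several of the related papers above (the semantic-correspondence and cross-view works in particular) rely on contrastive objectives. As a rough illustration of the image-level contrastive idea they describe — not any of their exact losses — here is a minimal InfoNCE sketch over paired embeddings, where each anchor's matched positive sits at the same row index:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss over N embedding pairs; pair i matches row i of `positives`."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature                   # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                 # matched pairs on the diagonal
```

Minimizing this loss pulls matched pairs together while pushing each anchor away from the other rows in the batch, which is the mechanism these papers use to encourage features to agree between corresponding regions or views.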
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.