UR2KiD: Unifying Retrieval, Keypoint Detection, and Keypoint Description
without Local Correspondence Supervision
- URL: http://arxiv.org/abs/2001.07252v1
- Date: Mon, 20 Jan 2020 21:01:38 GMT
- Authors: Tsun-Yi Yang, Duy-Kien Nguyen, Huub Heijnen, Vassileios Balntas
- Abstract summary: Three related tasks, namely keypoint detection, description, and image retrieval, can be jointly tackled using a single unified framework.
By leveraging diverse information from sequential layers of a standard ResNet-based architecture, we are able to extract keypoints and descriptors that encode local information.
Global information for image retrieval is encoded in an end-to-end pipeline, based on pooling of the aforementioned local responses.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we explore how three related tasks, namely keypoint detection,
description, and image retrieval, can be jointly tackled using a single unified
framework, which is trained without the need for training data with point-to-point
correspondences. By leveraging diverse information from sequential layers
of a standard ResNet-based architecture, we are able to extract keypoints and
descriptors that encode local information using generic techniques such as
local activation norms, channel grouping and dropping, and self-distillation.
Subsequently, global information for image retrieval is encoded in an
end-to-end pipeline, based on pooling of the aforementioned local responses. In
contrast to previous methods in local matching, our method does not depend on
pointwise/pixelwise correspondences and requires no such supervision at all,
i.e., neither depth maps from an SfM model nor manually created synthetic affine
transformations. We illustrate that this simple and direct paradigm is able to
achieve very competitive results against state-of-the-art methods in
various challenging benchmark conditions such as viewpoint changes, scale
changes, and day-night shifting localization.
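The abstract describes reading keypoints off local activation norms and building a global descriptor by pooling the same local responses. The following is a minimal sketch of that general idea, not the authors' implementation: the toy feature map, the function names (`activation_norms`, `top_k_keypoints`, `local_descriptors`, `pooled_global_descriptor`), the top-k selection rule, and the norm-weighted sum pooling are all assumptions chosen for illustration. Channel grouping and dropping and self-distillation are not covered.

```python
import math

# Hypothetical sketch, NOT the UR2KiD implementation.
# A feature map is a nested list indexed as fmap[y][x][c],
# standing in for one layer's output of a CNN backbone.


def activation_norms(fmap):
    """L2 norm of the channel vector at every spatial location."""
    return [[math.sqrt(sum(v * v for v in cell)) for cell in row] for row in fmap]


def top_k_keypoints(fmap, k):
    """Return the k locations (y, x) with the largest activation norm (assumed rule)."""
    norms = activation_norms(fmap)
    scored = [
        (norms[y][x], y, x)
        for y in range(len(norms))
        for x in range(len(norms[y]))
    ]
    # Sort by descending norm; break ties by position for determinism.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(y, x) for _, y, x in scored[:k]]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to (approximately) unit length."""
    n = math.sqrt(sum(v * v for v in vec))
    return [v / (n + eps) for v in vec]


def local_descriptors(fmap, keypoints):
    """Local descriptor = normalized channel vector at each keypoint."""
    return [l2_normalize(fmap[y][x]) for y, x in keypoints]


def pooled_global_descriptor(fmap):
    """Norm-weighted sum pooling of local responses, then L2 normalization (assumed)."""
    channels = len(fmap[0][0])
    pooled = [0.0] * channels
    for row in fmap:
        for cell in row:
            weight = math.sqrt(sum(v * v for v in cell))
            for c, v in enumerate(cell):
                pooled[c] += weight * v
    return l2_normalize(pooled)


# Toy 2x2 feature map with 2 channels.
toy = [
    [[0.1, 0.0], [3.0, 4.0]],
    [[0.0, 1.0], [0.2, 0.1]],
]
keypoints = top_k_keypoints(toy, 2)
descriptors = local_descriptors(toy, keypoints)
global_desc = pooled_global_descriptor(toy)
```

In this sketch the same feature map yields both outputs: strong local responses become keypoints with local descriptors, and all responses, weighted by their norm, are pooled into a single vector that could be compared across images for retrieval.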
Related papers
- Boosting Weakly-Supervised Referring Image Segmentation via Progressive Comprehension [40.21084218601082]
This paper focuses on a challenging setup where target localization is learned directly from image-text pairs.
We propose a novel Progressive Network (PCNet) to leverage target-related textual cues for progressively localizing the target object.
Our method outperforms SOTA methods on three common benchmarks.
(2024-10-02T13:30:32Z)
- Coupled Laplacian Eigenmaps for Locally-Aware 3D Rigid Point Cloud Matching [0.0]
We propose a new technique, based on graph Laplacian eigenmaps, to match point clouds by taking into account fine local structures.
To deal with the order and sign ambiguity of Laplacian eigenmaps, we introduce a new operator, called Coupled Laplacian.
We show that the similarity between those aligned high-dimensional spaces provides a locally meaningful score to match shapes.
(2024-02-27T10:10:12Z)
- LCPFormer: Towards Effective 3D Point Cloud Analysis via Local Context Propagation in Transformers [60.51925353387151]
We propose a novel module named Local Context Propagation (LCP) to exploit the message passing between neighboring local regions.
We use the overlap points of adjacent local regions as intermediaries, then re-weight the features of these shared points from different local regions before passing them to the next layers.
The proposed method is applicable to different tasks and outperforms various transformer-based methods in benchmarks including 3D shape classification and dense prediction tasks.
(2022-10-23T15:43:01Z)
- Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art.
(2022-09-21T02:33:07Z)
- SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation [94.11915008006483]
We propose SemAffiNet for point cloud semantic segmentation.
We conduct extensive experiments on the ScanNetV2 and NYUv2 datasets.
(2022-05-26T17:00:23Z)
- Patch2Pix: Epipolar-Guided Pixel-Level Correspondences [38.38520763114715]
We present Patch2Pix, a novel refinement network that refines match proposals by regressing pixel-level matches from the local regions defined by those proposals.
We show that our refinement network significantly improves the performance of correspondence networks on image matching, homography estimation, and localization tasks.
(2020-12-03T13:44:02Z)
- Cross-Descriptor Visual Localization and Mapping [81.16435356103133]
Visual localization and mapping is the key technology underlying the majority of Mixed Reality and robotics systems.
We present three novel scenarios for localization and mapping which require the continuous update of feature representations.
Our data-driven approach is agnostic to the feature descriptor type, has low computational requirements, and scales linearly with the number of description algorithms.
(2020-12-02T18:19:51Z)
- Unsupervised Metric Relocalization Using Transform Consistency Loss [66.19479868638925]
Training networks to perform metric relocalization traditionally requires accurate image correspondences.
We propose a self-supervised solution, which exploits a key insight: localizing a query image within a map should yield the same absolute pose, regardless of the reference image used for registration.
We evaluate our framework on synthetic and real-world data, showing our approach outperforms other supervised methods when a limited amount of ground-truth information is available.
(2020-11-01T19:24:27Z)
- A Rotation-Invariant Framework for Deep Point Cloud Analysis [132.91915346157018]
We introduce a new low-level purely rotation-invariant representation to replace common 3D Cartesian coordinates as the network inputs.
Also, we present a network architecture to embed these representations into features, encoding local relations between points and their neighbors, and the global shape structure.
We evaluate our method on multiple point cloud analysis tasks, including shape classification, part segmentation, and shape retrieval.
(2020-03-16T14:04:45Z)