Few-shot Geometry-Aware Keypoint Localization
- URL: http://arxiv.org/abs/2303.17216v1
- Date: Thu, 30 Mar 2023 08:19:42 GMT
- Title: Few-shot Geometry-Aware Keypoint Localization
- Authors: Xingzhe He, Gaurav Bharaj, David Ferman, Helge Rhodin, Pablo Garrido
- Abstract summary: We present a novel formulation that learns to localize semantically consistent keypoint definitions.
We use a few user-labeled 2D images as input examples, which are extended via self-supervision.
We introduce 3D geometry-aware constraints to uplift keypoints, achieving more accurate 2D localization.
- Score: 13.51645400661565
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Supervised keypoint localization methods rely on large manually labeled image
datasets, where objects can deform, articulate, or occlude. However, creating
such large keypoint labels is time-consuming and costly, and is often
error-prone due to inconsistent labeling. Thus, we desire an approach that can
learn keypoint localization with fewer yet consistently annotated images. To
this end, we present a novel formulation that learns to localize semantically
consistent keypoint definitions, even for occluded regions, for varying object
categories. We use a few user-labeled 2D images as input examples, which are
extended via self-supervision using a larger unlabeled dataset. Unlike
unsupervised methods, the few-shot images act as semantic shape constraints for
object localization. Furthermore, we introduce 3D geometry-aware constraints to
uplift keypoints, achieving more accurate 2D localization. Our general-purpose
formulation paves the way for semantically conditioned generative modeling and
attains competitive or state-of-the-art accuracy on several datasets, including
human faces, eyes, animals, cars, and never-before-seen mouth interior (teeth)
localization tasks, not attempted by previous few-shot methods. Project page:
https://xingzhehe.github.io/FewShot3DKP/
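The abstract's "3D geometry-aware constraints to uplift keypoints" can be pictured as lifting each 2D keypoint to 3D with a predicted depth and constraining the lifted points against a shape prior. The sketch below is a hedged illustration of one such constraint, not the paper's actual formulation: `shape_constraint_loss`, the pinhole intrinsics `K`, and the canonical `template_3d` are all illustrative assumptions.

```python
import numpy as np

def backproject(kp_2d, depth, K):
    """Lift 2D pixel keypoints (N, 2) to 3D camera coordinates using a
    per-keypoint depth and pinhole intrinsics K (3, 3)."""
    homo = np.concatenate([kp_2d, np.ones((len(kp_2d), 1))], axis=1)  # (N, 3)
    return (homo @ np.linalg.inv(K).T) * depth[:, None]               # (N, 3)

def pairwise_dist(pts):
    """All pairwise Euclidean distances between points (N, 3) -> (N, N)."""
    diff = pts[:, None, :] - pts[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def shape_constraint_loss(kp_2d, depth, K, template_3d):
    """Penalize deviation of the lifted keypoints' pairwise 3D distances from
    a canonical template shape -- a simple stand-in for a geometry-aware
    uplift prior (illustrative, not the paper's loss)."""
    kp_3d = backproject(kp_2d, depth, K)
    return np.mean((pairwise_dist(kp_3d) - pairwise_dist(template_3d)) ** 2)
```

Because only pairwise distances enter the loss, the constraint is invariant to rigid translation of the template, which is one reason distance-based shape priors are a common choice.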
Related papers
- SelfGeo: Self-supervised and Geodesic-consistent Estimation of Keypoints on Deformable Shapes [19.730602733938216]
"SelfGeo" is a self-supervised method that computes persistent 3D keypoints of non-rigid objects from arbitrary PCDs without the need of human annotations.
Our main contribution is to enforce that keypoints deform along with the shape while keeping constant geodesic distances among them.
We show experimentally that the use of geodesic distances has a clear advantage in challenging dynamic scenes.
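A minimal sketch of such a geodesic-preservation term, assuming the per-frame geodesic distance matrices between keypoints are already computed (the function name and the L1 form are illustrative assumptions, not SelfGeo's exact loss):

```python
import numpy as np

def geodesic_preservation_loss(D_t, D_t2):
    """L1 penalty on the change of pairwise geodesic distances between the
    same keypoints at two time steps.

    D_t, D_t2: (K, K) geodesic distance matrices at frames t and t'.
    Returns 0 exactly when every pairwise distance is preserved."""
    assert D_t.shape == D_t2.shape
    iu = np.triu_indices(D_t.shape[0], k=1)  # each keypoint pair once
    return np.mean(np.abs(D_t[iu] - D_t2[iu]))
```

In practice the geodesic matrices would come from shortest paths on a k-NN graph over each frame's point cloud; only the upper triangle is used since the matrices are symmetric.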
arXiv Detail & Related papers (2024-08-05T08:00:30Z)
- Learning to Produce Semi-dense Correspondences for Visual Localization [11.415451542216559]
This study addresses the challenge of performing visual localization in demanding conditions such as night-time scenarios, adverse weather, and seasonal changes.
We propose a novel method that extracts reliable semi-dense 2D-3D matching points based on dense keypoint matches.
The network utilizes both geometric and visual cues to effectively infer 3D coordinates for unobserved keypoints from the observed ones.
arXiv Detail & Related papers (2024-02-13T10:40:10Z)
- Neural Semantic Surface Maps [52.61017226479506]
We present an automated technique for computing a map between two genus-zero shapes, which matches semantically corresponding regions to one another.
Our approach can generate semantic surface-to-surface maps, eliminating manual annotations or any 3D training data requirement.
arXiv Detail & Related papers (2023-09-09T16:21:56Z)
- Lowis3D: Language-Driven Open-World Instance-Level 3D Scene Understanding [57.47315482494805]
Open-world instance-level scene understanding aims to locate and recognize unseen object categories that are not present in the annotated dataset.
This task is challenging because the model needs to both localize novel 3D objects and infer their semantic categories.
We propose to harness pre-trained vision-language (VL) foundation models that encode extensive knowledge from image-text pairs to generate captions for 3D scenes.
arXiv Detail & Related papers (2023-08-01T07:50:14Z)
- Piecewise Planar Hulls for Semi-Supervised Learning of 3D Shape and Pose from 2D Images [133.68032636906133]
We study the problem of estimating 3D shape and pose of an object in terms of keypoints, from a single 2D image.
The shape and pose are learned directly from images collected by categories and their partial 2D keypoint annotations.
arXiv Detail & Related papers (2022-11-14T16:18:11Z)
- AutoLink: Self-supervised Learning of Human Skeletons and Object Outlines by Linking Keypoints [16.5436159805682]
We propose a self-supervised method that learns to disentangle object structure from appearance.
Both the keypoint location and their pairwise edge weights are learned, given only a collection of images depicting the same object class.
The resulting graph is interpretable, for example, AutoLink recovers the human skeleton topology when applied to images showing people.
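The keypoints-plus-edge-weights idea can be sketched as rendering a soft skeleton image from learned 2D keypoints and a symmetric edge-weight matrix. The renderer below is a hedged NumPy illustration of that idea, not AutoLink's actual differentiable implementation; the function name and the Gaussian line profile are assumptions.

```python
import numpy as np

def render_edge_map(kps, edge_w, size=64, thickness=2.0):
    """Render keypoints (N, 2, pixel xy) and pairwise edge weights (N, N)
    as a soft skeleton image: each edge deposits
    edge_w[i, j] * exp(-d^2 / thickness^2), where d is a pixel's distance
    to the segment between keypoints i and j."""
    ys, xs = np.mgrid[0:size, 0:size]
    grid = np.stack([xs, ys], axis=-1).astype(float)  # (H, W, 2) pixel coords
    canvas = np.zeros((size, size))
    n = len(kps)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = kps[i], kps[j]
            ab = b - a
            # Project each pixel onto the segment, clamped to its endpoints.
            t = np.clip(((grid - a) @ ab) / (ab @ ab + 1e-8), 0.0, 1.0)
            d2 = np.sum((grid - (a + t[..., None] * ab)) ** 2, axis=-1)
            canvas = np.maximum(canvas, edge_w[i, j] * np.exp(-d2 / thickness**2))
    return canvas
```

With a human-skeleton edge topology, high-weight edges render as limbs, which is what makes the learned graph interpretable.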
arXiv Detail & Related papers (2022-05-21T16:32:34Z)
- P2P-Loc: Point to Point Tiny Person Localization [47.6728595874315]
We propose a novel point-based framework for the person localization task.
Each person is annotated as a coarse point (CoarsePoint), which can be any point within the object extent, instead of an accurate bounding box.
Our approach achieves comparable object localization performance while saving up to 80% of the annotation cost.
arXiv Detail & Related papers (2021-12-31T08:24:43Z)
- Back to the Feature: Learning Robust Camera Localization from Pixels to Pose [114.89389528198738]
We introduce PixLoc, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model.
The system can localize in large environments given coarse pose priors but also improve the accuracy of sparse feature matching.
arXiv Detail & Related papers (2021-03-16T17:40:12Z)
- Inter-Image Communication for Weakly Supervised Localization [77.2171924626778]
Weakly supervised localization aims at finding target object regions using only image-level supervision.
We propose to leverage pixel-level similarities across different objects for learning more accurate object locations.
Our method achieves the Top-1 localization error rate of 45.17% on the ILSVRC validation set.
arXiv Detail & Related papers (2020-08-12T04:14:11Z)
- KeypointNet: A Large-scale 3D Keypoint Dataset Aggregated from Numerous Human Annotations [56.34297279246823]
KeypointNet is the first large-scale and diverse 3D keypoint dataset.
It contains 103,450 keypoints and 8,234 3D models from 16 object categories.
Ten state-of-the-art methods are benchmarked on our proposed dataset.
arXiv Detail & Related papers (2020-02-28T12:58:56Z)
- Rethinking the Route Towards Weakly Supervised Object Localization [28.90792512056726]
We show that weakly supervised object localization should be divided into two parts: class-agnostic object localization and object classification.
For class-agnostic object localization, we should use class-agnostic methods to generate noisy pseudo annotations and then perform bounding box regression on them without class labels.
Our PSOL models have good transferability across different datasets without fine-tuning.
arXiv Detail & Related papers (2020-02-26T08:54:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences arising from its use.