AutoLink: Self-supervised Learning of Human Skeletons and Object
Outlines by Linking Keypoints
- URL: http://arxiv.org/abs/2205.10636v6
- Date: Thu, 23 Mar 2023 18:31:48 GMT
- Title: AutoLink: Self-supervised Learning of Human Skeletons and Object
Outlines by Linking Keypoints
- Authors: Xingzhe He, Bastian Wandt, Helge Rhodin
- Abstract summary: We propose a self-supervised method that learns to disentangle object structure from appearance.
Both the keypoint locations and their pairwise edge weights are learned, given only a collection of images depicting the same object class.
The resulting graph is interpretable, for example, AutoLink recovers the human skeleton topology when applied to images showing people.
- Score: 16.5436159805682
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Structured representations such as keypoints are widely used in pose
transfer, conditional image generation, animation, and 3D reconstruction.
However, their supervised learning requires expensive annotation for each
target domain. We propose a self-supervised method that learns to disentangle
object structure from appearance with a graph of 2D keypoints linked by
straight edges. Both the keypoint locations and their pairwise edge weights are
learned, given only a collection of images depicting the same object class. The
resulting graph is interpretable, for example, AutoLink recovers the human
skeleton topology when applied to images showing people. Our key ingredients
are i) an encoder that predicts keypoint locations in an input image, ii) a
shared graph as a latent variable that links the same pairs of keypoints in
every image, iii) an intermediate edge map that combines the latent graph edge
weights and keypoint locations in a soft, differentiable manner, and iv) an
inpainting objective on randomly masked images. Although simpler, AutoLink
outperforms existing self-supervised methods on the established keypoint and
pose estimation benchmarks and paves the way for structure-conditioned
generative models on more diverse datasets. Project website:
https://xingzhehe.github.io/autolink/.
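Ingredient iii) above, the soft, differentiable edge map, can be sketched as follows. This is a minimal illustrative reconstruction based only on the abstract, not the authors' code: the function names, the Gaussian falloff, the pixel-wise max over edges, and the grid size are all assumptions.

```python
import numpy as np

def point_segment_dist(points, a, b):
    """Distance from each pixel coordinate in `points` (N, 2) to segment a-b."""
    ab = b - a
    # Project each point onto the segment, clamping to the endpoints.
    t = np.clip(((points - a) @ ab) / (ab @ ab + 1e-8), 0.0, 1.0)
    proj = a + t[:, None] * ab
    return np.linalg.norm(points - proj, axis=1)

def soft_edge_map(keypoints, edge_weights, size=16, sigma=1.5):
    """Render a soft edge map from keypoints and learned pairwise edge weights.

    keypoints:    (K, 2) array of (x, y) locations in pixel units.
    edge_weights: (K, K) array of link strengths in [0, 1] (the shared
                  latent graph, identical across all images).
    Each pixel takes the maximum response over all weighted straight edges;
    every step is smooth in the keypoint coordinates, so gradients from a
    downstream inpainting loss can flow back to the keypoint predictor.
    """
    ys, xs = np.mgrid[0:size, 0:size]
    pixels = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    edge_map = np.zeros(size * size)
    K = len(keypoints)
    for i in range(K):
        for j in range(i + 1, K):
            d = point_segment_dist(pixels, keypoints[i], keypoints[j])
            response = edge_weights[i, j] * np.exp(-(d ** 2) / (sigma ** 2))
            edge_map = np.maximum(edge_map, response)
    return edge_map.reshape(size, size)
```

In training, a map like this would be concatenated with a randomly masked image and fed to a decoder; the reconstruction (inpainting) loss then encourages keypoints and edge weights that capture the object structure needed to fill in the missing regions.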
Related papers
- GOReloc: Graph-based Object-Level Relocalization for Visual SLAM [17.608119427712236]
This article introduces a novel method for object-level relocalization of robotic systems.
It determines the pose of a camera sensor by robustly associating the object detections in the current frame with 3D objects in a lightweight object-level map.
arXiv Detail & Related papers (2024-08-15T03:54:33Z)
- KGpose: Keypoint-Graph Driven End-to-End Multi-Object 6D Pose Estimation via Point-Wise Pose Voting [0.0]
KGpose is an end-to-end framework for 6D pose estimation of multiple objects.
Our approach combines a keypoint-based method with learnable pose regression through a keypoint graph.
arXiv Detail & Related papers (2024-07-12T01:06:00Z)
- Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration [107.61458720202984]
This paper introduces a novel self-supervised learning framework for enhancing 3D perception in autonomous driving scenes.
We propose the learnable transformation alignment to bridge the domain gap between image and point cloud data.
We establish dense 2D-3D correspondences to estimate the rigid pose.
arXiv Detail & Related papers (2024-01-23T02:41:06Z)
- AnyOKP: One-Shot and Instance-Aware Object Keypoint Extraction with Pretrained ViT [28.050252998288478]
We propose a one-shot instance-aware object keypoint (OKP) extraction approach, AnyOKP, for flexible object-centric visual perception.
An off-the-shelf pretrained vision transformer (ViT) is deployed for generalizable and transferable feature extraction.
AnyOKP is evaluated on real object images collected with the cameras of a robot arm, a mobile robot, and a surgical robot.
arXiv Detail & Related papers (2023-09-15T04:05:01Z)
- Correlational Image Modeling for Self-Supervised Visual Pre-Training [81.82907503764775]
Correlational Image Modeling is a novel and surprisingly effective approach to self-supervised visual pre-training.
Three key designs enable correlational image modeling as a nontrivial and meaningful self-supervisory task.
arXiv Detail & Related papers (2023-03-22T15:48:23Z)
- Piecewise Planar Hulls for Semi-Supervised Learning of 3D Shape and Pose from 2D Images [133.68032636906133]
We study the problem of estimating 3D shape and pose of an object in terms of keypoints, from a single 2D image.
The shape and pose are learned directly from images collected by categories and their partial 2D keypoint annotations.
arXiv Detail & Related papers (2022-11-14T16:18:11Z)
- End-to-End Learning of Multi-category 3D Pose and Shape Estimation [128.881857704338]
We propose an end-to-end method that simultaneously detects 2D keypoints from an image and lifts them to 3D.
The proposed method learns both 2D detection and 3D lifting only from 2D keypoints annotations.
In addition to being end-to-end in image to 3D learning, our method also handles objects from multiple categories using a single neural network.
arXiv Detail & Related papers (2021-12-19T17:10:40Z)
- 6D Object Pose Estimation using Keypoints and Part Affinity Fields [24.126513851779936]
The task of 6D object pose estimation from RGB images is an important requirement for autonomous service robots to be able to interact with the real world.
We present a two-step pipeline for estimating the 6 DoF translation and orientation of known objects.
arXiv Detail & Related papers (2021-07-05T14:41:19Z)
- Unsupervised Learning of Visual 3D Keypoints for Control [104.92063943162896]
Learning sensorimotor control policies from high-dimensional images crucially relies on the quality of the underlying visual representations.
We propose a framework to learn such a 3D geometric structure directly from images in an end-to-end unsupervised manner.
These discovered 3D keypoints tend to meaningfully capture robot joints as well as object movements in a consistent manner across both time and 3D space.
arXiv Detail & Related papers (2021-06-14T17:59:59Z)
- Joint Deep Multi-Graph Matching and 3D Geometry Learning from Inhomogeneous 2D Image Collections [57.60094385551773]
We propose a trainable framework for learning a deformable 3D geometry model from inhomogeneous image collections.
We in addition obtain the underlying 3D geometry of the objects depicted in the 2D images.
arXiv Detail & Related papers (2021-03-31T17:25:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.