End-to-End Learning of Multi-category 3D Pose and Shape Estimation
- URL: http://arxiv.org/abs/2112.10196v1
- Date: Sun, 19 Dec 2021 17:10:40 GMT
- Title: End-to-End Learning of Multi-category 3D Pose and Shape Estimation
- Authors: Yigit Baran Can, Alexander Liniger, Danda Pani Paudel, Luc Van Gool
- Abstract summary: We propose an end-to-end method that simultaneously detects 2D keypoints from an image and lifts them to 3D.
The proposed method learns both 2D detection and 3D lifting only from 2D keypoint annotations.
In addition to being end-to-end in image to 3D learning, our method also handles objects from multiple categories using a single neural network.
- Score: 128.881857704338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the representation of the shape and pose of objects
using their keypoints. To this end, we propose an end-to-end method that
simultaneously detects 2D keypoints from an image and lifts them to 3D. The
proposed method learns both 2D detection and 3D lifting only from 2D keypoint
annotations. For this purpose, we propose, for the first time, a method that
explicitly disentangles the pose and 3D shape by means of augmentation-based
cyclic self-supervision. In addition to being end-to-end in image-to-3D
learning, our method also handles objects from multiple categories using a
single neural network. We use a Transformer-based architecture to detect the
keypoints, as well as to summarize the visual context of the image. This visual
context information is then used while lifting the keypoints to 3D, enabling
context-based reasoning for better performance. While lifting, our
method learns a small set of basis shapes and their sparse non-negative
coefficients to represent the 3D shape in the canonical frame. Our method can
handle occlusions as well as a wide variety of object classes. Our experiments on
three benchmarks demonstrate that our method performs better than the
state-of-the-art. Our source code will be made publicly available.
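
The basis-shape representation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy basis, the single-axis rotation, and the orthographic camera are all assumptions made only for illustration. It shows a canonical 3D shape formed as a non-negative, sparse combination of basis shapes, then rotated by a pose and projected to 2D.

```python
import math

def combine_basis(coeffs, basis):
    """Canonical shape = sum_k coeffs[k] * basis[k], with coeffs[k] >= 0."""
    if any(c < 0 for c in coeffs):
        raise ValueError("coefficients must be non-negative")
    n_points = len(basis[0])
    shape = [[0.0, 0.0, 0.0] for _ in range(n_points)]
    for c, b in zip(coeffs, basis):
        for i, point in enumerate(b):
            for d in range(3):
                shape[i][d] += c * point[d]
    return shape

def rotate_y(shape, angle):
    """Apply a pose, simplified here to a rotation about the y axis."""
    ca, sa = math.cos(angle), math.sin(angle)
    return [[ca * x + sa * z, y, -sa * x + ca * z] for x, y, z in shape]

def project_orthographic(shape):
    """Assumed camera model: drop the depth coordinate."""
    return [[x, y] for x, y, _ in shape]

# Two hypothetical basis shapes with three keypoints each.
BASIS = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]],
]
COEFFS = [2.0, 0.0]  # sparse and non-negative: only the first basis is active

canonical = combine_basis(COEFFS, BASIS)
keypoints_2d = project_orthographic(rotate_y(canonical, math.pi / 2))
```

In this toy setup the pose (rotation) and the shape (coefficients) are separate inputs, which is the kind of separation the abstract refers to as disentangling pose and 3D shape; how the paper learns them is not specified here.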
Related papers
- Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos [15.532504015622159]
Category-level 3D pose estimation is a fundamentally important problem in computer vision and robotics.
We tackle the problem of learning to estimate the category-level 3D pose only from casually taken object-centric videos.
arXiv Detail & Related papers (2024-07-05T09:43:05Z)
- OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding [54.981605111365056]
This paper introduces OpenGaussian, a method based on 3D Gaussian Splatting (3DGS) capable of 3D point-level open vocabulary understanding.
Our primary motivation stems from observing that existing 3DGS-based open vocabulary methods mainly focus on 2D pixel-level parsing.
arXiv Detail & Related papers (2024-06-04T07:42:33Z)
- 3D Surface Reconstruction in the Wild by Deforming Shape Priors from Synthetic Data [24.97027425606138]
Reconstructing the underlying 3D surface of an object from a single image is a challenging problem.
We present a new method for joint category-specific 3D reconstruction and object pose estimation from a single image.
Our approach achieves state-of-the-art reconstruction performance across several real-world datasets.
arXiv Detail & Related papers (2023-02-24T20:37:27Z)
- Piecewise Planar Hulls for Semi-Supervised Learning of 3D Shape and Pose from 2D Images [133.68032636906133]
We study the problem of estimating 3D shape and pose of an object in terms of keypoints, from a single 2D image.
The shape and pose are learned directly from images collected by categories and their partial 2D keypoint annotations.
arXiv Detail & Related papers (2022-11-14T16:18:11Z)
- Understanding Pixel-level 2D Image Semantics with 3D Keypoint Knowledge Engine [56.09471066808409]
We propose a new method for predicting image-corresponding semantics in the 3D domain and then projecting them back onto 2D images to achieve pixel-level understanding.
We build a large scale keypoint knowledge engine called KeypointNet, which contains 103,450 keypoints and 8,234 3D models from 16 object categories.
arXiv Detail & Related papers (2021-11-21T13:25:20Z)
- KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control [64.46042014759671]
KeypointDeformer is an unsupervised method for shape control through automatically discovered 3D keypoints.
Our approach produces intuitive and semantically consistent control of shape deformations.
arXiv Detail & Related papers (2021-04-22T17:59:08Z)
- Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction [79.98689027127855]
We propose a new representation of the 3D shape of common object categories that can be learned from a collection of 2D images of independent objects.
Our method builds in a novel way on concepts from parametric deformation models, non-parametric 3D reconstruction, and canonical embeddings.
It achieves state-of-the-art results in dense 3D reconstruction on public in-the-wild datasets of faces, cars, and birds.
arXiv Detail & Related papers (2020-08-28T15:44:05Z)