Optimal and Robust Category-level Perception: Object Pose and Shape
Estimation from 2D and 3D Semantic Keypoints
- URL: http://arxiv.org/abs/2206.12498v3
- Date: Sun, 17 Sep 2023 03:35:13 GMT
- Title: Optimal and Robust Category-level Perception: Object Pose and Shape
Estimation from 2D and 3D Semantic Keypoints
- Authors: Jingnan Shi, Heng Yang, Luca Carlone
- Abstract summary: We consider a problem where one is given 2D or 3D sensor data picturing an object of a given category (e.g., a car) and has to reconstruct the 3D pose and shape of the object.
Our first contribution is to develop PACE3D* and PACE2D*, the first certifiably optimal solvers for pose and shape estimation.
Our second contribution is to develop robust versions of both solvers, named PACE3D# and PACE2D#.
- Score: 24.232254155643574
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider a category-level perception problem, where one is given 2D or 3D
sensor data picturing an object of a given category (e.g., a car), and has to
reconstruct the 3D pose and shape of the object despite intra-class variability
(i.e., different car models have different shapes). We consider an active shape
model, where -- for an object category -- we are given a library of potential
CAD models describing objects in that category, and we adopt a standard
formulation where pose and shape are estimated from 2D or 3D keypoints via
non-convex optimization. Our first contribution is to develop PACE3D* and
PACE2D*, the first certifiably optimal solvers for pose and shape estimation
using 3D and 2D keypoints, respectively. Both solvers rely on the design of
tight (i.e., exact) semidefinite relaxations. Our second contribution is to
develop outlier-robust versions of both solvers, named PACE3D# and PACE2D#.
Towards this goal, we propose ROBIN, a general graph-theoretic framework to
prune outliers, which uses compatibility hypergraphs to model measurements'
compatibility. We show that in category-level perception problems these
hypergraphs can be built from the winding orders of the keypoints (in 2D) or
their convex hulls (in 3D), and many outliers can be filtered out via maximum
hyperclique computation. The last contribution is an extensive experimental
evaluation. Besides providing an ablation study on simulated datasets and on
the PASCAL3D+ dataset, we combine our solver with a deep keypoint detector, and
show that PACE3D# improves over the state of the art in vehicle pose estimation
in the ApolloScape datasets, and its runtime is compatible with practical
applications. We release our code at https://github.com/MIT-SPARK/PACE.
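The active shape model described in the abstract can be illustrated with a small sketch. It assumes each library CAD model contributes a 3 x N array of keypoints, that the object's keypoints are a linear combination of them, and that the 3D measurements are that shape after a rigid transform. The function names (`kabsch`, `shape_from_coeffs`, `alternate`) are hypothetical and are not the API of the released PACE code. The solver shown is plain alternating minimization, a local method with no optimality certificate, swapped in for brevity. It is not PACE3D*, which relies on a semidefinite relaxation, and it omits outlier pruning and any constraints or regularization on the shape coefficients that a full formulation may include.

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares R, t such that dst ~ R @ src + t (points are columns)."""
    mu_s = src.mean(axis=1, keepdims=True)
    mu_d = dst.mean(axis=1, keepdims=True)
    H = (dst - mu_d) @ (src - mu_s).T            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Force det(R) = +1 so the result is a rotation, not a reflection.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ D @ Vt
    return R, mu_d - R @ mu_s

def shape_from_coeffs(c, library):
    """Active shape model: combine K library shapes, (K,) x (K, 3, N) -> (3, N)."""
    return np.tensordot(c, library, axes=1)

def residual(y, R, t, c, library):
    """Norm of the keypoint fitting error for a given pose and shape."""
    return np.linalg.norm(y - (R @ shape_from_coeffs(c, library) + t))

def alternate(y, library, iters=50):
    """Alternate a pose step (Kabsch) and a shape step (linear least squares)."""
    K = library.shape[0]
    c = np.full(K, 1.0 / K)                      # start from the mean shape
    A = library.reshape(K, -1).T                 # (3N, K) design matrix
    for _ in range(iters):
        R, t = kabsch(shape_from_coeffs(c, library), y)
        z = R.T @ (y - t)                        # measurements in the object frame
        c, *_ = np.linalg.lstsq(A, z.reshape(-1), rcond=None)
    return R, t, c

# Synthetic, noise-free example (all values are made up for illustration).
rng = np.random.default_rng(0)
K, N = 3, 8
library = rng.normal(size=(K, 3, N))             # stand-in for CAD keypoints
c_true = np.array([0.2, 0.5, 0.3])
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R_true = Q * np.sign(np.linalg.det(Q))           # orthogonal with det +1
t_true = rng.normal(size=(3, 1))
y = R_true @ shape_from_coeffs(c_true, library) + t_true

R_est, t_est, c_est = alternate(y, library)
print("final residual:", residual(y, R_est, t_est, c_est, library))
```

Each step exactly minimizes the same squared error over one block of variables, so the residual never increases, but the iteration can still stall at a local minimum. That gap is what motivates certifiably optimal solvers for this problem.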
Related papers
- Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos [15.532504015622159]
Category-level 3D pose estimation is a fundamentally important problem in computer vision and robotics.
We tackle the problem of learning to estimate the category-level 3D pose only from casually taken object-centric videos.
arXiv Detail & Related papers (2024-07-05T09:43:05Z)
- Uncertainty-aware 3D Object-Level Mapping with Deep Shape Priors [15.34487368683311]
We propose a framework that can reconstruct high-quality object-level maps for unknown objects.
Our approach takes multiple RGB-D images as input and outputs dense 3D shapes and 9-DoF poses for detected objects.
We derive a probabilistic formulation that propagates shape and pose uncertainty through two novel loss functions.
arXiv Detail & Related papers (2023-09-17T00:48:19Z)
- CheckerPose: Progressive Dense Keypoint Localization for Object Pose Estimation with Graph Neural Network [66.24726878647543]
Estimating the 6-DoF pose of a rigid object from a single RGB image is a crucial yet challenging task.
Recent studies have shown the great potential of dense correspondence-based solutions.
We propose a novel pose estimation algorithm named CheckerPose, which improves on three main aspects.
arXiv Detail & Related papers (2023-03-29T17:30:53Z)
- SNAKE: Shape-aware Neural 3D Keypoint Field [62.91169625183118]
Detecting 3D keypoints from point clouds is important for shape reconstruction.
This work investigates the dual question: can shape reconstruction benefit 3D keypoint detection?
We propose a novel unsupervised paradigm named SNAKE, which is short for shape-aware neural 3D keypoint field.
arXiv Detail & Related papers (2022-06-03T17:58:43Z)
- Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed Homography Loss, is proposed, which exploits both 2D and 3D information.
Our method yields the best performance, outperforming other state-of-the-art methods by a large margin on the KITTI 3D datasets.
arXiv Detail & Related papers (2022-04-02T03:48:03Z)
- DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension [71.71234436165255]
We contribute DensePose 3D, a method that can learn such reconstructions in a weakly supervised fashion from 2D image annotations only.
Because it does not require 3D scans, DensePose 3D can be used for learning a wide range of articulated categories such as different animal species.
We show significant improvements compared to state-of-the-art non-rigid structure-from-motion baselines on both synthetic and real data on categories of humans and animals.
arXiv Detail & Related papers (2021-08-31T18:33:55Z)
- AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection [15.244852122106634]
We propose an approach for incorporating the shape-aware 2D/3D constraints into the 3D detection framework.
Specifically, we employ a deep neural network to learn distinguished 2D keypoints in the 2D image domain.
For generating the ground truth of 2D/3D keypoints, an automatic model-fitting approach has been proposed.
arXiv Detail & Related papers (2021-08-25T08:50:06Z)
- FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection [78.00922683083776]
It is non-trivial to make a general adapted 2D detector work in this 3D task.
In this technical report, we study this problem with a practice built on a fully convolutional single-stage detector.
Our solution achieves 1st place out of all the vision-only methods in the nuScenes 3D detection challenge of NeurIPS 2020.
arXiv Detail & Related papers (2021-04-22T09:35:35Z)
- Optimal Pose and Shape Estimation for Category-level 3D Object Perception [24.232254155643574]
We consider a category-level perception problem, where one is given 3D sensor data picturing an object of a given category.
We provide the first certifiably optimal solver for pose and shape estimation.
We also develop the first graph-theoretic formulation to prune outliers in category-level perception.
arXiv Detail & Related papers (2021-04-16T21:41:29Z)
- HOPE-Net: A Graph-based Model for Hand-Object Pose Estimation [7.559220068352681]
We propose a lightweight model called HOPE-Net which jointly estimates hand and object pose in 2D and 3D in real-time.
Our network uses a cascade of two adaptive graph convolutional neural networks, one to estimate 2D coordinates of the hand joints and object corners, followed by another to convert 2D coordinates to 3D.
arXiv Detail & Related papers (2020-03-31T19:01:42Z)
- Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion [53.885984328273686]
Implicit Feature Networks (IF-Nets) deliver continuous outputs, can handle multiple topologies, and complete shapes for missing or sparse input data.
IF-Nets clearly outperform prior work in 3D object reconstruction in ShapeNet, and obtain significantly more accurate 3D human reconstructions.
arXiv Detail & Related papers (2020-03-03T11:14:29Z)