Keypoint-Based Category-Level Object Pose Tracking from an RGB Sequence
with Uncertainty Estimation
- URL: http://arxiv.org/abs/2205.11047v1
- Date: Mon, 23 May 2022 05:20:22 GMT
- Title: Keypoint-Based Category-Level Object Pose Tracking from an RGB Sequence
with Uncertainty Estimation
- Authors: Yunzhi Lin, Jonathan Tremblay, Stephen Tyree, Patricio A. Vela, Stan
Birchfield
- Abstract summary: We propose a category-level 6-DoF pose estimation algorithm that simultaneously detects and tracks instances of objects within a known category.
Our method takes as input the previous and current frames from a monocular RGB video, as well as predictions from the previous frame, to predict the bounding cuboid and pose.
Our framework allows the system to take previous uncertainties into consideration when predicting the current frame, resulting in predictions that are more accurate and stable than single-frame methods.
- Score: 29.06824085794294
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a single-stage, category-level 6-DoF pose estimation algorithm
that simultaneously detects and tracks instances of objects within a known
category. Our method takes as input the previous and current frame from a
monocular RGB video, as well as predictions from the previous frame, to predict
the bounding cuboid and 6-DoF pose (up to scale). Internally, a deep network
predicts distributions over object keypoints (vertices of the bounding cuboid)
in image coordinates, after which a novel probabilistic filtering process
integrates across estimates before computing the final pose using PnP. Our
framework allows the system to take previous uncertainties into consideration
when predicting the current frame, resulting in predictions that are more
accurate and stable than single frame methods. Extensive experiments show that
our method outperforms existing approaches on the challenging Objectron
benchmark of annotated object videos. We also demonstrate the usability of our
work in an augmented reality setting.
Related papers
- Video prediction using score-based conditional density estimation [9.190468260530634]
We describe an implicit regression-based framework for learning and sampling the conditional density of the next frame in a video.
We show that sequence-to-image deep networks trained on a simple resilience-to-noise objective function extract adaptive representations for temporal prediction.
arXiv Detail & Related papers (2024-10-30T03:16:35Z) - End-to-End Probabilistic Geometry-Guided Regression for 6DoF Object Pose Estimation [5.21401636701889]
State-of-the-art 6D object pose estimators directly predict an object pose given an object observation.
We reformulate the state-of-the-art algorithm GDRNPP and introduce EPRO-GDR.
Our solution shows that predicting a pose distribution instead of a single pose can improve state-of-the-art single-view pose estimation.
arXiv Detail & Related papers (2024-09-18T09:11:31Z) - KRONC: Keypoint-based Robust Camera Optimization for 3D Car Reconstruction [58.04846444985808]
This paper introduces KRONC, a novel approach aimed at inferring view poses by leveraging prior knowledge about the object to reconstruct and its representation through semantic keypoints.
With a focus on vehicle scenes, KRONC estimates the view positions as the solution to a lightweight optimization problem that drives the keypoints' back-projections to converge to a single point.
arXiv Detail & Related papers (2024-09-09T08:08:05Z) - Rigidity-Aware Detection for 6D Object Pose Estimation [60.88857851869196]
Most recent 6D object pose estimation methods first use object detection to obtain 2D bounding boxes before actually regressing the pose.
We propose a rigidity-aware detection method exploiting the fact that, in 6D pose estimation, the target objects are rigid.
Key to the success of our approach is a visibility map, which we propose to build using a minimum barrier distance between every pixel in the bounding box and the box boundary.
arXiv Detail & Related papers (2023-03-22T09:02:54Z) - STDepthFormer: Predicting Spatio-temporal Depth from Video with a
Self-supervised Transformer Model [0.0]
A self-supervised model is proposed that simultaneously predicts a sequence of future frames from video input using a spatio-temporal attention network.
The proposed model leverages prior scene knowledge such as object shape and texture similar to single-image depth inference methods.
It is implicitly capable of forecasting the motion of objects in the scene, rather than requiring complex models involving multi-object detection, segmentation and tracking.
arXiv Detail & Related papers (2023-03-02T12:22:51Z) - RelPose: Predicting Probabilistic Relative Rotation for Single Objects
in the Wild [73.1276968007689]
We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object.
We show that our approach outperforms state-of-the-art SfM and SLAM methods given sparse images on both seen and unseen categories.
arXiv Detail & Related papers (2022-08-11T17:59:59Z) - CATRE: Iterative Point Clouds Alignment for Category-level Object Pose
Refinement [52.41884119329864]
CATRE, a category-level object pose and size refiner, iteratively improves pose estimates from point clouds to produce accurate results.
Our approach substantially outperforms state-of-the-art methods on the REAL275, CAMERA25, and LM benchmarks, running at up to 85.32 Hz.
arXiv Detail & Related papers (2022-07-17T05:55:00Z) - Semantic keypoint-based pose estimation from single RGB frames [64.80395521735463]
We present an approach to estimating the continuous 6-DoF pose of an object from a single RGB image.
The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model.
We show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios.
arXiv Detail & Related papers (2022-04-12T15:03:51Z) - Point-Set Anchors for Object Detection, Instance Segmentation and Pose
Estimation [85.96410825961966]
We argue that the image features extracted at a central point contain limited information for predicting distant keypoints or bounding box boundaries.
To facilitate inference, we propose to instead perform regression from a set of points placed at more advantageous positions.
We apply this proposed framework, called Point-Set Anchors, to object detection, instance segmentation, and human pose estimation.
arXiv Detail & Related papers (2020-07-06T15:59:56Z) - PrimA6D: Rotational Primitive Reconstruction for Enhanced and Robust 6D
Pose Estimation [11.873744190924599]
We introduce a rotational primitive prediction based 6D object pose estimation using a single image as an input.
We leverage a Variational AutoEncoder (VAE) to learn this underlying primitive and its associated keypoints.
When evaluated on public datasets, our method yields a notable improvement on LINEMOD, Occlusion LINEMOD, and the YCB-Video dataset.
arXiv Detail & Related papers (2020-06-14T03:55:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.