FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
- URL: http://arxiv.org/abs/2312.08344v2
- Date: Tue, 26 Mar 2024 19:25:53 GMT
- Title: FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
- Authors: Bowen Wen, Wei Yang, Jan Kautz, Stan Birchfield
- Abstract summary: FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
- Score: 55.77542145604758
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present FoundationPose, a unified foundation model for 6D object pose estimation and tracking, supporting both model-based and model-free setups. Our approach can be instantly applied at test-time to a novel object without fine-tuning, as long as its CAD model is given, or a small number of reference images are captured. We bridge the gap between these two setups with a neural implicit representation that allows for effective novel view synthesis, keeping the downstream pose estimation modules invariant under the same unified framework. Strong generalizability is achieved via large-scale synthetic training, aided by a large language model (LLM), a novel transformer-based architecture, and a contrastive learning formulation. Extensive evaluation on multiple public datasets involving challenging scenarios and objects indicates our unified approach outperforms existing methods specialized for each task by a large margin. In addition, it even achieves comparable results to instance-level methods despite the reduced assumptions. Project page: https://nvlabs.github.io/FoundationPose/
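The hypothesis-and-test idea underlying this family of methods, generating candidate poses and keeping the best-scoring one, can be illustrated with a toy sketch. The rotation parameterization, the candidate grid, and the stand-in scoring function below are all illustrative assumptions, not FoundationPose's actual architecture or scoring network:

```python
import numpy as np

def rotation_z(theta):
    """Rotation matrix about the z-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def geodesic_distance(R1, R2):
    """Angular distance (radians) between two rotation matrices."""
    cos = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def select_best_pose(hypotheses, score_fn):
    """Score every candidate pose and return the highest-scoring one."""
    scores = np.array([score_fn(R) for R in hypotheses])
    return hypotheses[int(np.argmax(scores))]

# Toy stand-in for a learned scoring network: score a hypothesis by its
# agreement with a (here, known) ground-truth rotation.
R_true = rotation_z(0.50)
candidates = [rotation_z(t) for t in np.linspace(0.0, np.pi, 64)]
R_best = select_best_pose(candidates, lambda R: -geodesic_distance(R, R_true))
```

In a real system the scoring function compares a rendering of the object under the candidate pose against the observed image; the grid search here is only a minimal stand-in for that ranking step.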
Related papers
- FoundPose: Unseen Object Pose Estimation with Foundation Features [11.32559845631345]
FoundPose is a model-based method for 6D pose estimation of unseen objects from a single RGB image.
The method can quickly onboard new objects using their 3D models without requiring any object- or task-specific training.
arXiv Detail & Related papers (2023-11-30T18:52:29Z)
- MFOS: Model-Free & One-Shot Object Pose Estimation [10.009454818723025]
We introduce a novel approach that can estimate in a single forward pass the pose of objects never seen during training, given minimum input.
We conduct extensive experiments and report state-of-the-art one-shot performance on the challenging LINEMOD benchmark.
arXiv Detail & Related papers (2023-10-03T09:12:07Z)
- YOLOPose V2: Understanding and Improving Transformer-based 6D Pose Estimation [36.067414358144816]
YOLOPose is a Transformer-based multi-object 6D pose estimation method.
We employ a learnable orientation estimation module to predict the orientation from the keypoints.
Our method is suitable for real-time applications and achieves results comparable to state-of-the-art methods.
arXiv Detail & Related papers (2023-07-21T12:53:54Z)
- PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching [51.142988196855484]
We propose PoseMatcher, an accurate, model-free, one-shot object pose estimator.
We create a new training pipeline for object to image matching based on a three-view system.
To enable PoseMatcher to attend to distinct input modalities, an image and a point cloud, we introduce the IO-Layer.
arXiv Detail & Related papers (2023-04-03T21:14:59Z)
- OnePose++: Keypoint-Free One-Shot Object Pose Estimation without CAD Models [51.68715543630427]
OnePose relies on detecting repeatable image keypoints and is thus prone to failure on low-textured objects.
We propose a keypoint-free pose estimation pipeline to remove the need for repeatable keypoint detection.
A 2D-3D matching network directly establishes 2D-3D correspondences between the query image and the reconstructed point-cloud model.
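The final step of such a pipeline, recovering a pose from established 2D-3D correspondences, is the classical PnP problem. Below is a minimal DLT-based sketch in plain NumPy, assuming noise-free correspondences and known intrinsics; it is a generic illustration, not OnePose++'s matching network or its solver:

```python
import numpy as np

def pnp_dlt(points_3d, points_2d, K):
    """Recover (R, t) from >= 6 noise-free 2D-3D correspondences via the DLT.

    points_3d: (N, 3) object points, points_2d: (N, 2) pixel coordinates,
    K: (3, 3) camera intrinsics.
    """
    # Work in normalized image coordinates: x_n = K^-1 [u, v, 1]^T.
    ones = np.ones((len(points_2d), 1))
    x_n = (np.linalg.inv(K) @ np.hstack([points_2d, ones]).T).T

    # Each correspondence contributes two rows to the homogeneous system A p = 0.
    rows = []
    for (X, Y, Z), (x, y, _) in zip(points_3d, x_n):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -x*X, -x*Y, -x*Z, -x])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -y*X, -y*Y, -y*Z, -y])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    M = Vt[-1].reshape(3, 4)                 # [R|t] up to scale and sign

    # Fix the sign so the first point lies in front of the camera (z > 0).
    if M[2] @ np.append(points_3d[0], 1.0) < 0:
        M = -M

    # Project the left 3x3 block onto SO(3) and recover the scale.
    U, S, Vt = np.linalg.svd(M[:, :3])
    R = U @ Vt
    t = M[:, 3] / S.mean()
    return R, t

# Synthetic check: project known points with a known pose, then recover it.
rng = np.random.default_rng(0)
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 4.0])
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
pts3d = rng.uniform(-1.0, 1.0, size=(8, 3))
cam = (R_true @ pts3d.T).T + t_true            # points in the camera frame
uv = (K @ (cam / cam[:, 2:3]).T).T[:, :2]      # projected pixel coordinates
R_est, t_est = pnp_dlt(pts3d, uv, K)
```

Real pipelines use a robust solver (e.g. RANSAC around PnP) because matched correspondences contain outliers; the exact-data DLT above only shows the geometric core.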
arXiv Detail & Related papers (2023-01-18T17:47:13Z)
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
We present a 6D pose refiner based on a render&compare strategy which can be applied to novel objects.
We also introduce a novel coarse pose estimation approach that leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
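A render-and-compare refiner can be caricatured as local search over pose parameters: perturb the current estimate, re-score it against the observation, and keep improvements. The one-angle parameterization and the toy score below are illustrative stand-ins for MegaPose's learned comparison network, not its actual method:

```python
import numpy as np

def refine_angle(theta0, score_fn, step=0.1, iters=50):
    """Greedy render-and-compare-style refinement of one rotation angle:
    try +/- perturbations, keep any that improves the score, and shrink
    the step when neither direction helps."""
    theta, best = theta0, score_fn(theta0)
    for _ in range(iters):
        improved = False
        for cand in (theta + step, theta - step):
            sc = score_fn(cand)
            if sc > best:
                theta, best, improved = cand, sc, True
        if not improved:
            step *= 0.5           # no improvement: search more finely
    return theta

# Toy comparison score: agreement between rendered and observed images is
# replaced by closeness of the candidate angle to a hidden true angle.
theta_true = 0.42
theta_ref = refine_angle(1.2, lambda t: -abs(t - theta_true))
```

In the actual render-and-compare setting the score comes from comparing a rendering of the object under the candidate pose with the observed crop, and the update is predicted by a network rather than found by perturbation.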
arXiv Detail & Related papers (2022-12-13T19:30:03Z)
- Semantic keypoint-based pose estimation from single RGB frames [64.80395521735463]
We present an approach to estimating the continuous 6-DoF pose of an object from a single RGB image.
The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model.
We show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios.
arXiv Detail & Related papers (2022-04-12T15:03:51Z)
- Leveraging SE(3) Equivariance for Self-Supervised Category-Level Object Pose Estimation [30.04752448942084]
Category-level object pose estimation aims to find 6D object poses of previously unseen object instances from known categories without access to object CAD models.
We propose the first self-supervised learning framework to estimate category-level 6D object pose from single 3D point clouds.
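The property such self-supervised frameworks exploit is SE(3) equivariance: a feature f satisfying f(RX + t) = R f(X) + t yields a label-free consistency signal, since any violation of that identity can be penalized without ground-truth poses. A minimal check of the property, using the point-cloud centroid as a trivially equivariant feature (all names here are illustrative, not the paper's architecture):

```python
import numpy as np

def centroid(points):
    """A trivially SE(3)-equivariant feature: f(R X + t) = R f(X) + t."""
    return points.mean(axis=0)

def equivariance_residual(feature_fn, points, R, t):
    """Self-supervised consistency signal: distance between the feature of
    the transformed cloud and the transformed feature. Zero exactly when the
    feature is equivariant under this (R, t)."""
    transformed = points @ R.T + t
    expected = feature_fn(points) @ R.T + t
    return float(np.linalg.norm(feature_fn(transformed) - expected))

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
c, s = np.cos(0.7), np.sin(0.7)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t = np.array([1.0, -2.0, 0.5])
residual = equivariance_residual(centroid, X, R, t)
```

A learned equivariant encoder replaces the centroid in practice; minimizing this residual over random transformations is what removes the need for pose labels.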
arXiv Detail & Related papers (2021-10-30T06:46:44Z)
- Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency [114.02182755620784]
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.
Our framework is shown to outperform the state-of-the-art depth and motion estimation methods.
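The projection-consistency signal behind such self-supervised training can be sketched directly: lift a pixel with its predicted depth, transform it by the estimated relative pose, and project it into the other frame, where photometric agreement is then measured. The intrinsics and pose below are made-up values for illustration, not the paper's setup:

```python
import numpy as np

def backproject(uv, depth, K):
    """Lift a pixel (u, v) with depth d to a 3D point in the camera frame."""
    u, v = uv
    return depth * np.linalg.inv(K) @ np.array([u, v, 1.0])

def project(point, K):
    """Project a 3D camera-frame point to pixel coordinates."""
    p = K @ point
    return p[:2] / p[2]

def reproject(uv, depth, K, R, t):
    """Projection consistency: warp a pixel from frame 1 into frame 2 using
    its depth and the relative camera (or object) motion (R, t)."""
    X1 = backproject(uv, depth, K)
    return project(R @ X1 + t, K)

K = np.array([[400.0, 0.0, 160.0], [0.0, 400.0, 120.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])  # small y-rotation
t = np.array([0.05, 0.0, 0.0])                              # small translation

# A known 3D point seen in frame 1; its depth is the z-coordinate.
X = np.array([0.3, -0.2, 2.0])
uv1 = project(X, K)
uv2 = reproject(uv1, X[2], K, R, t)
```

Training compares the image intensity at `uv1` in frame 1 with that at `uv2` in frame 2; per-instance motions extend the single (R, t) here to one transform per dynamic object.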
arXiv Detail & Related papers (2021-02-04T14:26:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.