BundleTrack: 6D Pose Tracking for Novel Objects without Instance or Category-Level 3D Models
- URL: http://arxiv.org/abs/2108.00516v1
- Date: Sun, 1 Aug 2021 18:14:46 GMT
- Title: BundleTrack: 6D Pose Tracking for Novel Objects without Instance or Category-Level 3D Models
- Authors: Bowen Wen and Kostas Bekris
- Abstract summary: This work proposes BundleTrack, a general framework for 6D pose tracking of novel objects without instance- or category-level 3D models.
An efficient CUDA implementation runs the entire framework in real time at 10 Hz.
- Score: 1.14219428942199
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tracking the 6D pose of objects in video sequences is important for robot
manipulation. Most prior efforts, however, often assume that the target
object's CAD model, at least at a category-level, is available for offline
training or during online template matching. This work proposes BundleTrack, a
general framework for 6D pose tracking of novel objects, which does not depend
upon 3D models, either at the instance or category-level. It leverages the
complementary attributes of recent advances in deep learning for segmentation
and robust feature extraction, as well as memory-augmented pose graph
optimization for spatiotemporal consistency. This enables long-term, low-drift
tracking under various challenging scenarios, including significant occlusions
and object motions. Comprehensive experiments on two public benchmarks
demonstrate that the proposed approach significantly outperforms state-of-the-art
category-level 6D tracking and dynamic SLAM methods. When compared against
state-of-the-art methods that rely on an object instance CAD model, comparable
performance is achieved despite the proposed method's reduced information
requirements. An efficient CUDA implementation provides real-time
performance of 10 Hz for the entire framework. Code is available at:
https://github.com/wenbowen123/BundleTrack
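As background for how correspondence-driven trackers of this kind turn matched features into a pose, the following is a minimal sketch, assuming matched 3D keypoints between two RGB-D frames, of the standard Kabsch least-squares alignment. BundleTrack's memory-augmented pose graph jointly refines many such pairwise estimates over past keyframes; this is illustrative NumPy, not the authors' CUDA implementation.

```python
import numpy as np

def kabsch_pose(P_src, P_dst):
    """Least-squares rigid transform (R, t) mapping matched 3D points
    P_src (N, 3) onto P_dst (N, 3). Textbook Kabsch solution; stands in
    for one pairwise edge of a pose graph, not BundleTrack's actual code."""
    c_src, c_dst = P_src.mean(0), P_dst.mean(0)
    H = (P_src - c_src).T @ (P_dst - c_dst)       # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

# Toy check: recover a known rotation and translation from noisy matches.
rng = np.random.default_rng(0)
P = rng.normal(size=(100, 3))
R_true = np.linalg.qr(rng.normal(size=(3, 3)))[0]
if np.linalg.det(R_true) < 0:
    R_true[:, 0] *= -1                            # make it a proper rotation
t_true = np.array([0.1, -0.2, 0.3])
Q = P @ R_true.T + t_true + 1e-3 * rng.normal(size=P.shape)
R_est, t_est = kabsch_pose(P, Q)
print(np.allclose(R_est, R_true, atol=1e-2))      # True
```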
Related papers
- 3D-Aware Instance Segmentation and Tracking in Egocentric Videos [107.10661490652822]
Egocentric videos present unique challenges for 3D scene understanding.
This paper introduces a novel approach to instance segmentation and tracking in first-person video.
By incorporating spatial and temporal cues, we achieve superior performance compared to state-of-the-art 2D approaches.
arXiv Detail & Related papers (2024-08-19T10:08:25Z)
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z)
- Tracking by 3D Model Estimation of Unknown Objects in Videos [122.56499878291916]
We argue that purely 2D representations are limited and instead propose to guide and improve 2D tracking with an explicit object representation.
Our representation tackles a complex long-term dense correspondence problem between all 3D points on the object for all video frames.
The proposed optimization minimizes a novel loss function to estimate the best 3D shape, texture, and 6DoF pose.
arXiv Detail & Related papers (2023-04-13T11:32:36Z)
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
First, we present a 6D pose refiner based on a render & compare strategy which can be applied to novel objects.
Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
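As a sketch of the render-and-compare pattern described here, the loop below renders a set of candidate poses, scores each rendering against the observed image, and keeps the best candidate for the refiner. The render and score callables are hypothetical placeholders, not MegaPose's API.

```python
import numpy as np

def coarse_pose_search(observed_img, mesh, candidate_poses, render, score):
    """Generic render-and-compare coarse pose selection (illustrative only).
    render(mesh, pose) produces a synthetic image; score(obs, synth) is a
    hypothetical stand-in for the paper's classifier, which rates whether
    the remaining pose error is small enough for the refiner to correct."""
    scores = [score(observed_img, render(mesh, pose)) for pose in candidate_poses]
    best = int(np.argmax(scores))
    return candidate_poses[best], scores[best]   # winner goes to the refiner
```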
arXiv Detail & Related papers (2022-12-13T19:30:03Z)
- Self-Supervised Geometric Correspondence for Category-Level 6D Object Pose Estimation in the Wild [47.80637472803838]
We introduce a self-supervised learning approach trained directly on large-scale real-world object videos for category-level 6D pose estimation in the wild.
Our framework reconstructs the canonical 3D shape of an object category and learns dense correspondences between input images and the canonical shape via surface embedding.
Surprisingly, our method, without any human annotations or simulators, can achieve on-par or even better performance than previous supervised or semi-supervised methods on in-the-wild images.
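A minimal sketch of the correspondence step this summary describes, assuming per-pixel embeddings and embeddings of points sampled on the canonical shape (illustrative, not the paper's code): nearest neighbours in embedding space yield 2D-3D matches that a standard PnP solver can turn into a 6D pose.

```python
import numpy as np

def dense_correspondences(pix_feats, pix_uv, surf_feats, surf_xyz):
    """pix_feats (N, D): per-pixel embeddings at pixel coords pix_uv (N, 2);
    surf_feats (M, D): embeddings of M points surf_xyz (M, 3) sampled on the
    canonical shape. Returns 2D-3D correspondence pairs for a PnP solver."""
    a = pix_feats / np.linalg.norm(pix_feats, axis=1, keepdims=True)
    b = surf_feats / np.linalg.norm(surf_feats, axis=1, keepdims=True)
    nn = (a @ b.T).argmax(axis=1)      # best canonical point for each pixel
    return pix_uv, surf_xyz[nn]
```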
arXiv Detail & Related papers (2022-10-13T17:19:22Z)
- Data-driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains [6.187780920448869]
This work presents se(3)-TrackNet, a data-driven optimization approach for long term, 6D pose tracking.
It aims to identify the optimal relative pose given the current RGB-D observation and a synthetic image conditioned on the previous best estimate and the object's model.
The neural network architecture appropriately disentangles the feature encoding to help reduce domain shift and employs an effective 3D orientation representation via Lie algebra.
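The Lie-algebra representation mentioned above is the standard se(3) parameterization: the network regresses a 6-vector twist, and the exponential map converts it into a rigid-transform update. A textbook NumPy version (not the paper's code):

```python
import numpy as np

def se3_exp(xi):
    """Exponential map from a twist xi = (w, v) in se(3) to a 4x4 rigid
    transform (standard formula, not se(3)-TrackNet's implementation)."""
    w, v = xi[:3], xi[3:]
    th = np.linalg.norm(w)
    K = np.array([[0, -w[2], w[1]],
                  [w[2], 0, -w[0]],
                  [-w[1], w[0], 0]])               # skew(w)
    if th < 1e-8:                                  # small-angle limit
        R, V = np.eye(3) + K, np.eye(3) + 0.5 * K
    else:
        A = np.sin(th) / th
        B = (1 - np.cos(th)) / th**2
        C = (th - np.sin(th)) / th**3
        R = np.eye(3) + A * K + B * (K @ K)        # Rodrigues' formula
        V = np.eye(3) + B * K + C * (K @ K)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, V @ v
    return T
```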
arXiv Detail & Related papers (2021-05-29T23:56:05Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified and learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
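For intuition, one round of message passing over a track/detection graph can be sketched as below; the shapes follow the usual pattern, while the single-layer updates with random weights merely stand in for the paper's trained networks.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)

D = 16
W_edge = rng.normal(size=(3 * D, D)) * 0.1   # edge-update layer (illustrative)
W_node = rng.normal(size=(2 * D, D)) * 0.1   # node-update layer (illustrative)

def message_pass(tracks, dets, edges):
    """tracks (T, D), dets (N, D); edges (T, N, D) holds one feature per
    track-detection pair. Edge scores would drive data association."""
    T, N = edges.shape[:2]
    t = np.repeat(tracks[:, None, :], N, axis=1)   # broadcast to (T, N, D)
    d = np.repeat(dets[None, :, :], T, axis=0)     # broadcast to (T, N, D)
    edges = relu(np.concatenate([t, d, edges], -1) @ W_edge)
    agg = edges.mean(axis=0)                       # aggregate per detection
    dets = relu(np.concatenate([dets, agg], -1) @ W_node)
    return edges, dets

edges, dets = message_pass(rng.normal(size=(3, D)),      # 3 tracks
                           rng.normal(size=(5, D)),      # 5 detections
                           rng.normal(size=(3, 5, D)))
print(edges.shape, dets.shape)                           # (3, 5, 16) (5, 16)
```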
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
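The association step in trackers of this family is commonly appearance similarity plus one-to-one assignment; a generic sketch follows (not the paper's exact matching rule, and the min_sim threshold is an arbitrary illustrative value).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_embs, det_embs, min_sim=0.5):
    """Cosine similarity between track embeddings (T, D) and detection
    embeddings (N, D), then one-to-one assignment; pairs scoring below
    min_sim stay unmatched. Generic association, for illustration only."""
    a = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
    b = det_embs / np.linalg.norm(det_embs, axis=1, keepdims=True)
    sim = a @ b.T
    rows, cols = linear_sum_assignment(-sim)     # maximize total similarity
    return [(r, c) for r, c in zip(rows, cols) if sim[r, c] >= min_sim]
```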
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
- Self-Supervised Pretraining of 3D Features on any Point-Cloud [40.26575888582241]
We present a simple self-supervised pretraining method that can work with any 3D data without 3D registration.
We evaluate our models on 9 benchmarks for object detection, semantic segmentation, and object classification, where they achieve state-of-the-art results and can outperform supervised pretraining.
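The generic objective behind this style of self-supervised pretraining is a contrastive InfoNCE loss between two augmented views of the same point cloud; a minimal NumPy version of that standard loss is shown below (not necessarily the paper's exact formulation).

```python
import numpy as np

def info_nce(z1, z2, tau=0.07):
    """Standard InfoNCE loss between two augmented views, z1 and z2, each a
    (B, D) batch of features; row i of z1 and row i of z2 come from the same
    sample (positives), all other rows act as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                     # (B, B); diagonal = positives
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```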
arXiv Detail & Related papers (2021-01-07T18:55:21Z)