PIZZA: A Powerful Image-only Zero-Shot Zero-CAD Approach to 6 DoF Tracking
- URL: http://arxiv.org/abs/2209.07589v1
- Date: Thu, 15 Sep 2022 19:55:13 GMT
- Title: PIZZA: A Powerful Image-only Zero-Shot Zero-CAD Approach to 6 DoF Tracking
- Authors: Van Nguyen Nguyen, Yuming Du, Yang Xiao, Michael Ramamonjisoa, Vincent Lepetit
- Abstract summary: We present a method for tracking the 6D motion of objects in RGB video sequences when neither the training images nor the 3D geometry of the objects are available.
In contrast to previous works, our method can therefore handle unknown objects in the open world instantly.
Our results on challenging datasets are on par with previous works that require much more information.
- Score: 27.283648727847268
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating the relative pose of a new object without prior knowledge is a
hard problem, while it is an ability very much needed in robotics and Augmented
Reality. We present a method for tracking the 6D motion of objects in RGB video
sequences when neither the training images nor the 3D geometry of the objects
are available. In contrast to previous works, our method can therefore handle
unknown objects in the open world instantly, without requiring any prior
information or a specific training phase. We consider two architectures, one
based on two frames, and the other relying on a Transformer Encoder, which can
exploit an arbitrary number of past frames. We train our architectures using
only synthetic renderings with domain randomization. Our results on challenging
datasets are on par with previous works that require much more information
(training images of the target objects, 3D models, and/or depth data). Our
source code is available at https://github.com/nv-nguyen/pizza
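
The abstract describes the two architectures only at a high level: a two-frame variant and a Transformer Encoder variant that can exploit an arbitrary number of past frames, both regressing the relative 6D motion of the object from RGB crops. As a rough illustration, below is a minimal, hypothetical PyTorch-style sketch of the second variant. All module names, the choice of backbone, and the 6-parameter motion output are assumptions, not the authors' code; see the repository linked above for the actual implementation.

```python
# Hypothetical sketch (not the authors' code): a CNN backbone extracts one
# feature token per frame crop, a Transformer encoder fuses an arbitrary
# number of past frames, and a small head regresses the relative 6D motion
# (here assumed to be axis-angle rotation + in-plane translation + depth change).
import torch
import torch.nn as nn
import torchvision.models as models


class RelativePoseTracker(nn.Module):
    def __init__(self, feat_dim=256, num_layers=4, num_heads=8, max_frames=32):
        super().__init__()
        backbone = models.resnet18(weights=None)  # any CNN backbone would do
        backbone.fc = nn.Linear(backbone.fc.in_features, feat_dim)
        self.backbone = backbone
        # learned positional embedding over the temporal window
        self.pos_embed = nn.Parameter(torch.zeros(1, max_frames, feat_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # 3 rotation params + 2 in-plane translation + 1 relative depth/scale
        self.head = nn.Linear(feat_dim, 6)

    def forward(self, crops):
        # crops: (B, T, 3, H, W) -- object crops from the T most recent frames
        B, T = crops.shape[:2]
        tokens = self.backbone(crops.flatten(0, 1)).view(B, T, -1)
        tokens = tokens + self.pos_embed[:, :T]
        fused = self.encoder(tokens)
        # predict the motion between the last two frames from the last token
        return self.head(fused[:, -1])


if __name__ == "__main__":
    model = RelativePoseTracker()
    video_crops = torch.randn(2, 8, 3, 224, 224)  # 2 sequences, 8 past frames each
    delta_pose = model(video_crops)               # (2, 6) relative motion parameters
    print(delta_pose.shape)
```

In this reading, the encoder lets the tracker aggregate evidence over however many past frames are available, which is what distinguishes it from the simpler two-frame variant.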
Related papers
- Inverse Neural Rendering for Explainable Multi-Object Tracking [35.072142773300655]
We recast 3D multi-object tracking from RGB cameras as an Inverse Rendering (IR) problem.
We optimize an image loss over generative latent spaces that inherently disentangle shape and appearance properties.
We validate the generalization and scaling capabilities of our method by learning the generative prior exclusively from synthetic data.
arXiv Detail & Related papers (2024-04-18T17:37:53Z)
- Reconstructing Hand-Held Objects in 3D from Images and Videos [53.277402172488735]
Given a monocular RGB video, we aim to reconstruct hand-held object geometry in 3D, over time.
We present MCC-Hand-Object (MCC-HO), which jointly reconstructs hand and object geometry given a single RGB image.
We then prompt a text-to-3D generative model using GPT-4(V) to retrieve a 3D object model that matches the object in the image.
arXiv Detail & Related papers (2024-04-09T17:55:41Z) - iNVS: Repurposing Diffusion Inpainters for Novel View Synthesis [45.88928345042103]
We present a method for generating consistent novel views from a single source image.
Our approach focuses on maximizing the reuse of visible pixels from the source image.
We use a monocular depth estimator that transfers visible pixels from the source view to the target view.
arXiv Detail & Related papers (2023-10-24T20:33:19Z) - PonderV2: Pave the Way for 3D Foundation Model with A Universal
Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation.
For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, implying its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z) - Visibility Aware Human-Object Interaction Tracking from Single RGB
Camera [40.817960406002506]
We propose a novel method to track the 3D human, object, contacts between them, and their relative translation across frames from a single RGB camera.
We condition our neural field reconstructions for human and object on per-frame SMPL model estimates obtained by pre-fitting SMPL to a video sequence.
Human and object motion from visible frames provides valuable information to infer the occluded object.
arXiv Detail & Related papers (2023-03-29T06:23:44Z) - BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown
Objects [89.2314092102403]
We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence.
Our method works for arbitrary rigid objects, even when visual texture is largely absent.
arXiv Detail & Related papers (2023-03-24T17:13:49Z) - Unsupervised Volumetric Animation [54.52012366520807]
We propose a novel approach for unsupervised 3D animation of non-rigid deformable objects.
Our method learns the 3D structure and dynamics of objects solely from single-view RGB videos.
We show our model can obtain animatable 3D objects from a single volume or few images.
arXiv Detail & Related papers (2023-01-26T18:58:54Z) - MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
We present a 6D pose refiner based on a render&compare strategy which can be applied to novel objects.
Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-12-13T19:30:03Z) - RayTran: 3D pose estimation and shape reconstruction of multiple objects
from videos with ray-traced transformers [41.499325832227626]
We propose a transformer-based neural network architecture for multi-object 3D reconstruction from RGB videos.
We exploit knowledge about the image formation process to significantly sparsify the attention weight matrix.
Compared to previous methods, our architecture is single-stage and end-to-end trainable.
arXiv Detail & Related papers (2022-03-24T18:49:12Z) - RandomRooms: Unsupervised Pre-training from Synthetic Shapes and
Randomized Layouts for 3D Object Detection [138.2892824662943]
A promising solution is to make better use of the synthetic dataset, which consists of CAD object models, to boost the learning on real datasets.
Recent work on 3D pre-training exhibits failure when transferring features learned on synthetic objects to other real-world applications.
In this work, we put forward a new method called RandomRooms to accomplish this objective.
arXiv Detail & Related papers (2021-08-17T17:56:12Z)
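
The MegaPose entry above frames coarse pose estimation as a classification problem: given a rendering of a candidate pose and the observed image, a network decides whether the remaining pose error is small enough for the refiner to correct. The sketch below is a hypothetical illustration of that idea only; the names (CoarsePoseClassifier, select_coarse_pose, render_fn) are assumptions, not the MegaPose implementation. Note that, unlike PIZZA, this approach assumes a renderable 3D model of the object.

```python
# Hypothetical sketch (not the MegaPose code): coarse pose estimation by
# classification. For each candidate pose we render the object, stack the
# render with the observed crop, and a CNN scores whether the refiner could
# correct the remaining pose error. The highest-scoring candidate is kept.
import torch
import torch.nn as nn
import torchvision.models as models


class CoarsePoseClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        cnn = models.resnet18(weights=None)
        # 6 input channels: observed RGB crop concatenated with a rendered crop
        cnn.conv1 = nn.Conv2d(6, 64, kernel_size=7, stride=2, padding=3, bias=False)
        cnn.fc = nn.Linear(cnn.fc.in_features, 1)  # "correctable by refiner" logit
        self.cnn = cnn

    def forward(self, observed, rendered):
        # observed: (B, 3, H, W), rendered: (B, 3, H, W)
        return self.cnn(torch.cat([observed, rendered], dim=1)).squeeze(-1)


def select_coarse_pose(observed_crop, candidate_poses, render_fn, classifier):
    """Score every candidate pose and return the most promising one.

    `render_fn(pose) -> (3, H, W)` is assumed to render the object's 3D model
    under the given pose.
    """
    renders = torch.stack([render_fn(p) for p in candidate_poses])  # (N, 3, H, W)
    observed = observed_crop.unsqueeze(0).expand_as(renders)        # (N, 3, H, W)
    scores = classifier(observed, renders)                          # (N,)
    return candidate_poses[scores.argmax().item()]
```

In practice the candidate poses could come, for example, from sampling viewpoints around the detected object; the selected coarse pose is then handed to the render-and-compare refiner.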
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.