Accidental Turntables: Learning 3D Pose by Watching Objects Turn
- URL: http://arxiv.org/abs/2212.06300v1
- Date: Tue, 13 Dec 2022 00:46:26 GMT
- Title: Accidental Turntables: Learning 3D Pose by Watching Objects Turn
- Authors: Zezhou Cheng, Matheus Gadelha, Subhransu Maji
- Abstract summary: We propose a technique for learning single-view 3D object pose estimation models by utilizing a new source of data -- in-the-wild videos where objects turn.
We show that classical structure-from-motion algorithms, coupled with recent advances in instance detection and feature matching, provide surprisingly accurate relative 3D pose estimation on such videos.
- Score: 36.71437141704823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a technique for learning single-view 3D object pose estimation
models by utilizing a new source of data -- in-the-wild videos where objects
turn. Such videos are prevalent in practice (e.g., cars in roundabouts,
airplanes near runways) and easy to collect. We show that classical
structure-from-motion algorithms, coupled with the recent advances in instance
detection and feature matching, provide surprisingly accurate relative 3D pose
estimation on such videos. We propose a multi-stage training scheme that first
learns a canonical pose across a collection of videos and then supervises a
model for single-view pose estimation. The proposed technique achieves
competitive performance with respect to existing state-of-the-art on standard
benchmarks for 3D pose estimation, without requiring any pose labels during
training. We also contribute the Accidental Turntables Dataset, a challenging
set of 41,212 images of cars with cluttered backgrounds, motion blur, and
illumination changes, which serves as a benchmark for 3D pose estimation.
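The multi-stage scheme described above has to reconcile per-video SfM poses, each of which is defined only up to that video's unknown canonical-frame rotation. As a minimal illustration of this alignment step (not the paper's actual implementation), the rotational gauge between two videos' pose sets can be recovered in closed form with orthogonal Procrustes:

```python
import numpy as np

def random_rotation(rng):
    # QR decomposition of a Gaussian matrix yields a rotation matrix.
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1  # flip a column to ensure det(q) = +1
    return q

def align_canonical(poses_a, poses_b):
    """Solve argmin_G sum_i || R_i^a @ G - R_i^b ||_F over rotations G
    (orthogonal Procrustes with a determinant correction), i.e. find the
    fixed canonical-frame rotation relating two videos' pose sets."""
    m = sum(Ra.T @ Rb for Ra, Rb in zip(poses_a, poses_b))
    u, _, vt = np.linalg.svd(m)
    d = np.sign(np.linalg.det(u @ vt))  # guard against reflections
    return u @ np.diag([1.0, 1.0, d]) @ vt
```

In practice the paper learns the canonical pose across many videos jointly and under noisy matches; this closed-form two-video sketch only shows the geometric core of the gauge-alignment problem.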
Related papers
- Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos [15.532504015622159]
Category-level 3D pose estimation is a fundamentally important problem in computer vision and robotics.
We tackle the problem of learning to estimate the category-level 3D pose only from casually taken object-centric videos.
arXiv Detail & Related papers (2024-07-05T09:43:05Z)
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
We present a 6D pose refiner based on a render&compare strategy which can be applied to novel objects.
Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
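MegaPose's refiner is a learned network operating on rendered crops of a 3D model; as a toy stand-in for the render&compare idea only, here is a 1-DoF greedy search that repeatedly "renders" yaw hypotheses and keeps whichever is closest to the observation (the feature renderer and all names here are hypothetical, not MegaPose's API):

```python
import numpy as np

def render(theta):
    # Stand-in "renderer": a feature vector varying smoothly with yaw.
    # A real render&compare method would rasterize a textured 3D model.
    return np.array([np.cos(theta), np.sin(theta), np.cos(2 * theta)])

def refine_pose(observed, theta0, step=0.1, iters=50):
    """Greedy render&compare refinement of a single yaw angle: at each
    step, keep the neighbouring hypothesis whose rendering is closest
    (in L2) to the observed features, shrinking the search radius once
    no neighbour improves."""
    theta = theta0
    for _ in range(iters):
        candidates = [theta - step, theta, theta + step]
        errs = [np.linalg.norm(render(c) - observed) for c in candidates]
        best = candidates[int(np.argmin(errs))]
        if best == theta:
            step *= 0.5  # locally converged: refine at finer scale
        theta = best
    return theta
```

The full 6-DoF problem is far harder (the error landscape is non-convex and rendering is expensive), which is why MegaPose replaces this local search with a trained comparison network.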
arXiv Detail & Related papers (2022-12-13T19:30:03Z)
- MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z)
- Pose Estimation and 3D Reconstruction of Vehicles from Stereo-Images Using a Subcategory-Aware Shape Prior [0.0]
3D reconstruction of objects is a prerequisite in computer vision for many applications, such as mobile robotics and autonomous driving.
The goal of this paper is to show how 3D object reconstruction can profit from prior shape observations.
arXiv Detail & Related papers (2021-07-22T19:47:49Z)
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
- 3D Registration for Self-Occluded Objects in Context [66.41922513553367]
We introduce the first deep learning framework capable of effectively handling this scenario.
Our method consists of an instance segmentation module followed by a pose estimation module.
It allows us to perform 3D registration in a one-shot manner, without requiring an expensive iterative procedure.
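The "one-shot" (non-iterative) registration referenced above can be illustrated, under the strong simplifying assumption of known point correspondences, by the classical closed-form Kabsch solution; note the paper itself uses a learned segmentation-plus-pose pipeline, not this:

```python
import numpy as np

def kabsch(src, dst):
    """One-shot rigid registration: given corresponding 3D points
    (N x 3 arrays), recover rotation R and translation t with
    dst ~= src @ R.T + t in closed form -- no iterative ICP loop."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    # Cross-covariance of the centred point sets.
    h = (src - c_src).T @ (dst - c_dst)
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # avoid reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = c_dst - r @ c_src
    return r, t
```

Self-occlusion breaks the known-correspondence assumption, which is exactly the gap the learned framework above addresses before a closed-form step like this becomes applicable.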
arXiv Detail & Related papers (2020-11-23T08:05:28Z)
- Self-Supervised Multi-View Synchronization Learning for 3D Pose Estimation [39.334995719523]
Current methods cast monocular 3D human pose estimation as a learning problem by training neural networks on large data sets of images and corresponding skeleton poses.
We propose an approach that can exploit small annotated data sets by fine-tuning networks pre-trained via self-supervised learning on (large) unlabeled data sets.
We demonstrate the effectiveness of the synchronization task on the Human3.6M data set and achieve state-of-the-art results in 3D human pose estimation.
arXiv Detail & Related papers (2020-10-13T08:01:24Z)
- Integration of the 3D Environment for UAV Onboard Visual Object Tracking [7.652259812856325]
Single visual object tracking from an unmanned aerial vehicle poses fundamental challenges.
We introduce a pipeline that combines a model-free visual object tracker, a sparse 3D reconstruction, and a state estimator.
By representing the position of the target in 3D space rather than in image space, we stabilize the tracking during ego-motion.
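Representing the target in 3D rather than image space comes down to back-projecting a pixel through the camera intrinsics given a depth estimate. A minimal pinhole-camera sketch (the intrinsics matrix K below is made up for illustration):

```python
import numpy as np

def backproject(u, v, depth, K):
    """Lift pixel (u, v) with known depth into a 3D point in the
    camera frame using pinhole intrinsics K."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # homogeneous pixel
    return ray * depth

# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
```

Anchoring the track to such 3D points, rather than to pixel coordinates, is what makes the estimate robust to the UAV's ego-motion.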
arXiv Detail & Related papers (2020-08-06T18:37:29Z)
- PerMO: Perceiving More at Once from a Single Image for Autonomous Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.