Tracking by 3D Model Estimation of Unknown Objects in Videos
- URL: http://arxiv.org/abs/2304.06419v1
- Date: Thu, 13 Apr 2023 11:32:36 GMT
- Title: Tracking by 3D Model Estimation of Unknown Objects in Videos
- Authors: Denys Rozumnyi, Jiri Matas, Marc Pollefeys, Vittorio Ferrari, Martin R. Oswald
- Abstract summary: We argue that this representation is limited and instead propose to guide and improve 2D tracking with an explicit object representation.
Our representation tackles a complex long-term dense correspondence problem between all 3D points on the object for all video frames.
The proposed optimization minimizes a novel loss function to estimate the best 3D shape, texture, and 6DoF pose.
- Score: 122.56499878291916
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most model-free visual object tracking methods formulate the tracking task as
object location estimation given by a 2D segmentation or a bounding box in each
video frame. We argue that this representation is limited and instead propose
to guide and improve 2D tracking with an explicit object representation, namely
the textured 3D shape and 6DoF pose in each video frame. Our representation
tackles a complex long-term dense correspondence problem between all 3D points
on the object for all video frames, including frames where some points are
invisible. To achieve that, the estimation is driven by re-rendering the input
video frames as well as possible through differentiable rendering, which has
not been used for tracking before. The proposed optimization minimizes a novel
loss function to estimate the best 3D shape, texture, and 6DoF pose. We improve
the state-of-the-art in 2D segmentation tracking on three different datasets
with mostly rigid objects.
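For readers who want the mechanics, below is a minimal sketch of the analysis-by-synthesis loop the abstract describes: the 3D shape, texture, and per-frame 6DoF poses are free variables, and gradients from a re-rendering loss flow back through a differentiable renderer. The `differentiable_render` callable, tensor shapes, and the plain photometric MSE are illustrative assumptions, not the paper's implementation; the paper's actual objective is a novel, more elaborate loss.

```python
# Minimal sketch (assumed formulation, not the paper's code) of tracking by
# 3D model estimation: jointly optimize shape, texture, and per-frame 6DoF
# pose so that re-rendered frames match the input video.
import torch

def track_by_rerendering(frames, differentiable_render,
                         num_vertices=1000, iters=500, lr=1e-2):
    """frames: list of (H, W, 3) float tensors.
    differentiable_render: assumed callable mapping
    (vertices, texture, pose) -> (H, W, 3) image, differentiable with
    respect to all three inputs (e.g. a soft rasterizer)."""
    T = len(frames)
    vertices = torch.randn(num_vertices, 3, requires_grad=True)  # 3D shape
    texture = torch.rand(num_vertices, 3, requires_grad=True)    # per-vertex RGB
    poses = torch.zeros(T, 6, requires_grad=True)  # axis-angle rotation + translation

    optimizer = torch.optim.Adam([vertices, texture, poses], lr=lr)
    for _ in range(iters):
        optimizer.zero_grad()
        loss = 0.0
        for t, frame in enumerate(frames):
            rendered = differentiable_render(vertices, texture, poses[t])
            # Photometric term stands in for the paper's novel loss function.
            loss = loss + torch.nn.functional.mse_loss(rendered, frame)
        loss.backward()  # gradients reach shape, texture, and every pose
        optimizer.step()
    return vertices, texture, poses
```

Because every frame shares the same shape and texture variables, the optimization implicitly solves the long-term dense correspondence problem the abstract mentions, including for points that are temporarily invisible.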
Related papers
- 3D-Aware Instance Segmentation and Tracking in Egocentric Videos [107.10661490652822]
Egocentric videos present unique challenges for 3D scene understanding.
This paper introduces a novel approach to instance segmentation and tracking in first-person video.
By incorporating spatial and temporal cues, we achieve superior performance compared to state-of-the-art 2D approaches.
arXiv Detail & Related papers (2024-08-19T10:08:25Z)
- TAPVid-3D: A Benchmark for Tracking Any Point in 3D [63.060421798990845]
We introduce a new benchmark, TAPVid-3D, for evaluating the task of Tracking Any Point in 3D.
This benchmark will serve as a guidepost to improve our ability to understand precise 3D motion and surface deformation from monocular video.
arXiv Detail & Related papers (2024-07-08T13:28:47Z)
- SpatialTracker: Tracking Any 2D Pixels in 3D Space [71.58016288648447]
We propose to estimate point trajectories in 3D space to mitigate the issues caused by image projection.
Our method, named SpatialTracker, lifts 2D pixels to 3D using monocular depth estimators.
Tracking in 3D allows us to leverage as-rigid-as-possible (ARAP) constraints while simultaneously learning a rigidity embedding that clusters pixels into different rigid parts (a minimal sketch of the ARAP idea appears after this list).
arXiv Detail & Related papers (2024-04-05T17:59:25Z)
- BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects [89.2314092102403]
We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence.
Our method works for arbitrary rigid objects, even when visual texture is largely absent.
arXiv Detail & Related papers (2023-03-24T17:13:49Z)
- BCOT: A Markerless High-Precision 3D Object Tracking Benchmark [15.8625561193144]
We present a multi-view approach to estimate accurate 3D poses of real moving objects, and then use binocular data to construct a new benchmark for monocular textureless 3D object tracking.
Based on our object-centered model, we jointly optimize the object pose by minimizing shape re-projection constraints in all views.
Our new benchmark dataset contains 20 textureless objects, 22 scenes, 404 video sequences and 126K images captured in real scenes.
arXiv Detail & Related papers (2022-03-25T03:55:03Z)
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
- Tracking Emerges by Looking Around Static Scenes, with Neural 3D Mapping [23.456046776979903]
We propose to leverage multiview data of static points in arbitrary scenes (static or dynamic) to learn a neural 3D mapping module.
The neural 3D mapper consumes RGB-D data as input, and produces a 3D voxel grid of deep features as output.
We show that our unsupervised 3D object trackers outperform prior unsupervised 2D and 2.5D trackers, and approach the accuracy of supervised trackers.
arXiv Detail & Related papers (2020-08-04T02:59:23Z)
- Unsupervised object-centric video generation and decomposition in 3D [36.08064849807464]
We propose to model a video as the view seen while moving through a scene with multiple 3D objects and a 3D background.
Our model is trained from monocular videos without any supervision, yet learns to generate coherent 3D scenes containing several moving objects.
arXiv Detail & Related papers (2020-07-07T18:01:29Z)
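As referenced in the SpatialTracker entry above, here is a minimal sketch of an as-rigid-as-possible (ARAP) penalty over 3D point tracks: points that move rigidly together should preserve their pairwise distances across frames. The function name, tensor shapes, and the soft `rigidity` affinity are assumptions for illustration; SpatialTracker learns its rigidity embedding rather than taking it as an input.

```python
# Minimal sketch (assumed formulation, not SpatialTracker's exact one) of an
# as-rigid-as-possible penalty over tracked 3D points.
import torch

def arap_penalty(points_t, points_t1, rigidity):
    """points_t, points_t1: (N, 3) tracked 3D positions in consecutive frames.
    rigidity: (N, N) soft affinity in [0, 1]; high where two points are
    believed to lie on the same rigid part."""
    dist_t = torch.cdist(points_t, points_t)     # (N, N) pairwise distances
    dist_t1 = torch.cdist(points_t1, points_t1)  # (N, N)
    # Rigid motion preserves pairwise distances; penalize deviations,
    # down-weighted for point pairs not expected to move rigidly together.
    return (rigidity * (dist_t - dist_t1).abs()).mean()
```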
This list is automatically generated from the titles and abstracts of the papers on this site.