DELTA: Dense Efficient Long-range 3D Tracking for any video
- URL: http://arxiv.org/abs/2410.24211v2
- Date: Fri, 01 Nov 2024 17:23:01 GMT
- Title: DELTA: Dense Efficient Long-range 3D Tracking for any video
- Authors: Tuan Duc Ngo, Peiye Zhuang, Chuang Gan, Evangelos Kalogerakis, Sergey Tulyakov, Hsin-Ying Lee, Chaoyang Wang,
- Abstract summary: We introduce DELTA, a novel method that efficiently tracks every pixel in 3D space, enabling accurate motion estimation across entire videos.
Our approach leverages a joint global-local attention mechanism for reduced-resolution tracking, followed by a transformer-based upsampler to achieve high-resolution predictions.
Our method provides a robust solution for applications requiring fine-grained, long-term motion tracking in 3D space.
- Score: 82.26753323263009
- License:
- Abstract: Tracking dense 3D motion from monocular videos remains challenging, particularly when aiming for pixel-level precision over long sequences. We introduce DELTA, a novel method that efficiently tracks every pixel in 3D space, enabling accurate motion estimation across entire videos. Our approach leverages a joint global-local attention mechanism for reduced-resolution tracking, followed by a transformer-based upsampler to achieve high-resolution predictions. Unlike existing methods, which are limited by computational inefficiency or sparse tracking, DELTA delivers dense 3D tracking at scale, running over 8x faster than previous methods while achieving state-of-the-art accuracy. Furthermore, we explore the impact of depth representation on tracking performance and identify log-depth as the optimal choice. Extensive experiments demonstrate the superiority of DELTA on multiple benchmarks, achieving new state-of-the-art results in both 2D and 3D dense tracking tasks. Our method provides a robust solution for applications requiring fine-grained, long-term motion tracking in 3D space.
Related papers
- Long-Term 3D Point Tracking By Cost Volume Fusion [2.3411633024711573]
We propose the first deep learning framework for long-term point tracking in 3D that generalizes to new points and videos without requiring test-time fine-tuning.
Our model integrates multiple past appearances and motion information via a transformer architecture, significantly enhancing overall tracking performance.
arXiv Detail & Related papers (2024-07-18T09:34:47Z) - TAPVid-3D: A Benchmark for Tracking Any Point in 3D [63.060421798990845]
We introduce a new benchmark, TAPVid-3D, for evaluating the task of Tracking Any Point in 3D.
This benchmark will serve as a guidepost to improve our ability to understand precise 3D motion and surface deformation from monocular video.
arXiv Detail & Related papers (2024-07-08T13:28:47Z) - 3D Visual Tracking Framework with Deep Learning for Asteroid Exploration [22.808962211830675]
In this paper, we focus on the studied accurate and real-time method for 3D tracking.
A new large-scale 3D asteroid tracking dataset is presented, including binocular video sequences, depth maps, and point clouds of diverse asteroids.
We propose a deep-learning based 3D tracking framework, named as Track3D, which involves 2D monocular tracker and a novel light-weight amodal axis-aligned bounding-box network, A3BoxNet.
arXiv Detail & Related papers (2021-11-21T04:14:45Z) - Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z) - PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first depth estimation is performed, a pseudo LiDAR point cloud representation is computed from the depth estimates, and then object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z) - Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion
Forecasting with a Single Convolutional Net [93.51773847125014]
We propose a novel deep neural network that is able to jointly reason about 3D detection, tracking and motion forecasting given data captured by a 3D sensor.
Our approach performs 3D convolutions across space and time over a bird's eye view representation of the 3D world.
arXiv Detail & Related papers (2020-12-22T22:43:35Z) - DeepTracking-Net: 3D Tracking with Unsupervised Learning of Continuous
Flow [12.690471276907445]
This paper deals with the problem of 3D tracking, i.e., to find dense correspondences in a sequence of time-varying 3D shapes.
We propose a novel unsupervised 3D shape framework named DeepTracking-Net, which uses deep neural networks (DNNs) as auxiliary functions.
In addition, we prepare a new synthetic 3D data, named SynMotions, to the 3D tracking and recognition community.
arXiv Detail & Related papers (2020-06-24T16:20:48Z) - Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled
Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2d detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.