Dense Optical Tracking: Connecting the Dots
- URL: http://arxiv.org/abs/2312.00786v3
- Date: Mon, 4 Mar 2024 17:24:31 GMT
- Title: Dense Optical Tracking: Connecting the Dots
- Authors: Guillaume Le Moing, Jean Ponce, Cordelia Schmid
- Abstract summary: DOT is a novel, simple and efficient method for solving the problem of point tracking in a video.
We show that DOT is significantly more accurate than current optical flow techniques, outperforms sophisticated "universal" trackers like OmniMotion, and is on par with, or better than, the best point tracking algorithms like CoTracker.
- Score: 82.79642869586587
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent approaches to point tracking are able to recover the trajectory of any
scene point through a large portion of a video despite the presence of
occlusions. They are, however, too slow in practice to track every point
observed in a single frame in a reasonable amount of time. This paper
introduces DOT, a novel, simple and efficient method for solving this problem.
It first extracts a small set of tracks from key regions at motion boundaries
using an off-the-shelf point tracking algorithm. Given source and target
frames, DOT then computes rough initial estimates of a dense flow field and
visibility mask through nearest-neighbor interpolation, before refining them
using a learnable optical flow estimator that explicitly handles occlusions and
can be trained on synthetic data with ground-truth correspondences. We show
that DOT is significantly more accurate than current optical flow techniques,
outperforms sophisticated "universal" trackers like OmniMotion, and is on par
with, or better than, the best point tracking algorithms like CoTracker while
being at least two orders of magnitude faster. Quantitative and qualitative
experiments with synthetic and real videos validate the promise of the proposed
approach. Code, data, and videos showcasing the capabilities of our approach
are available on the project webpage: https://16lemoing.github.io/dot
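The nearest-neighbor initialization step described above lends itself to a short illustration. Below is a minimal sketch of that idea only, not the authors' implementation: it assumes sparse tracks are given as source/target point pairs with visibility flags, and all names (init_dense_flow, src_pts, etc.) are illustrative.

```python
# Minimal sketch of DOT-style dense initialization: every pixel copies the
# motion and visibility of its nearest sparse track. Illustrative only,
# not the authors' implementation.
import numpy as np
from scipy.spatial import cKDTree

def init_dense_flow(src_pts, tgt_pts, visible, height, width):
    """Nearest-neighbor interpolation of sparse tracks into a rough
    dense flow field and visibility mask.

    src_pts : (N, 2) track positions (x, y) in the source frame
    tgt_pts : (N, 2) corresponding positions in the target frame
    visible : (N,) bool, whether each track is visible in the target frame
    """
    # Displacement of each sparse track from source to target frame.
    motion = tgt_pts - src_pts

    # For every pixel of the source frame, find the closest sparse track.
    tree = cKDTree(src_pts)
    ys, xs = np.mgrid[0:height, 0:width]
    pixels = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float64)
    _, nearest = tree.query(pixels)

    # Copy the nearest track's motion and visibility to each pixel.
    flow = motion[nearest].reshape(height, width, 2)
    mask = visible[nearest].reshape(height, width)
    return flow, mask
```

Per the abstract, this coarse estimate is only a starting point: DOT then refines it with a learnable, occlusion-aware optical flow estimator.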
Related papers
- SparseTrack: Multi-Object Tracking by Performing Scene Decomposition based on Pseudo-Depth [84.64121608109087]
First, we propose a pseudo-depth estimation method for obtaining the relative depth of targets from 2D images.
Second, we design a depth cascading matching (DCM) algorithm, which uses the obtained depth information to convert a dense target set into multiple sparse target subsets.
By integrating the pseudo-depth method and the DCM strategy into the data association process, we propose a new tracker, called SparseTrack (a rough sketch of the depth-layering idea follows this entry).
arXiv Detail & Related papers (2023-06-08T14:36:10Z) - TAP-Vid: A Benchmark for Tracking Any Point in a Video [84.94877216665793]
- TAP-Vid: A Benchmark for Tracking Any Point in a Video [84.94877216665793]
We formalize the problem of tracking arbitrary physical points on surfaces over longer video clips, naming it tracking any point (TAP).
We introduce a companion benchmark, TAP-Vid, which is composed of both real-world videos with accurate human annotations of point tracks, and synthetic videos with perfect ground-truth point tracks.
We propose a simple end-to-end point tracking model TAP-Net, showing that it outperforms all prior methods on our benchmark when trained on synthetic data.
arXiv Detail & Related papers (2022-11-07T17:57:02Z) - Particle Videos Revisited: Tracking Through Occlusions Using Point
Trajectories [29.258861811749103]
We revisit Sand and Teller's "particle video" approach, and study pixel tracking as a long-range motion estimation problem.
We re-build this classic approach using components that drive the current state-of-the-art in flow and object tracking.
We train our models using long-range amodal point trajectories mined from existing optical flow datasets.
arXiv Detail & Related papers (2022-04-08T16:05:48Z) - SDOF-Tracker: Fast and Accurate Multiple Human Tracking by
Skipped-Detection and Optical-Flow [5.041369269600902]
This study aims to improve running speed by performing human detection only at fixed frame intervals.
We propose a method that fills in the skipped detections with optical flow, based on the fact that a person's appearance changes little between adjacent frames (a rough sketch of flow-based box propagation follows this entry).
On the MOT20 dataset in the MOTChallenge, the proposed SDOF-Tracker achieved the best performance in terms of the total running speed.
arXiv Detail & Related papers (2021-06-27T15:35:35Z) - Learning Spatio-Temporal Transformer for Visual Tracking [108.11680070733598]
- Learning Spatio-Temporal Transformer for Visual Tracking [108.11680070733598]
We present a new tracking architecture with an encoder-decoder transformer as the key component.
The whole method is end-to-end and does not need any postprocessing steps such as cosine windowing or bounding-box smoothing.
The proposed tracker achieves state-of-the-art performance on five challenging short-term and long-term benchmarks while running at real-time speed, 6x faster than Siam R-CNN.
arXiv Detail & Related papers (2021-03-31T15:19:19Z) - Learning to Track with Object Permanence [61.36492084090744]
We introduce an end-to-end trainable approach for joint object detection and tracking.
Our model, trained jointly on synthetic and real data, outperforms the state of the art on the KITTI and MOT17 datasets.
arXiv Detail & Related papers (2021-03-26T04:43:04Z) - SFTrack++: A Fast Learnable Spectral Segmentation Approach for
Space-Time Consistent Tracking [6.294759639481189]
We propose an object tracking method, SFTrack++, that learns to preserve the tracked object's consistency over the space and time dimensions.
We test our method, SFTrack++, on five tracking benchmarks: OTB, UAV, NFS, GOT-10k, and TrackingNet, using five top trackers as input.
arXiv Detail & Related papers (2020-11-27T17:15:20Z) - Tracking-by-Counting: Using Network Flows on Crowd Density Maps for
Tracking Multiple Targets [96.98888948518815]
State-of-the-art multi-object tracking (MOT) methods follow the tracking-by-detection paradigm.
We propose a new MOT paradigm, tracking-by-counting, tailored for crowded scenes.
arXiv Detail & Related papers (2020-07-18T19:51:53Z) - RetinaTrack: Online Single Stage Joint Detection and Tracking [22.351109024452462]
We focus on the tracking-by-detection paradigm for autonomous driving where both tasks are mission critical.
We propose a conceptually simple and efficient joint model of detection and tracking, called RetinaTrack, which modifies the popular single stage RetinaNet approach.
arXiv Detail & Related papers (2020-03-30T23:46:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.