Local All-Pair Correspondence for Point Tracking
- URL: http://arxiv.org/abs/2407.15420v1
- Date: Mon, 22 Jul 2024 06:49:56 GMT
- Title: Local All-Pair Correspondence for Point Tracking
- Authors: Seokju Cho, Jiahui Huang, Jisu Nam, Honggyu An, Seungryong Kim, Joon-Young Lee
- Abstract summary: We introduce LocoTrack, a highly accurate and efficient model designed for the task of tracking any point (TAP) across video sequences.
LocoTrack achieves unmatched accuracy on all TAP-Vid benchmarks and operates at a speed almost 6 times faster than the current state-of-the-art.
- Score: 59.76186266230608
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce LocoTrack, a highly accurate and efficient model designed for the task of tracking any point (TAP) across video sequences. Previous approaches in this task often rely on local 2D correlation maps to establish correspondences from a point in the query image to a local region in the target image, which often struggle with homogeneous regions or repetitive features, leading to matching ambiguities. LocoTrack overcomes this challenge with a novel approach that utilizes all-pair correspondences across regions, i.e., local 4D correlation, to establish precise correspondences, with bidirectional correspondence and matching smoothness significantly enhancing robustness against ambiguities. We also incorporate a lightweight correlation encoder to enhance computational efficiency, and a compact Transformer architecture to integrate long-term temporal information. LocoTrack achieves unmatched accuracy on all TAP-Vid benchmarks and operates at a speed almost 6 times faster than the current state-of-the-art.
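To make the core idea concrete, below is a minimal sketch of a local all-pair (4D) correlation, assuming PyTorch-style (C, H, W) feature maps; the function name, the patch size k, and the cropping scheme are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def local_4d_correlation(query_feat, target_feat, query_xy, target_xy, k=7):
    """All-pair correlation between a k x k region around the query point
    and a k x k region around its current estimate in the target frame.

    query_feat, target_feat: (C, H, W) L2-normalized feature maps.
    query_xy, target_xy: integer (x, y) region centers.
    Returns a (k, k, k, k) tensor: every source position against every
    target position, rather than a single 2D correlation map.
    """
    def crop(feat, xy):
        x, y = xy
        r = k // 2
        # Pad so regions near the border stay k x k.
        feat = F.pad(feat, (r, r, r, r))
        return feat[:, y:y + k, x:x + k]           # (C, k, k)

    q = crop(query_feat, query_xy)                 # (C, k, k)
    t = crop(target_feat, target_xy)               # (C, k, k)
    # Dense all-pair dot products over the channel dimension.
    corr = torch.einsum('cij,ckl->ijkl', q, t)     # (k, k, k, k)
    return corr
```

Unlike a 2D map, the 4D tensor keeps a full match distribution for every source position, which is what makes bidirectional-consistency and matching-smoothness checks possible.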
Related papers
- Efficient Semantic Matching with Hypercolumn Correlation [58.92933923647451]
HCCNet is an efficient yet effective semantic matching method.
It exploits the full potential of multi-scale correlation maps.
It eschews the reliance on expensive match-wise relationship mining on the 4D correlation map.
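As a rough illustration of the multi-scale idea (not HCCNet's actual architecture), correlation maps from several feature levels can be brought to a common resolution and stacked, so downstream layers see a "hypercolumn" of correlations instead of mining a single 4D correlation volume; all names and shapes below are assumptions.

```python
import torch
import torch.nn.functional as F

def hypercolumn_correlation(src_feats, tgt_feats, hw=(32, 32)):
    """src_feats / tgt_feats: lists of (C_l, H_l, W_l) backbone features
    from different levels. Returns (L, h*w, h*w): one full correlation
    map per level, stacked into a 'hypercolumn' of correlations.
    """
    h, w = hw
    corrs = []
    for s, t in zip(src_feats, tgt_feats):
        # Bring every level to a common spatial resolution.
        s = F.interpolate(s[None], size=hw, mode='bilinear', align_corners=False)[0]
        t = F.interpolate(t[None], size=hw, mode='bilinear', align_corners=False)[0]
        s = F.normalize(s.flatten(1), dim=0)       # (C_l, h*w)
        t = F.normalize(t.flatten(1), dim=0)
        corrs.append(s.t() @ t)                    # (h*w, h*w) cosine sims
    return torch.stack(corrs)                      # (L, h*w, h*w)
```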
arXiv Detail & Related papers (2023-11-07T20:40:07Z)
- Robust Fully-Asynchronous Methods for Distributed Training over General Architecture [11.480605289411807]
Perfect synchronization in distributed machine learning is inefficient and even impossible due to latency, packet losses, and stragglers.
We propose a Robust Fully-Asynchronous Gradient Tracking method (R-FAST), where each device performs local computation and communication at its own pace, without any form of synchronization.
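R-FAST's asynchronous bookkeeping is involved, but the gradient-tracking template it builds on is easy to state. The synchronous sketch below is my simplification for intuition, not the paper's algorithm: a consensus step on the iterates is coupled with a tracker Y that estimates the network-average gradient.

```python
import numpy as np

def gradient_tracking(x0, grad_fns, W, alpha=0.05, iters=200):
    """Synchronous gradient tracking over n agents minimizing (1/n) * sum f_i.

    x0: (n, d) initial iterates, grad_fns: list of n gradient callables,
    W: (n, n) doubly stochastic mixing matrix over the network graph.
    """
    X = x0.copy()
    G = np.stack([g(x) for g, x in zip(grad_fns, X)])  # local gradients
    Y = G.copy()                      # trackers start at local gradients
    for _ in range(iters):
        X = W @ X - alpha * Y         # consensus step + descent along tracker
        G_new = np.stack([g(x) for g, x in zip(grad_fns, X)])
        Y = W @ Y + G_new - G         # Y tracks the average gradient
        G = G_new
    return X
```

R-FAST's contribution is replacing the lock-step exchanges (the `W @ X` rounds) with buffered, pace-independent communication that tolerates delays and packet loss.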
arXiv Detail & Related papers (2023-07-21T14:36:40Z)
- TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement [64.11385310305612]
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence.
Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations.
The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS.
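A minimal sketch of the first (matching) stage under assumed shapes: the query feature is correlated with every frame independently, and a soft-argmax yields a per-frame candidate location; the refinement stage, omitted here, would then polish the trajectory using local correlations.

```python
import torch

def per_frame_matching(query_feat, video_feats):
    """query_feat: (C,) feature of the query point.
    video_feats: (T, C, H, W) per-frame feature maps.
    Returns (T, 2) initial (x, y) candidates, one per frame.
    """
    T, C, H, W = video_feats.shape
    # Correlate the query feature with every location in every frame.
    corr = torch.einsum('c,tchw->thw', query_feat, video_feats)
    prob = corr.view(T, -1).softmax(dim=-1).view(T, H, W)
    ys = torch.arange(H, dtype=prob.dtype)
    xs = torch.arange(W, dtype=prob.dtype)
    # Soft-argmax: expected coordinates under the matching distribution.
    ey = (prob.sum(dim=2) * ys).sum(dim=1)   # (T,)
    ex = (prob.sum(dim=1) * xs).sum(dim=1)   # (T,)
    return torch.stack([ex, ey], dim=-1)     # (T, 2)
```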
arXiv Detail & Related papers (2023-06-14T17:07:51Z)
- Ret3D: Rethinking Object Relations for Efficient 3D Object Detection in Driving Scenes [82.4186966781934]
We introduce a simple, efficient, and effective two-stage detector, termed as Ret3D.
At the core of Ret3D is the utilization of novel intra-frame and inter-frame relation modules.
With negligible extra overhead, Ret3D achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-08-18T03:48:58Z)
- Correspondence Matters for Video Referring Expression Comprehension [64.60046797561455]
Video Referring Expression Comprehension (REC) aims to localize the referent objects described in the sentence to visual regions in the video frames.
Existing methods suffer from two problems: 1) inconsistent localization results across video frames; 2) confusion between the referent and contextual objects.
We propose a novel Dual Correspondence Network (dubbed DCNet) that explicitly enhances dense associations in both inter-frame and cross-modal manners.
arXiv Detail & Related papers (2022-07-21T10:31:39Z)
- Adaptive Affinity for Associations in Multi-Target Multi-Camera Tracking [53.668757725179056]
We propose a simple yet effective approach to adapt affinity estimations to corresponding matching scopes in MTMCT.
Instead of trying to handle all appearance changes, we tailor the affinity metric to specialize in those that might emerge during data association.
By minimizing this mismatch, the adaptive affinity module brings significant improvements over the global re-ID distance.
arXiv Detail & Related papers (2021-12-14T18:59:11Z)
- DenseGAP: Graph-Structured Dense Correspondence Learning with Anchor Points [15.953570826460869]
Establishing dense correspondence between two images is a fundamental computer vision problem.
We introduce DenseGAP, a new solution for efficient Dense correspondence learning with a Graph-structured neural network conditioned on Anchor Points.
Our method advances the state-of-the-art of correspondence learning on most benchmarks.
arXiv Detail & Related papers (2021-12-13T18:59:30Z)
- Spatio-Temporal Matching for Siamese Visual Tracking [0.0]
Similarity matching is a core operation in Siamese trackers.
Unlike 2-D image matching, the matching network in object tracking requires 4-D information (height, width, channel, and time).
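For context, the standard similarity-matching operation in Siamese trackers is a depth-wise cross-correlation between template and search features over height, width, and channel; the sketch below shows that well-known operation, with the temporal dimension this paper argues for left out.

```python
import torch
import torch.nn.functional as F

def depthwise_xcorr(search, template):
    """Depth-wise cross-correlation used in Siamese trackers.

    search: (B, C, Hs, Ws) search-region features,
    template: (B, C, Ht, Wt) exemplar features.
    Returns (B, C, Hs-Ht+1, Ws-Wt+1) per-channel response maps.
    """
    b, c, hs, ws = search.shape
    # Fold the batch into channels so each (b, c) pair is its own group.
    x = search.reshape(1, b * c, hs, ws)
    kernel = template.reshape(b * c, 1, template.size(2), template.size(3))
    out = F.conv2d(x, kernel, groups=b * c)
    return out.reshape(b, c, out.size(2), out.size(3))
```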
arXiv Detail & Related papers (2021-05-06T02:55:58Z)
- Multiple Object Tracking with Correlation Learning [16.959379957515974]
We propose a local correlation module to model the topological relationship between targets and their surrounding environment.
Specifically, we establish dense correspondences of each spatial location and its context, and explicitly constrain the correlation volumes through self-supervised learning.
Our approach demonstrates the effectiveness of correlation learning with superior performance, obtaining a state-of-the-art MOTA of 76.5% and IDF1 of 73.6% on MOT17.
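One plausible reading of "dense correspondences of each spatial location and its context" is a local correlation volume, as used in optical flow; the sketch below is my illustration under that assumption, not the paper's actual module.

```python
import torch
import torch.nn.functional as F

def local_correlation_volume(feat, r=4):
    """Correlate each location with its (2r+1)^2 spatial neighborhood.

    feat: (B, C, H, W) feature map.
    Returns (B, (2r+1)**2, H, W): one similarity per context offset.
    """
    B, C, H, W = feat.shape
    k = 2 * r + 1
    feat = F.normalize(feat, dim=1)                # cosine similarities
    # Gather the k x k context around every position: (B, C*k*k, H*W).
    ctx = F.unfold(feat, kernel_size=k, padding=r)
    ctx = ctx.view(B, C, k * k, H, W)
    # Dot product of the center feature with each context feature.
    corr = (feat.unsqueeze(2) * ctx).sum(dim=1)    # (B, k*k, H, W)
    return corr
```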
arXiv Detail & Related papers (2021-04-08T06:48:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.