Solution for Point Tracking Task of ECCV 2nd Perception Test Challenge 2024
- URL: http://arxiv.org/abs/2410.16286v1
- Date: Sat, 05 Oct 2024 15:09:40 GMT
- Title: Solution for Point Tracking Task of ECCV 2nd Perception Test Challenge 2024
- Authors: Yuxuan Zhang, Pengsong Niu, Kun Yu, Qingguo Chen, Yang Yang
- Abstract summary: This report introduces an improved method for the Tracking Any Point (TAP) task, focusing on monitoring physical surfaces in video footage.
We propose a simple yet effective approach called Fine-grained Point Discrimination (FPD), which focuses on perceiving and rectifying point tracking at multiple granularities in a zero-shot manner.
- Score: 13.14886222358538
- License:
- Abstract: This report introduces an improved method for the Tracking Any Point (TAP) task, focusing on monitoring physical surfaces in video footage. Despite their success in short-sequence scenarios, TAP methods still face performance degradation and resource overhead in long-sequence situations. To address these issues, we propose a simple yet effective approach called Fine-grained Point Discrimination (FPD), which focuses on perceiving and rectifying point tracking at multiple granularities in a zero-shot manner, especially for static points in videos shot by a static camera. The proposed FPD contains two key components: (1) multi-granularity point perception, which can detect static sequences in videos and static points; and (2) dynamic trajectory correction, which replaces point trajectories based on the type of tracked point. Our approach achieved the second highest score in the final test, with a score of 0.4720.
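The abstract only sketches the two components, so here is a minimal illustration of the trajectory-correction idea for static points. All names, the displacement metric, and the threshold are assumptions of this sketch, not the authors' actual implementation: a point whose predicted trajectory barely deviates from its median position is treated as static, and its (possibly jittery) trajectory is replaced by that fixed position.

```python
import numpy as np

def rectify_static_points(tracks, static_thresh=1.0):
    """Hypothetical sketch of static-point trajectory correction.

    tracks: (N, T, 2) array of predicted (x, y) positions for N points
            over T frames. The median anchor and the deviation threshold
            are illustrative choices, not the paper's method.
    """
    tracks = np.asarray(tracks, dtype=float)
    anchor = np.median(tracks, axis=1, keepdims=True)            # (N, 1, 2)
    # Maximum deviation of each point from its median position over time.
    deviation = np.linalg.norm(tracks - anchor, axis=-1).max(axis=1)  # (N,)
    is_static = deviation < static_thresh
    corrected = tracks.copy()
    # Replace static trajectories with their median position in every frame.
    corrected[is_static] = anchor[is_static]
    return corrected, is_static
```

Under this sketch, a jittering point predicted near one location is snapped to a single position, while genuinely moving points keep their original trajectories.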
Related papers
- ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model [20.259334882471574]
Multi-object tracking (MOT) is a critical technology in computer vision, designed to detect multiple targets in video sequences and assign each target a unique ID per frame.
Existing MOT methods excel at accurately tracking multiple objects in real-time across various scenarios.
We propose ConsistencyTrack, a novel joint detection and tracking (JDT) framework that formulates detection and association as a denoising diffusion process on bounding boxes.
arXiv Detail & Related papers (2024-08-28T05:53:30Z) - An Approximate Dynamic Programming Framework for Occlusion-Robust Multi-Object Tracking [2.4549686118633938]
We propose a framework called approximate dynamic programming track (ADPTrack), which applies dynamic programming principles to improve an existing baseline tracking method.
The proposed method demonstrates a 0.7% improvement in association accuracy over a state-of-the-art method.
arXiv Detail & Related papers (2024-05-24T01:27:14Z) - Solution for Point Tracking Task of ICCV 1st Perception Test Challenge 2023 [50.910598799408326]
The Tracking Any Point (TAP) task tracks any physical surface through a video.
Several existing approaches have explored the TAP by considering the temporal relationships to obtain smooth point motion trajectories.
We propose a simple yet effective approach called TAP with confident static points (TAPIR+), which focuses on rectifying the tracking of static points in videos shot by a static camera.
arXiv Detail & Related papers (2024-03-26T13:50:39Z) - Dense Optical Tracking: Connecting the Dots [82.79642869586587]
DOT is a novel, simple, and efficient method for point tracking in videos.
We show that DOT is significantly more accurate than current optical flow techniques, outperforms sophisticated "universal trackers" like OmniMotion, and is on par with, or better than, the best point tracking algorithms like CoTracker.
arXiv Detail & Related papers (2023-12-01T18:59:59Z) - TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement [64.11385310305612]
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence.
Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations.
The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS.
arXiv Detail & Related papers (2023-06-14T17:07:51Z) - TAP-Vid: A Benchmark for Tracking Any Point in a Video [84.94877216665793]
We formalize the problem of tracking arbitrary physical points on surfaces over longer video clips, naming it tracking any point (TAP).
We introduce a companion benchmark, TAP-Vid, which is composed of both real-world videos with accurate human annotations of point tracks, and synthetic videos with perfect ground-truth point tracks.
We propose a simple end-to-end point tracking model TAP-Net, showing that it outperforms all prior methods on our benchmark when trained on synthetic data.
arXiv Detail & Related papers (2022-11-07T17:57:02Z) - Point-Teaching: Weakly Semi-Supervised Object Detection with Point Annotations [81.02347863372364]
We present Point-Teaching, a weakly semi-supervised object detection framework.
Specifically, we propose a Hungarian-based point matching method to generate pseudo labels for point annotated images.
We propose a simple-yet-effective data augmentation, termed point-guided copy-paste, to reduce the impact of the unmatched points.
arXiv Detail & Related papers (2022-06-01T07:04:38Z) - Sparse Optical Flow-Based Line Feature Tracking [7.166068174681434]
We propose a novel sparse optical flow (SOF)-based line feature tracking method for the camera pose estimation problem.
This method is inspired by the point-based SOF algorithm and is developed based on the observation that two adjacent images satisfy brightness invariance.
Experiments on several public benchmark datasets show that our method achieves highly competitive accuracy with a clear advantage in speed.
arXiv Detail & Related papers (2022-04-07T10:00:02Z) - Plug-and-Play Few-shot Object Detection with Meta Strategy and Explicit Localization Inference [78.41932738265345]
This paper proposes a plug-and-play detector that can accurately detect objects of novel categories without a fine-tuning process.
We introduce two explicit inferences into the localization process to reduce its dependence on annotated data.
It shows a significant lead in efficiency, precision, and recall under varied evaluation protocols.
arXiv Detail & Related papers (2021-10-26T03:09:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.