Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline
- URL: http://arxiv.org/abs/2403.05839v2
- Date: Wed, 3 Apr 2024 09:25:34 GMT
- Title: Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline
- Authors: Xiao Wang, Ju Huang, Shiao Wang, Chuanming Tang, Bo Jiang, Yonghong Tian, Jin Tang, Bin Luo,
- Abstract summary: We first propose a new long-term and large-scale frame-event single object tracking dataset, termed FELT.
It contains 742 videos and 1,594,474 RGB frames and event stream pairs and has become the largest frame-event tracking dataset to date.
We propose a novel associative memory Transformer network as a unified backbone by introducing modern Hopfield layers into multi-head self-attention blocks to fuse both RGB and event data.
- Score: 37.06330707742272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current event-/frame-event based trackers undergo evaluation on short-term tracking datasets, however, the tracking of real-world scenarios involves long-term tracking, and the performance of existing tracking algorithms in these scenarios remains unclear. In this paper, we first propose a new long-term and large-scale frame-event single object tracking dataset, termed FELT. It contains 742 videos and 1,594,474 RGB frames and event stream pairs and has become the largest frame-event tracking dataset to date. We re-train and evaluate 15 baseline trackers on our dataset for future works to compare. More importantly, we find that the RGB frames and event streams are naturally incomplete due to the influence of challenging factors and spatially sparse event flow. In response to this, we propose a novel associative memory Transformer network as a unified backbone by introducing modern Hopfield layers into multi-head self-attention blocks to fuse both RGB and event data. Extensive experiments on RGB-Event (FELT), RGB-Thermal (RGBT234, LasHeR), and RGB-Depth (DepthTrack) datasets fully validated the effectiveness of our model. The dataset and source code can be found at \url{https://github.com/Event-AHU/FELT_SOT_Benchmark}.
Related papers
- BlinkTrack: Feature Tracking over 100 FPS via Events and Images [50.98675227695814]
We propose a novel framework, BlinkTrack, which integrates event data with RGB images for high-frequency feature tracking.
Our method extends the traditional Kalman filter into a learning-based framework, utilizing differentiable Kalman filters in both event and image branches.
Experimental results indicate that BlinkTrack significantly outperforms existing event-based methods.
arXiv Detail & Related papers (2024-09-26T15:54:18Z) - TENet: Targetness Entanglement Incorporating with Multi-Scale Pooling and Mutually-Guided Fusion for RGB-E Object Tracking [30.89375068036783]
Existing approaches perform event feature extraction for RGB-E tracking using traditional appearance models.
We propose an Event backbone (Pooler) to obtain a high-quality feature representation that is cognisant of the intrinsic characteristics of the event data.
Our method significantly outperforms state-of-the-art trackers on two widely used RGB-E tracking datasets.
arXiv Detail & Related papers (2024-05-08T12:19:08Z) - SpikeMOT: Event-based Multi-Object Tracking with Sparse Motion Features [52.213656737672935]
SpikeMOT is an event-based multi-object tracker.
SpikeMOT uses spiking neural networks to extract sparsetemporal features from event streams associated with objects.
arXiv Detail & Related papers (2023-09-29T05:13:43Z) - Event Stream-based Visual Object Tracking: A High-Resolution Benchmark
Dataset and A Novel Baseline [38.42400442371156]
Existing works either utilize aligned RGB and event data for accurate tracking or directly learn an event-based tracker.
We propose a novel hierarchical knowledge distillation framework that can fully utilize multi-modal / multi-view information during training to facilitate knowledge transfer.
We propose the first large-scale high-resolution ($1280 times 720$) dataset named EventVOT. It contains 1141 videos and covers a wide range of categories such as pedestrians, vehicles, UAVs, ping pongs, etc.
arXiv Detail & Related papers (2023-09-26T01:42:26Z) - GLEN: General-Purpose Event Detection for Thousands of Types [80.99866527772512]
We build a general-purpose event detection dataset GLEN, which covers 205K event mentions with 3,465 different types.
GLEN is 20x larger in ontology than today's largest event dataset.
We also propose a new multi-stage event detection model CEDAR specifically designed to handle the large size in GLEN.
arXiv Detail & Related papers (2023-03-16T05:36:38Z) - Revisiting Color-Event based Tracking: A Unified Network, Dataset, and
Metric [53.88188265943762]
We propose a single-stage backbone network for Color-Event Unified Tracking (CEUTrack), which achieves the above functions simultaneously.
Our proposed CEUTrack is simple, effective, and efficient, which achieves over 75 FPS and new SOTA performance.
arXiv Detail & Related papers (2022-11-20T16:01:31Z) - RGBD1K: A Large-scale Dataset and Benchmark for RGB-D Object Tracking [30.448658049744775]
Given a limited amount of annotated RGB-D tracking data, most state-of-the-art RGB-D trackers are simple extensions of high-performance RGB-only trackers.
To address the dataset deficiency issue, a new RGB-D dataset named RGBD1K is released in this paper.
arXiv Detail & Related papers (2022-08-21T03:07:36Z) - Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline [80.13652104204691]
In this paper, we construct a large-scale benchmark with high diversity for visible-thermal UAV tracking (VTUAV)
We provide a coarse-to-fine attribute annotation, where frame-level attributes are provided to exploit the potential of challenge-specific trackers.
In addition, we design a new RGB-T baseline, named Hierarchical Multi-modal Fusion Tracker (HMFT), which fuses RGB-T data in various levels.
arXiv Detail & Related papers (2022-04-08T15:22:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.