Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline
- URL: http://arxiv.org/abs/2309.14611v1
- Date: Tue, 26 Sep 2023 01:42:26 GMT
- Title: Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline
- Authors: Xiao Wang, Shiao Wang, Chuanming Tang, Lin Zhu, Bo Jiang, Yonghong Tian, Jin Tang
- Abstract summary: Existing works either utilize aligned RGB and event data for accurate tracking or directly learn an event-based tracker.
We propose a novel hierarchical knowledge distillation framework that can fully utilize multi-modal / multi-view information during training to facilitate knowledge transfer.
We propose the first large-scale high-resolution ($1280 \times 720$) dataset, named EventVOT. It contains 1141 videos and covers a wide range of categories such as pedestrians, vehicles, UAVs, ping-pong balls, etc.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tracking with bio-inspired event cameras has drawn increasing attention in recent years. Existing works either utilize aligned RGB and event data for accurate tracking or directly learn an event-based tracker. The first category incurs a higher inference cost, while the second is easily affected by noisy events and low spatial resolution. In this paper, we propose a novel hierarchical knowledge distillation framework that fully utilizes multi-modal / multi-view information during training to facilitate knowledge transfer, enabling high-speed, low-latency visual tracking at test time using only event signals. Specifically, a teacher Transformer-based multi-modal tracking framework is first trained by feeding the RGB frames and event streams simultaneously. Then, we design a new hierarchical knowledge distillation strategy that includes pairwise-similarity, feature-representation, and response-map-based knowledge distillation to guide the learning of the student Transformer network. Moreover, since existing event-based tracking datasets are all low-resolution ($346 \times 260$), we propose the first large-scale high-resolution ($1280 \times 720$) dataset, named EventVOT. It contains 1141 videos and covers a wide range of categories such as pedestrians, vehicles, UAVs, ping-pong balls, etc. Extensive experiments on both the low-resolution datasets (FE240hz, VisEvent, COESOT) and our newly proposed high-resolution EventVOT dataset fully validate the effectiveness of the proposed method. The dataset, evaluation toolkit, and source code are available at \url{https://github.com/Event-AHU/EventVOT_Benchmark}
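To make the distillation objective concrete, the following is a minimal PyTorch-style sketch of how the three teacher-to-student terms described in the abstract (pairwise similarity, feature representation, and response maps) could be combined into one training loss. The tensor shapes, dictionary keys, and loss weights are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def pairwise_similarity_loss(t_feat, s_feat):
    # Match the token-to-token similarity structure of teacher and student.
    # t_feat, s_feat: (B, N, C) token embeddings (shapes assumed).
    t_n = F.normalize(t_feat, dim=-1)
    s_n = F.normalize(s_feat, dim=-1)
    t_sim = t_n @ t_n.transpose(1, 2)  # (B, N, N) cosine similarities
    s_sim = s_n @ s_n.transpose(1, 2)
    return F.mse_loss(s_sim, t_sim)

def feature_loss(t_feat, s_feat):
    # Direct feature-representation matching (assumes equal dims;
    # a learned linear projection would be needed otherwise).
    return F.mse_loss(s_feat, t_feat)

def response_map_loss(t_map, s_map, tau=2.0):
    # Soften the teacher/student response maps with a temperature and
    # match the resulting distributions via KL divergence.
    t = F.log_softmax(t_map.flatten(1) / tau, dim=-1)
    s = F.log_softmax(s_map.flatten(1) / tau, dim=-1)
    return F.kl_div(s, t, log_target=True, reduction="batchmean") * tau ** 2

def hierarchical_kd_loss(teacher_out, student_out, w=(1.0, 1.0, 1.0)):
    # teacher_out / student_out: dicts holding token features and response
    # maps; the keys and weights w are hypothetical.
    l_sim = pairwise_similarity_loss(teacher_out["feat"], student_out["feat"])
    l_feat = feature_loss(teacher_out["feat"], student_out["feat"])
    l_resp = response_map_loss(teacher_out["map"], student_out["map"])
    return w[0] * l_sim + w[1] * l_feat + w[2] * l_resp
```

In practice such a distillation term would be added to the student's ordinary tracking loss; the student sees only event signals, which is what enables the low-latency inference described above.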
Related papers
- Event Stream based Human Action Recognition: A High-Definition Benchmark Dataset and Algorithms (arXiv, 2024-08-19)
We propose a large-scale, high-definition ($1280 \times 800$) human action recognition dataset based on the CeleX-V event camera. To build a more comprehensive benchmark, we report results for over 20 mainstream HAR models as baselines for future works to compare against.
- Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline (arXiv, 2024-03-09)
We first propose a new long-term and large-scale frame-event single object tracking dataset, termed FELT. It contains 742 videos and 1,594,474 RGB frame and event stream pairs, making it the largest frame-event tracking dataset to date. We propose a novel associative memory Transformer network as a unified backbone, introducing modern Hopfield layers into multi-head self-attention blocks to fuse RGB and event data.
- SpikeMOT: Event-based Multi-Object Tracking with Sparse Motion Features (arXiv, 2023-09-29)
SpikeMOT is an event-based multi-object tracker. It uses spiking neural networks to extract sparse spatiotemporal features from the event streams associated with objects.
- Dual Memory Aggregation Network for Event-Based Object Detection with Learnable Representation (arXiv, 2023-03-17)
Event-based cameras are bio-inspired sensors that capture the brightness change of every pixel in an asynchronous manner. Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as a 3D tensor representation (see the sketch after this list). Long memory is encoded in the hidden state of adaptive convLSTMs, while short memory is modeled by computing the spatio-temporal correlation between event pillars.
- Revisiting Color-Event based Tracking: A Unified Network, Dataset, and Metric (arXiv, 2022-11-20)
We propose a single-stage backbone network for Color-Event Unified Tracking (CEUTrack), which performs color-event unified tracking in a single network. Our proposed CEUTrack is simple, effective, and efficient, running at over 75 FPS with new SOTA performance.
- Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline (arXiv, 2022-04-08)
In this paper, we construct a large-scale, highly diverse benchmark for visible-thermal UAV tracking (VTUAV). We provide coarse-to-fine attribute annotations, where frame-level attributes are provided to exploit the potential of challenge-specific trackers. In addition, we design a new RGB-T baseline, named Hierarchical Multi-modal Fusion Tracker (HMFT), which fuses RGB-T data at various levels.
- VisEvent: Reliable Object Tracking via Collaboration of Frame and Event Flows (arXiv, 2021-08-11)
Due to the lack of a realistic and large-scale dataset for this task, we propose a large-scale Visible-Event benchmark (termed VisEvent). Our dataset consists of 820 video pairs captured under low-illumination, high-speed, and background-clutter scenarios. Based on VisEvent, we transform the event flows into event images and construct more than 30 baseline methods.
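As a rough illustration of the x-y-t pillar representation mentioned in the Dual Memory Aggregation entry above, here is a minimal NumPy sketch that bins an event stream into a per-polarity spatio-temporal count tensor. The event layout, polarity encoding, and number of temporal bins are assumptions for illustration, not that paper's actual pipeline.

```python
import numpy as np

def events_to_pillars(events, H, W, num_bins):
    """Voxelize an event stream into a (2, num_bins, H, W) tensor:
    one channel per polarity, with events counted per x-y-t cell.

    events: (N, 4) array of (x, y, t, p), with p > 0 for positive polarity.
    """
    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    t = events[:, 2]
    p = (events[:, 3] > 0).astype(np.int64)  # 1 = positive, 0 = negative

    # Normalize timestamps into [0, num_bins) temporal slices.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)
    b = np.clip((t_norm * num_bins).astype(np.int64), 0, num_bins - 1)

    grid = np.zeros((2, num_bins, H, W), dtype=np.float32)
    np.add.at(grid, (p, b, y, x), 1.0)  # unbuffered scatter-add of counts
    return grid
```

Such a dense grid can then be fed to standard convolutional or recurrent backbones (e.g., the convLSTMs mentioned in that entry), trading the native asynchrony of the events for compatibility with frame-based architectures.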