VisEvent: Reliable Object Tracking via Collaboration of Frame and Event
Flows
- URL: http://arxiv.org/abs/2108.05015v4
- Date: Thu, 21 Sep 2023 06:50:36 GMT
- Title: VisEvent: Reliable Object Tracking via Collaboration of Frame and Event
Flows
- Authors: Xiao Wang, Jianing Li, Lin Zhu, Zhipeng Zhang, Zhe Chen, Xin Li,
Yaowei Wang, Yonghong Tian, Feng Wu
- Abstract summary: We propose a large-scale Visible-Event benchmark (termed VisEvent) to address the lack of a realistic, large-scale dataset for this task.
Our dataset consists of 820 video pairs captured under low illumination, high speed, and background clutter scenarios.
Based on VisEvent, we transform the event flows into event images and construct more than 30 baseline methods.
- Score: 93.54888104118822
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Different from visible cameras which record intensity images frame by frame,
the biologically inspired event camera produces a stream of asynchronous and
sparse events with much lower latency. In practice, visible cameras can better
perceive texture details and slow motion, while event cameras can be free from
motion blurs and have a larger dynamic range which enables them to work well
under fast motion and low illumination. Therefore, the two sensors can
cooperate with each other to achieve more reliable object tracking. In this
work, we propose a large-scale Visible-Event benchmark (termed VisEvent) to
address the lack of a realistic, large-scale dataset for this task. Our
dataset consists of 820 video pairs captured under low illumination, high
speed, and background clutter scenarios, and it is divided into a training
subset of 500 videos and a testing subset of 320 videos. Based on VisEvent, we
transform the event flows into event images and construct more than 30 baseline
methods by extending current single-modality trackers into dual-modality
versions. More importantly, we further build a simple but effective tracking
algorithm by proposing a cross-modality transformer, to achieve more effective
feature fusion between visible and event data. Extensive experiments on the
proposed VisEvent dataset, FE108, COESOT, and two simulated datasets (i.e.,
OTB-DVS and VOT-DVS) validate the effectiveness of our model. The dataset and
source code have been released on:
\url{https://github.com/wangxiao5791509/VisEvent_SOT_Benchmark}.
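As context for the event-to-image conversion mentioned in the abstract, the snippet below is a minimal sketch of one common way to accumulate an asynchronous event stream into a fixed-size, two-channel event image that a frame-based tracker can consume. The array layout, function name, and normalization are illustrative assumptions, not the exact preprocessing used by VisEvent.

```python
import numpy as np

def events_to_image(events, height, width):
    """Accumulate a chunk of events into a 2-channel event image.

    events: (N, 4) array of (x, y, timestamp, polarity) with polarity in {0, 1}.
    Returns a float32 array of shape (2, height, width): one channel per polarity.
    """
    img = np.zeros((2, height, width), dtype=np.float32)
    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    p = events[:, 3].astype(np.int64)  # 0 = negative, 1 = positive polarity
    # np.add.at is an unbuffered scatter-add, so repeated pixels accumulate.
    np.add.at(img, (p, y, x), 1.0)
    # Normalize to [0, 1] so the image can be fed to an RGB-style backbone.
    img /= max(img.max(), 1.0)
    return img

# Example: 1000 random events on a 346 x 260 DVS sensor (illustrative only).
rng = np.random.default_rng(0)
ev = np.stack([rng.integers(0, 346, 1000),
               rng.integers(0, 260, 1000),
               np.sort(rng.random(1000)),
               rng.integers(0, 2, 1000)], axis=1)
event_image = events_to_image(ev, height=260, width=346)
print(event_image.shape)  # (2, 260, 346)
```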
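The abstract also describes a cross-modality transformer that fuses visible and event features. The paper's exact architecture is not reproduced here; the block below is a generic cross-attention fusion module in PyTorch, with all layer names, dimensions, and head counts chosen as illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Generic cross-attention fusion between visible and event feature tokens.

    Each modality attends to the other, and the two attended streams are
    concatenated and projected back to the shared feature dimension.
    """
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.vis_to_evt = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.evt_to_vis = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_vis = nn.LayerNorm(dim)
        self.norm_evt = nn.LayerNorm(dim)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, vis_tokens, evt_tokens):
        # Visible tokens query the event stream, and vice versa.
        vis_attn, _ = self.vis_to_evt(vis_tokens, evt_tokens, evt_tokens)
        evt_attn, _ = self.evt_to_vis(evt_tokens, vis_tokens, vis_tokens)
        vis_out = self.norm_vis(vis_tokens + vis_attn)  # residual + norm
        evt_out = self.norm_evt(evt_tokens + evt_attn)
        # Fuse the two streams into a single representation for a tracking head.
        return self.proj(torch.cat([vis_out, evt_out], dim=-1))

# Example: 196 tokens (a 14x14 feature map) per modality, batch of 2.
fusion = CrossModalFusion(dim=256)
vis = torch.randn(2, 196, 256)
evt = torch.randn(2, 196, 256)
fused = fusion(vis, evt)
print(fused.shape)  # torch.Size([2, 196, 256])
```

In this sketch each modality queries the other, which is one common design choice for two-stream fusion; the actual VisEvent tracker may differ in how and where fusion is applied.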
Related papers
- EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting [76.02450110026747]
Event cameras, inspired by biological vision, record pixel-wise intensity changes asynchronously with high temporal resolution.
We propose Event-Aided Free-Trajectory 3DGS, which seamlessly integrates the advantages of event cameras into 3DGS.
We evaluate our method on the public Tanks and Temples benchmark and a newly collected real-world dataset, RealEv-DAVIS.
arXiv Detail & Related papers (2024-10-20T13:44:24Z)
- BlinkTrack: Feature Tracking over 100 FPS via Events and Images [50.98675227695814]
We propose a novel framework, BlinkTrack, which integrates event data with RGB images for high-frequency feature tracking.
Our method extends the traditional Kalman filter into a learning-based framework, utilizing differentiable Kalman filters in both event and image branches.
Experimental results indicate that BlinkTrack significantly outperforms existing event-based methods.
arXiv Detail & Related papers (2024-09-26T15:54:18Z)
- CRSOT: Cross-Resolution Object Tracking using Unaligned Frame and Event Cameras [43.699819213559515]
Existing datasets for RGB-DVS tracking are collected with the DVS346 camera, and their resolution ($346 \times 260$) is too low for practical applications.
We build the first unaligned frame-event dataset CRSOT collected with a specially built data acquisition system.
We propose a novel unaligned object tracking framework that can realize robust tracking even using the loosely aligned RGB-Event data.
arXiv Detail & Related papers (2024-01-05T14:20:22Z)
- SpikeMOT: Event-based Multi-Object Tracking with Sparse Motion Features [52.213656737672935]
SpikeMOT is an event-based multi-object tracker.
SpikeMOT uses spiking neural networks to extract sparse spatiotemporal features from the event streams associated with objects.
arXiv Detail & Related papers (2023-09-29T05:13:43Z)
- Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline [38.42400442371156]
Existing works either utilize aligned RGB and event data for accurate tracking or directly learn an event-based tracker.
We propose a novel hierarchical knowledge distillation framework that can fully utilize multi-modal / multi-view information during training to facilitate knowledge transfer.
We propose the first large-scale high-resolution ($1280 \times 720$) dataset named EventVOT. It contains 1141 videos and covers a wide range of categories such as pedestrians, vehicles, UAVs, ping-pong balls, etc.
arXiv Detail & Related papers (2023-09-26T01:42:26Z)
- Learning Optical Flow from Event Camera with Rendered Dataset [45.4342948504988]
We propose to render a physically correct event-flow dataset using computer graphics models.
In particular, we first create indoor and outdoor 3D scenes in Blender with rich variations in scene content.
arXiv Detail & Related papers (2023-03-20T10:44:32Z)
- Dual Memory Aggregation Network for Event-Based Object Detection with Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture the brightness change of every pixel in an asynchronous manner.
Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarities, producing a set of pillars as a 3D tensor representation (a generic voxelization sketch appears after this list).
Long memory is encoded in the hidden state of adaptive convLSTMs, while short memory is modeled by computing the spatial-temporal correlation between event pillars.
arXiv Detail & Related papers (2023-03-17T12:12:41Z)
- 3D-FlowNet: Event-based optical flow estimation with 3D representation [2.062593640149623]
Event-based cameras can overcome the limitations of frame-based cameras for important tasks such as high-speed motion detection.
Deep neural networks are not well adapted to event data, which are asynchronous and discrete.
We propose 3D-FlowNet, a novel network architecture that can process the 3D input representation and output optical flow estimations.
arXiv Detail & Related papers (2022-01-28T17:28:15Z)
- Learning Monocular Dense Depth from Events [53.078665310545745]
Event cameras output brightness changes in the form of a stream of asynchronous events instead of intensity frames.
Recent learning-based approaches have been applied to event-based data, such as monocular depth prediction.
We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
arXiv Detail & Related papers (2020-10-16T12:36:23Z)
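Several entries above build dense grid representations from raw events; the Dual Memory Aggregation Network summary, for example, discretizes events into x-y-t pillars per polarity. The sketch below is a generic version of that kind of voxelization (referenced from that entry); the bin count, function name, and normalization are illustrative assumptions rather than that paper's exact settings.

```python
import numpy as np

def events_to_voxel_grid(events, height, width, num_time_bins=5):
    """Discretize events into a (2, T, H, W) grid: polarity x time bin x space.

    events: (N, 4) array of (x, y, timestamp, polarity) with polarity in {0, 1}.
    """
    grid = np.zeros((2, num_time_bins, height, width), dtype=np.float32)
    t = events[:, 2]
    # Map timestamps linearly onto [0, num_time_bins) bins.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)
    t_bin = np.clip((t_norm * num_time_bins).astype(np.int64), 0, num_time_bins - 1)
    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    p = events[:, 3].astype(np.int64)
    # Scatter-add each event into its (polarity, time, y, x) cell.
    np.add.at(grid, (p, t_bin, y, x), 1.0)
    return grid
```

Summing this grid over its time axis recovers a two-channel event image like the one sketched after the abstract above.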