SDTrack: A Baseline for Event-based Tracking via Spiking Neural Networks
- URL: http://arxiv.org/abs/2503.08703v1
- Date: Sun, 09 Mar 2025 02:01:40 GMT
- Title: SDTrack: A Baseline for Event-based Tracking via Spiking Neural Networks
- Authors: Yimeng Shan, Zhenbang Ren, Haodi Wu, Wenjie Wei, Rui-Jie Zhu, Shuai Wang, Dehao Zhang, Yichen Xiao, Jieyuan Zhang, Kexin Shi, Jingzhinan Wang, Jason K. Eshraghian, Haicheng Qu, Jiqing Zhang, Malu Zhang, Yang Yang
- Abstract summary: Spiking Neural Networks (SNNs) naturally complement event data through discrete spike signals, making them ideal for event-based tracking. We propose the first Transformer-based spike-driven tracking pipeline. Our Global Trajectory Prompt (GTP) method effectively captures global trajectory information and aggregates it with event streams into event images. We then introduce SDTrack, a Transformer-based spike-driven tracker comprising a Spiking MetaFormer backbone and a simple tracking head that directly predicts normalized coordinates using spike signals.
- Score: 14.760720933322325
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Event cameras provide superior temporal resolution, dynamic range, power efficiency, and pixel bandwidth. Spiking Neural Networks (SNNs) naturally complement event data through discrete spike signals, making them ideal for event-based tracking. However, current approaches that combine Artificial Neural Networks (ANNs) and SNNs, along with suboptimal architectures, compromise energy efficiency and limit tracking performance. To address these limitations, we propose the first Transformer-based spike-driven tracking pipeline. Our Global Trajectory Prompt (GTP) method effectively captures global trajectory information and aggregates it with event streams into event images to enhance spatiotemporal representation. We then introduce SDTrack, a Transformer-based spike-driven tracker comprising a Spiking MetaFormer backbone and a simple tracking head that directly predicts normalized coordinates using spike signals. The framework is end-to-end and requires neither data augmentation nor post-processing. Extensive experiments demonstrate that SDTrack achieves state-of-the-art performance while maintaining the lowest parameter count and energy consumption across multiple event-based tracking benchmarks, establishing a solid baseline for future research in neuromorphic vision.
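The abstract mentions aggregating event streams into event images. As a rough illustration of that general idea only, the sketch below counts events per pixel inside a time window, with one channel per polarity. It is not the GTP method, which the abstract does not specify; the event tuple layout, the function name `events_to_image`, and the window parameters are all assumptions made for this example.

```python
from typing import Iterable, List, Tuple

# Hypothetical event layout: (x, y, timestamp_seconds, polarity), polarity in {+1, -1}.
Event = Tuple[int, int, float, int]


def events_to_image(events: Iterable[Event], width: int, height: int,
                    t_start: float, t_end: float) -> List[List[List[int]]]:
    """Count events per pixel in [t_start, t_end), one channel per polarity.

    Returns image[channel][y][x]; channel 0 holds positive events,
    channel 1 holds negative events.
    """
    image = [[[0] * width for _ in range(height)] for _ in range(2)]
    for x, y, t, p in events:
        if not (t_start <= t < t_end):
            continue  # event falls outside the time window
        if not (0 <= x < width and 0 <= y < height):
            continue  # event falls outside the sensor area
        channel = 0 if p > 0 else 1
        image[channel][y][x] += 1
    return image


# Toy stream: three events inside a 10 ms window, one event after it.
events = [(0, 0, 0.001, 1), (1, 0, 0.002, -1), (0, 0, 0.003, 1), (2, 1, 0.050, 1)]
frame = events_to_image(events, width=3, height=2, t_start=0.0, t_end=0.010)
```

Here `frame[0]` is the positive-polarity count map and `frame[1]` the negative one. A real pipeline would typically use array libraries and may encode timing or trajectory information rather than plain counts.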
Related papers
- GazeSCRNN: Event-based Near-eye Gaze Tracking using a Spiking Neural Network [0.0]
This work introduces GazeSCRNN, a novel convolutional recurrent neural network designed for event-based near-eye gaze tracking.
The model processes event streams from DVS cameras using Adaptive Leaky-Integrate-and-Fire (ALIF) neurons and a hybrid architecture for spatio-temporal data.
The most accurate model achieved a Mean Angle Error (MAE) of 6.034° and a Mean Pupil Error (MPE) of 2.094 mm.
arXiv Detail & Related papers (2025-03-20T10:32:15Z)
- BlinkTrack: Feature Tracking over 100 FPS via Events and Images [50.98675227695814]
We propose a novel framework, BlinkTrack, which integrates event data with RGB images for high-frequency feature tracking.
Our method extends the traditional Kalman filter into a learning-based framework, utilizing differentiable Kalman filters in both event and image branches.
Experimental results indicate that BlinkTrack significantly outperforms existing event-based methods.
arXiv Detail & Related papers (2024-09-26T15:54:18Z)
- Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking.
DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget.
Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z)
- TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models [75.20168902300166]
We propose TrackDiffusion, a novel video generation framework affording fine-grained trajectory-conditioned motion control.
A pivotal component of TrackDiffusion is the instance enhancer, which explicitly ensures inter-frame consistency of multiple objects.
Video sequences generated by TrackDiffusion can be used as training data for visual perception models.
arXiv Detail & Related papers (2023-12-01T15:24:38Z)
- SpikePoint: An Efficient Point-based Spiking Neural Network for Event Cameras Action Recognition [11.178792888084692]
Spiking Neural Networks (SNNs) have gained significant attention due to their remarkable efficiency and fault tolerance.
We propose SpikePoint, a novel end-to-end point-based SNN architecture.
SpikePoint excels at processing sparse event cloud data, effectively extracting both global and local features.
arXiv Detail & Related papers (2023-10-11T04:38:21Z)
- SpikeMOT: Event-based Multi-Object Tracking with Sparse Motion Features [52.213656737672935]
SpikeMOT is an event-based multi-object tracker.
SpikeMOT uses spiking neural networks to extract sparse spatiotemporal features from event streams associated with objects.
arXiv Detail & Related papers (2023-09-29T05:13:43Z)
- On the Generation of a Synthetic Event-Based Vision Dataset for Navigation and Landing [69.34740063574921]
This paper presents a methodology for generating event-based vision datasets from optimal landing trajectories.
We construct sequences of photorealistic images of the lunar surface with the Planet and Asteroid Natural Scene Generation Utility.
We demonstrate that the pipeline can generate realistic event-based representations of surface features by constructing a dataset of 500 trajectories.
arXiv Detail & Related papers (2023-08-01T09:14:20Z)
- Data-driven Feature Tracking for Event Cameras [48.04815194265117]
We introduce the first data-driven feature tracker for event cameras, which leverages low-latency events to track features detected in a grayscale frame.
By directly transferring zero-shot from synthetic to real data, our data-driven tracker outperforms existing approaches in relative feature age by up to 120%.
This performance gap is further increased to 130% by adapting our tracker to real data with a novel self-supervision strategy.
arXiv Detail & Related papers (2022-11-23T10:20:11Z)
- TDIOT: Target-driven Inference for Deep Video Object Tracking [0.2457872341625575]
In this work, we adopt the pre-trained Mask R-CNN deep object detector as the baseline.
We introduce a novel inference architecture placed on top of the FPN-ResNet101 backbone of Mask R-CNN to jointly perform detection and tracking.
The proposed single object tracker, TDIOT, applies an appearance similarity-based temporal matching for data association.
arXiv Detail & Related papers (2021-03-19T20:45:06Z)
- A Hybrid Neuromorphic Object Tracking and Classification Framework for Real-time Systems [5.959466944163293]
This paper proposes a real-time, hybrid neuromorphic framework for object tracking and classification using event-based cameras.
Unlike traditional event-by-event processing, this work uses a mixed frame-and-event approach to save energy while maintaining high performance.
arXiv Detail & Related papers (2020-07-21T07:11:27Z)
- Object Tracking through Residual and Dense LSTMs [67.98948222599849]
Deep learning trackers based on Long Short-Term Memory (LSTM) recurrent neural networks have emerged as a powerful alternative.
DenseLSTMs outperform Residual and regular LSTM, and offer a higher resilience to nuisances.
Our case study supports the adoption of residual-based RNNs for enhancing the robustness of other trackers.
arXiv Detail & Related papers (2020-06-22T08:20:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.