EMF: Event Meta Formers for Event-based Real-time Traffic Object Detection
- URL: http://arxiv.org/abs/2504.04124v1
- Date: Sat, 05 Apr 2025 09:48:40 GMT
- Title: EMF: Event Meta Formers for Event-based Real-time Traffic Object Detection
- Authors: Muhammad Ahmed Ullah Khan, Abdul Hannan Khan, Andreas Dengel
- Abstract summary: Event cameras have higher temporal resolution, and require less storage and bandwidth compared to traditional RGB cameras. Recent approaches in event-based object detection try to bridge this gap by employing computationally expensive transformer-based solutions. Our proposed EMF becomes the fastest DNN-based architecture in the domain by outperforming most efficient event-based object detectors.
- Score: 5.143097874851516
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Event cameras have higher temporal resolution, and require less storage and bandwidth compared to traditional RGB cameras. However, due to the relatively lagging performance of event-based approaches, event cameras have not yet replaced traditional cameras in performance-critical applications like autonomous driving. Recent approaches in event-based object detection try to bridge this gap by employing computationally expensive transformer-based solutions. However, due to their resource-intensive components, these solutions fail to exploit the sparsity and higher temporal resolution of event cameras efficiently. Moreover, these solutions are adopted from the vision domain, lacking specificity to event cameras. In this work, we explore efficient and performant alternatives to recurrent vision transformer models and propose a novel event-based object detection backbone. The proposed backbone employs a novel Event Progression Extractor module, tailored specifically for event data, and uses the MetaFormer concept with efficient convolution-based components. We evaluate the resultant model on well-established traffic object detection benchmarks and conduct cross-dataset evaluation to test its ability to generalize. The proposed model outperforms the state-of-the-art on the Prophesee Gen1 dataset by 1.6 mAP while reducing inference time by 14%. Our proposed EMF becomes the fastest DNN-based architecture in the domain by outperforming most efficient event-based object detectors. Moreover, the proposed model shows better ability to generalize to unseen data and scales better with the abundance of data.
Related papers
- SuperEIO: Self-Supervised Event Feature Learning for Event Inertial Odometry [6.552812892993662]
Event cameras asynchronously output low-latency event streams, promising for state estimation in high-speed motion and challenging lighting conditions. We propose SuperEIO, a novel framework that leverages learning-based event-only detection and IMU measurements to achieve event-inertial odometry. We evaluate our method extensively on multiple public datasets, demonstrating its superior accuracy and robustness compared to other state-of-the-art event-based methods.
arXiv Detail & Related papers (2025-03-29T03:58:15Z) - EvRT-DETR: Latent Space Adaptation of Image Detectors for Event-based Vision [0.7270112855088837]
Event-based cameras (EBCs) have emerged as a bio-inspired alternative to traditional cameras.
The development of image analysis methods for EBCs is challenging due to the sparse and asynchronous nature of the data.
We introduce I2EvDet, a novel adaptation framework that bridges mainstream object detection with temporal event data processing.
arXiv Detail & Related papers (2024-12-03T22:49:01Z) - Evaluating Image-Based Face and Eye Tracking with Event Cameras [9.677797822200965]
Event cameras, also known as neuromorphic sensors, capture changes in local light intensity at the pixel level, producing asynchronously generated data termed "events".
This data format mitigates common issues observed in conventional cameras, like under-sampling when capturing fast-moving objects.
We evaluate the viability of integrating conventional algorithms with event-based data, transformed into a frame format.
arXiv Detail & Related papers (2024-08-19T20:27:08Z) - Event-to-Video Conversion for Overhead Object Detection [7.744259147081667]
Event cameras complicate downstream image processing, especially for complex tasks such as object detection.
We show that there is a significant gap in performance between dense event representations and corresponding RGB frames.
We apply event-to-video conversion models that convert event streams into gray-scale video to close this gap.
arXiv Detail & Related papers (2024-02-09T22:07:39Z) - SpikeMOT: Event-based Multi-Object Tracking with Sparse Motion Features [52.213656737672935]
SpikeMOT is an event-based multi-object tracker.
SpikeMOT uses spiking neural networks to extract sparse spatiotemporal features from event streams associated with objects.
arXiv Detail & Related papers (2023-09-29T05:13:43Z) - EventTransAct: A video transformer-based framework for Event-camera based action recognition [52.537021302246664]
Event cameras offer new opportunities compared to standard action recognition in RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
To better adapt the VTN to the sparse and fine-grained nature of event data, we design Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations.
arXiv Detail & Related papers (2023-08-25T23:51:07Z) - Dual Memory Aggregation Network for Event-Based Object Detection with Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture the brightness change of every pixel in an asynchronous manner.
Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as a 3D tensor representation.
Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
arXiv Detail & Related papers (2023-03-17T12:12:41Z) - RGB-Event Fusion for Moving Object Detection in Autonomous Driving [3.5397758597664306]
Moving Object Detection (MOD) is a critical vision task for successfully achieving safe autonomous driving.
Recent advances in sensor technologies, especially the Event camera, can naturally complement the conventional camera approach to better model moving objects.
We propose RENet, a novel RGB-Event fusion Network, that jointly exploits the two complementary modalities to achieve more robust MOD.
arXiv Detail & Related papers (2022-09-17T12:59:08Z) - Asynchronous Optimisation for Event-based Visual Odometry [53.59879499700895]
Event cameras open up new possibilities for robotic perception due to their low latency and high dynamic range.
We focus on event-based visual odometry (VO).
We propose an asynchronous structure-from-motion optimisation back-end.
arXiv Detail & Related papers (2022-03-02T11:28:47Z) - Bridging the Gap between Events and Frames through Unsupervised Domain Adaptation [57.22705137545853]
We propose a task transfer method that allows models to be trained directly with labeled images and unlabeled event data.
We leverage the generative event model to split event features into content and motion features.
Our approach unlocks the vast amount of existing image datasets for the training of event-based neural networks.
arXiv Detail & Related papers (2021-09-06T17:31:37Z) - Learning Monocular Dense Depth from Events [53.078665310545745]
Event cameras produce brightness changes in the form of a stream of asynchronous events instead of intensity frames.
Recent learning-based approaches have been applied to event-based data, such as monocular depth prediction.
We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
arXiv Detail & Related papers (2020-10-16T12:36:23Z) - Learning to Detect Objects with a 1 Megapixel Event Camera [14.949946376335305]
Event cameras encode visual information with high temporal precision, low data-rate, and high-dynamic range.
Due to the novelty of the field, the performance of event-based systems on many vision tasks is still lower compared to conventional frame-based solutions.
arXiv Detail & Related papers (2020-09-28T16:03:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.