RGB-Event Fusion for Moving Object Detection in Autonomous Driving
- URL: http://arxiv.org/abs/2209.08323v1
- Date: Sat, 17 Sep 2022 12:59:08 GMT
- Title: RGB-Event Fusion for Moving Object Detection in Autonomous Driving
- Authors: Zhuyun Zhou, Zongwei Wu, Rémi Boutteau, Fan Yang, Cédric Demonceaux, Dominique Ginhac
- Abstract summary: Moving Object Detection (MOD) is a critical vision task for successfully achieving safe autonomous driving.
Recent advances in sensor technologies, especially the Event camera, can naturally complement the conventional camera approach to better model moving objects.
We propose RENet, a novel RGB-Event fusion Network that jointly exploits the two complementary modalities to achieve more robust MOD.
- Score: 3.5397758597664306
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Moving Object Detection (MOD) is a critical vision task for successfully
achieving safe autonomous driving. Despite plausible results of deep learning
methods, most existing approaches are only frame-based and may fail to reach
reasonable performance when dealing with dynamic traffic participants. Recent
advances in sensor technologies, especially the Event camera, can naturally
complement the conventional camera approach to better model moving objects.
However, event-based works often adopt a pre-defined time window for event
representation, and simply integrate it to estimate image intensities from
events, neglecting much of the rich temporal information from the available
asynchronous events. Therefore, from a new perspective, we propose RENet, a
novel RGB-Event fusion Network that jointly exploits the two complementary
modalities to achieve more robust MOD under challenging scenarios for
autonomous driving. Specifically, we first design a temporal multi-scale
aggregation module to fully leverage event frames from both the RGB exposure
time and larger intervals. Then we introduce a bi-directional fusion module to
attentively calibrate and fuse multi-modal features. To evaluate the
performance of our network, we carefully select and annotate a sub-MOD dataset
from the commonly used DSEC dataset. Extensive experiments demonstrate that our
proposed method performs significantly better than the state-of-the-art
RGB-Event fusion alternatives.
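The abstract describes two components: a temporal multi-scale aggregation module that combines event frames accumulated over the RGB exposure time and over larger intervals, and a bi-directional fusion module that attentively calibrates and fuses RGB and event features. The sketch below illustrates the general idea in PyTorch; the module structure, channel-attention gating, and layer sizes are illustrative assumptions, not the authors' RENet implementation.

```python
# Minimal sketch (not the authors' code): multi-scale event aggregation followed by
# a bi-directional channel-attention fusion of RGB and event features.
# All module names, channel widths, and the gating form are assumptions.
import torch
import torch.nn as nn


class TemporalMultiScaleAggregation(nn.Module):
    """Merge event frames accumulated over several time windows
    (e.g. the RGB exposure time and two larger intervals)."""

    def __init__(self, channels: int, num_scales: int = 3):
        super().__init__()
        self.proj = nn.Conv2d(num_scales * channels, channels, kernel_size=1)

    def forward(self, event_frames: list[torch.Tensor]) -> torch.Tensor:
        # event_frames: one (B, C, H, W) tensor per temporal window
        return self.proj(torch.cat(event_frames, dim=1))


class BiDirectionalFusion(nn.Module):
    """Each modality re-weights the other with a channel-attention gate,
    then the calibrated features are merged."""

    def __init__(self, channels: int):
        super().__init__()

        def gate():
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels, kernel_size=1),
                nn.Sigmoid(),
            )

        self.rgb_gate, self.evt_gate = gate(), gate()
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb: torch.Tensor, evt: torch.Tensor) -> torch.Tensor:
        rgb_calib = rgb * self.evt_gate(evt)   # event features calibrate RGB
        evt_calib = evt * self.rgb_gate(rgb)   # RGB features calibrate events
        return self.merge(torch.cat([rgb_calib, evt_calib], dim=1))


if __name__ == "__main__":
    rgb_feat = torch.randn(1, 64, 80, 160)
    evt_feat = TemporalMultiScaleAggregation(64)(
        [torch.randn(1, 64, 80, 160) for _ in range(3)])
    fused = BiDirectionalFusion(64)(rgb_feat, evt_feat)
    print(fused.shape)  # torch.Size([1, 64, 80, 160])
```

The fused feature map would then feed a standard detection head; the hypothetical channel-attention gate stands in for whatever attentive calibration the paper actually uses.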
Related papers
- MambaPupil: Bidirectional Selective Recurrent model for Event-based Eye tracking [50.26836546224782]
Event-based eye tracking has shown great promise with the high temporal resolution and low redundancy.
The diversity and abruptness of eye movement patterns, including blinking, fixating, saccades, and smooth pursuit, pose significant challenges for eye localization.
This paper proposes a bidirectional long-term sequence modeling and time-varying state selection mechanism to fully utilize contextual temporal information.
arXiv Detail & Related papers (2024-04-18T11:09:25Z)
- Cross-Modal Object Tracking via Modality-Aware Fusion Network and A Large-Scale Dataset [20.729414075628814]
We propose an adaptive cross-modal object tracking algorithm called Modality-Aware Fusion Network (MAFNet)
MAFNet efficiently integrates information from both RGB and NIR modalities using an adaptive weighting mechanism.
arXiv Detail & Related papers (2023-12-22T05:22:33Z) - SpikeMOT: Event-based Multi-Object Tracking with Sparse Motion Features [52.213656737672935]
SpikeMOT is an event-based multi-object tracker.
SpikeMOT uses spiking neural networks to extract sparse spatiotemporal features from event streams associated with objects.
arXiv Detail & Related papers (2023-09-29T05:13:43Z)
- Chasing Day and Night: Towards Robust and Efficient All-Day Object Detection Guided by an Event Camera [8.673063170884591]
EOLO is a novel object detection framework that achieves robust and efficient all-day detection by fusing both RGB and event modalities.
Our EOLO framework is built based on a lightweight spiking neural network (SNN) to efficiently leverage the asynchronous property of events.
arXiv Detail & Related papers (2023-09-17T15:14:01Z)
- EventTransAct: A video transformer-based framework for Event-camera based action recognition [52.537021302246664]
Event cameras offer new opportunities compared to standard action recognition in RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
To better adapt the VTN to the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations.
arXiv Detail & Related papers (2023-08-25T23:51:07Z)
- SODFormer: Streaming Object Detection with Transformer Using Events and Frames [31.293847706713052]
The DAVIS camera, streaming two complementary sensing modalities of asynchronous events and frames, has gradually been used to address major object detection challenges.
We propose SODFormer, a novel Transformer-based streaming object detector, which first integrates events and frames to continuously detect objects in an asynchronous manner.
arXiv Detail & Related papers (2023-08-08T04:53:52Z)
- Event-Free Moving Object Segmentation from Moving Ego Vehicle [88.33470650615162]
Moving object segmentation (MOS) in dynamic scenes is an important, challenging, but under-explored research topic for autonomous driving.
Most segmentation methods leverage motion cues obtained from optical flow maps.
We propose to exploit event cameras for better video understanding, which provide rich motion cues without relying on optical flow.
arXiv Detail & Related papers (2023-04-28T23:43:10Z)
- Dual Memory Aggregation Network for Event-Based Object Detection with Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture brightness change of every pixel in an asynchronous manner.
Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as 3D tensor representation.
Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
arXiv Detail & Related papers (2023-03-17T12:12:41Z)
- Self-Supervised Representation Learning for RGB-D Salient Object Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets to perform pre-training, which makes the network capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
arXiv Detail & Related papers (2021-01-29T09:16:06Z)
- A Hybrid Neuromorphic Object Tracking and Classification Framework for Real-time Systems [5.959466944163293]
This paper proposes a real-time, hybrid neuromorphic framework for object tracking and classification using event-based cameras.
Unlike traditional approaches of using event-by-event processing, this work uses a mixed frame and event approach to get energy savings with high performance.
arXiv Detail & Related papers (2020-07-21T07:11:27Z)