Dual Memory Aggregation Network for Event-Based Object Detection with
Learnable Representation
- URL: http://arxiv.org/abs/2303.09919v1
- Date: Fri, 17 Mar 2023 12:12:41 GMT
- Title: Dual Memory Aggregation Network for Event-Based Object Detection with
Learnable Representation
- Authors: Dongsheng Wang, Xu Jia, Yang Zhang, Xinyu Zhang, Yaoyuan Wang, Ziyang
Zhang, Dong Wang, Huchuan Lu
- Abstract summary: Event-based cameras are bio-inspired sensors that capture the brightness change of every pixel in an asynchronous manner.
Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as 3D tensor representation.
Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
- Score: 79.02808071245634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event-based cameras are bio-inspired sensors that capture the
brightness change of every pixel in an asynchronous manner. Compared with frame-based sensors,
event cameras have microsecond-level latency and high dynamic range, hence
showing great potential for object detection under high-speed motion and poor
illumination conditions. Due to the sparse and asynchronous nature of event
streams, most existing approaches resort to hand-crafted methods to convert
event data into a 2D grid representation. However, these are sub-optimal at
aggregating information from the event stream for object detection. In this work,
we propose to learn an event representation optimized for event-based object
detection. Specifically, event streams are divided into grids in the x-y-t
coordinates for both positive and negative polarity, producing a set of pillars
as a 3D tensor representation. To fully exploit the information in event streams
for object detection, a dual-memory aggregation network (DMANet) is proposed to
leverage both long and short memory along event streams to aggregate effective
information for object detection. Long memory is encoded in the hidden state of
adaptive convLSTMs while short memory is modeled by computing spatial-temporal
correlation between event pillars at neighboring time intervals. Extensive
experiments on the recently released event-based automotive detection dataset
demonstrate the effectiveness of the proposed method.
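The pillar representation described above can be made concrete with a short sketch. The following is a minimal illustration, not the authors' code; the function name, bin counts, and the count-based accumulation are assumptions made for exposition:

```python
import numpy as np

def events_to_pillars(events, h, w, t_bins, duration):
    """Bin events on an x-y-t grid, separately per polarity.

    events: (N, 4) array of (x, y, t, p) with t in [0, duration)
    and polarity p in {0, 1}. Returns a (2 * t_bins, h, w) tensor.
    """
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = np.clip((events[:, 2] / duration * t_bins).astype(int),
                0, t_bins - 1)
    p = events[:, 3].astype(int)           # 0 = negative, 1 = positive
    pillars = np.zeros((2, t_bins, h, w), dtype=np.float32)
    np.add.at(pillars, (p, t, y, x), 1.0)  # count events per grid cell
    return pillars.reshape(2 * t_bins, h, w)
```

The dual-memory aggregation can be sketched in the same hedged spirit: a standard ConvLSTM cell whose hidden state carries the long memory, plus a simple correlation between pillar features at neighboring time intervals standing in for the short memory. The abstract does not specify the adaptive ConvLSTM variant or the exact correlation operator, so both pieces below are plain stand-ins:

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Plain ConvLSTM cell; its hidden state plays the role of the
    long memory in DMANet's description (the paper's adaptive variant
    is not detailed in the abstract)."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, h, c):
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

def short_memory(curr, prev):
    """Toy short memory: per-location correlation between pillar
    features at neighboring time intervals, used to reweight the
    current features."""
    corr = (curr * prev).sum(dim=1, keepdim=True)  # (B, 1, H, W)
    return curr * torch.sigmoid(corr)
```

In a full detector, the reweighted features and the ConvLSTM hidden state would feed a detection head; that part is omitted here.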
Related papers
- MambaPupil: Bidirectional Selective Recurrent model for Event-based Eye tracking [50.26836546224782]
Event-based eye tracking has shown great promise thanks to its high temporal resolution and low redundancy.
The diversity and abruptness of eye movement patterns, including blinking, fixating, saccades, and smooth pursuit, pose significant challenges for eye localization.
This paper proposes a bidirectional long-term sequence modeling and time-varying state selection mechanism to fully utilize contextual temporal information.
arXiv Detail & Related papers (2024-04-18T11:09:25Z) - Event-to-Video Conversion for Overhead Object Detection [7.744259147081667]
Event cameras complicate downstream image processing, especially for complex tasks such as object detection.
We show that there is a significant gap in performance between dense event representations and corresponding RGB frames.
We apply event-to-video conversion models that convert event streams into gray-scale video to close this gap.
arXiv Detail & Related papers (2024-02-09T22:07:39Z) - SpikeMOT: Event-based Multi-Object Tracking with Sparse Motion Features [52.213656737672935]
SpikeMOT is an event-based multi-object tracker.
SpikeMOT uses spiking neural networks to extract sparse spatiotemporal features from event streams associated with objects.
arXiv Detail & Related papers (2023-09-29T05:13:43Z) - SODFormer: Streaming Object Detection with Transformer Using Events and
Frames [31.293847706713052]
The DAVIS camera, streaming two complementary sensing modalities of asynchronous events and frames, has gradually been used to address major object detection challenges.
We propose SODFormer, a novel streaming object detector with Transformer, which first integrates events and frames to continuously detect objects in an asynchronous manner.
arXiv Detail & Related papers (2023-08-08T04:53:52Z) - Motion Robust High-Speed Light-Weighted Object Detection With Event
Camera [24.192961837270172]
We propose a motion-robust, high-speed detection pipeline that better leverages the event data.
Experiments on two typical real-scene event camera object detection datasets show that our method is competitive in terms of accuracy, efficiency, and the number of parameters.
arXiv Detail & Related papers (2022-08-24T15:15:24Z) - Bridging the Gap between Events and Frames through Unsupervised Domain
Adaptation [57.22705137545853]
We propose a task transfer method that allows models to be trained directly with labeled images and unlabeled event data.
We leverage the generative event model to split event features into content and motion features.
Our approach unlocks the vast amount of existing image datasets for the training of event-based neural networks.
arXiv Detail & Related papers (2021-09-06T17:31:37Z) - VisEvent: Reliable Object Tracking via Collaboration of Frame and Event
Flows [93.54888104118822]
We propose a large-scale Visible-Event benchmark (termed VisEvent) due to the lack of a realistic and scaled dataset for this task.
Our dataset consists of 820 video pairs captured under low illumination, high speed, and background clutter scenarios.
Based on VisEvent, we transform the event flows into event images and construct more than 30 baseline methods.
arXiv Detail & Related papers (2021-08-11T03:55:12Z) - DS-Net: Dynamic Spatiotemporal Network for Video Salient Object
Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of spatial and temporal information.
We show that the proposed method outperforms state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z) - EBBINNOT: A Hardware Efficient Hybrid Event-Frame Tracker for Stationary
Dynamic Vision Sensors [5.674895233111088]
This paper presents a hybrid event-frame approach for detecting and tracking objects recorded by a stationary neuromorphic sensor.
To exploit the background removal property of a static DVS, we propose an event-based binary image creation that signals the presence or absence of events in a frame duration (see the sketch after this list).
This is the first time a stationary DVS-based traffic monitoring solution has been extensively compared to simultaneously recorded RGB frame-based methods.
arXiv Detail & Related papers (2020-05-31T03:01:35Z)
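As a concrete illustration of the binary-image idea in the EBBINNOT entry above, a minimal sketch might look like the following; the function name and array layout are illustrative assumptions, not the paper's code:

```python
import numpy as np

def binary_event_image(events, h, w):
    """events: (N, 2) integer (x, y) pixel coordinates of all events
    arriving within one frame duration. Returns an (h, w) binary image
    marking the presence (1) or absence (0) of events at each pixel."""
    img = np.zeros((h, w), dtype=np.uint8)
    img[events[:, 1], events[:, 0]] = 1
    return img
```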
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.