Related papers: WD-DETR: Wavelet Denoising-Enhanced Real-Time Object Detection Transformer for Robot Perception with Event Cameras

WD-DETR: Wavelet Denoising-Enhanced Real-Time Object Detection Transformer for Robot Perception with Event Cameras

URL: http://arxiv.org/abs/2506.09098v1
Date: Tue, 10 Jun 2025 14:24:50 GMT
Title: WD-DETR: Wavelet Denoising-Enhanced Real-Time Object Detection Transformer for Robot Perception with Event Cameras
Authors: Yangjie Cui, Boyang Gao, Yiwei Zhang, Xin Dong, Jinwu Xiang, Daochun Li, Zhan Tu,
Abstract summary: We propose the Wavelet Denoising-enhanced DEtection TRansformer, i.e., WD-DETR network, for event cameras.<n>A dense event representation is presented first, which enables real-time reconstruction of events as tensors.<n>We implement our approach on a common onboard computer for robots, the NVIDIA Jetson Orin NX, achieving a high frame rate of approximately 35 FPS.
Score: 15.095401717217934
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Previous studies on event camera sensing have demonstrated certain detection performance using dense event representations. However, the accumulated noise in such dense representations has received insufficient attention, which degrades the representation quality and increases the likelihood of missed detections. To address this challenge, we propose the Wavelet Denoising-enhanced DEtection TRansformer, i.e., WD-DETR network, for event cameras. In particular, a dense event representation is presented first, which enables real-time reconstruction of events as tensors. Then, a wavelet transform method is designed to filter noise in the event representations. Such a method is integrated into the backbone for feature extraction. The extracted features are subsequently fed into a transformer-based network for object prediction. To further reduce inference time, we incorporate the Dynamic Reorganization Convolution Block (DRCB) as a fusion module within the hybrid encoder. The proposed method has been evaluated on three event-based object detection datasets, i.e., DSEC, Gen1, and 1Mpx. The results demonstrate that WD-DETR outperforms tested state-of-the-art methods. Additionally, we implement our approach on a common onboard computer for robots, the NVIDIA Jetson Orin NX, achieving a high frame rate of approximately 35 FPS using TensorRT FP16, which is exceptionally well-suited for real-time perception of onboard robotic systems.

Related papers

RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection [68.99784784185019]
Poor lighting or adverse weather conditions degrade camera performance.<n>Radar suffers from noise and positional ambiguity.<n>We propose RobuRCDet, a robust object detection model in BEV.
arXiv Detail & Related papers (2025-02-18T17:17:38Z)
EV-MGDispNet: Motion-Guided Event-Based Stereo Disparity Estimation Network with Left-Right Consistency [4.849111230195686]
Event cameras have the potential to revolutionize the field of robot vision. We propose EV-MGDispNet, a novel event-based stereo disparity estimation method. Our method outperforms currently known state-of-the-art methods in terms of mean absolute error (MAE) and root mean square error (RMSE) metrics.
arXiv Detail & Related papers (2024-08-10T06:13:37Z)
Wavelet-based Bi-dimensional Aggregation Network for SAR Image Change Detection [53.842568573251214]
Experimental results on three SAR datasets demonstrate that our WBANet significantly outperforms contemporary state-of-the-art methods. Our WBANet achieves 98.33%, 96.65%, and 96.62% of percentage of correct classification (PCC) on the respective datasets.
arXiv Detail & Related papers (2024-07-18T04:36:10Z)
EvDNeRF: Reconstructing Event Data with Dynamic Neural Radiance Fields [80.94515892378053]
EvDNeRF is a pipeline for generating event data and training an event-based dynamic NeRF. NeRFs offer geometric-based learnable rendering, but prior work with events has only considered reconstruction of static scenes. We show that by training on varied batch sizes of events, we can improve test-time predictions of events at fine time resolutions.
arXiv Detail & Related papers (2023-10-03T21:08:41Z)
Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection. First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network. Finally, it poses a challenge for the network to distinguish between the positive query and other highly similar queries that are not the best match.
arXiv Detail & Related papers (2023-07-01T13:53:14Z)
Dual Memory Aggregation Network for Event-Based Object Detection with Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture brightness change of every pixel in an asynchronous manner. Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as 3D tensor representation. Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
arXiv Detail & Related papers (2023-03-17T12:12:41Z)
SALISA: Saliency-based Input Sampling for Efficient Video Object Detection [58.22508131162269]
We propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detection. We show that SALISA significantly improves the detection of small objects.
arXiv Detail & Related papers (2022-04-05T17:59:51Z)
Neuromorphic Camera Denoising using Graph Neural Network-driven Transformers [3.805262583092311]
Neuromorphic vision is a bio-inspired technology that has triggered a paradigm shift in the computer-vision community. Neuromorphic cameras suffer from significant amounts of measurement noise. This noise deteriorates the performance of neuromorphic event-based perception and navigation algorithms.
arXiv Detail & Related papers (2021-12-17T18:57:36Z)
Sound Event Detection Transformer: An Event-based End-to-End Model for Sound Event Detection [12.915110466077866]
Sound event detection (SED) has gained increasing attention with its wide application in surveillance, video indexing, etc. Existing models in SED mainly generate frame-level predictions, converting it into a sequence multi-label classification problem. This paper firstly presents the 1D Detection Transformer (1D-DETR), inspired by Detection Transformer. Given the characteristics of SED, the audio query and a one-to-many matching strategy are added to 1D-DETR to form the model of Sound Event Detection Transformer (SEDT)
arXiv Detail & Related papers (2021-10-05T12:56:23Z)
Wavelet-Based Network For High Dynamic Range Imaging [64.66969585951207]
Existing methods, such as optical flow based and end-to-end deep learning based solutions, are error-prone either in detail restoration or ghosting artifacts removal. In this work, we propose a novel frequency-guided end-to-end deep neural network (FNet) to conduct HDR fusion in the frequency domain, and Wavelet Transform (DWT) is used to decompose inputs into different frequency bands. The low-frequency signals are used to avoid specific ghosting artifacts, while the high-frequency signals are used for preserving details.
arXiv Detail & Related papers (2021-08-03T12:26:33Z)
EBBINNOT: A Hardware Efficient Hybrid Event-Frame Tracker for Stationary Dynamic Vision Sensors [5.674895233111088]
This paper presents a hybrid event-frame approach for detecting and tracking objects recorded by a stationary neuromorphic sensor. To exploit the background removal property of a static DVS, we propose an event-based binary image creation that signals presence or absence of events in a frame duration. This is the first time a stationary DVS based traffic monitoring solution is extensively compared to simultaneously recorded RGB frame-based methods.
arXiv Detail & Related papers (2020-05-31T03:01:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.