Related papers: Event-Driven Dynamic Scene Depth Completion

Event-Driven Dynamic Scene Depth Completion

URL: http://arxiv.org/abs/2505.13279v2
Date: Tue, 20 May 2025 07:45:25 GMT
Title: Event-Driven Dynamic Scene Depth Completion
Authors: Zhiqiang Yan, Jianhao Jiao, Zhengxue Wang, Gim Hee Lee,
Abstract summary: EventDC is the first event-driven depth completion framework.<n>It consists of two key components: Event-Modulated Alignment (EMA) and Local Depth Filtering (LDF)
Score: 50.01494043834177
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Depth completion in dynamic scenes poses significant challenges due to rapid ego-motion and object motion, which can severely degrade the quality of input modalities such as RGB images and LiDAR measurements. Conventional RGB-D sensors often struggle to align precisely and capture reliable depth under such conditions. In contrast, event cameras with their high temporal resolution and sensitivity to motion at the pixel level provide complementary cues that are %particularly beneficial in dynamic environments.To this end, we propose EventDC, the first event-driven depth completion framework. It consists of two key components: Event-Modulated Alignment (EMA) and Local Depth Filtering (LDF). Both modules adaptively learn the two fundamental components of convolution operations: offsets and weights conditioned on motion-sensitive event streams. In the encoder, EMA leverages events to modulate the sampling positions of RGB-D features to achieve pixel redistribution for improved alignment and fusion. In the decoder, LDF refines depth estimations around moving objects by learning motion-aware masks from events. Additionally, EventDC incorporates two loss terms to further benefit global alignment and enhance local depth recovery. Moreover, we establish the first benchmark for event-based depth completion comprising one real-world and two synthetic datasets to facilitate future research. Extensive experiments on this benchmark demonstrate the superiority of our EventDC.

Related papers

UDPNet: Unleashing Depth-based Priors for Robust Image Dehazing [77.10640210751981]
UDPNet is a general framework that leverages depth-based priors from a large-scale pretrained depth estimation model DepthAnything V2.<n>Our proposed solution establishes a new benchmark for depth-aware dehazing across various scenarios.
arXiv Detail & Related papers (2026-01-11T13:29:02Z)
Decoupling Amplitude and Phase Attention in Frequency Domain for RGB-Event based Visual Object Tracking [51.31378940976401]
Existing RGB-Event tracking approaches fail to fully exploit the unique advantages of event cameras.<n>We propose a novel tracking framework that performs early fusion in the frequency domain, enabling effective aggregation of high-frequency information from the event modality.<n>Experiments on three widely used RGB-Event tracking benchmark datasets, including FE108, FELT, and COESOT, demonstrate the high performance and efficiency of our method.
arXiv Detail & Related papers (2026-01-03T01:10:17Z)
Beyond conventional vision: RGB-event fusion for robust object detection in dynamic traffic scenarios [23.41380544271609]
Dynamic range of conventional RGB cameras reduces global contrast and causes loss of high-frequency details.<n>We propose a motion cue fusion network (MCFNet) which achieves optimal cross-modal feature fusion under challenging lighting.<n>MCFNet significantly outperforms existing methods in various poor lighting and fast moving traffic scenarios.
arXiv Detail & Related papers (2025-08-14T14:48:21Z)
FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [63.87313550399871]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability.<n>We propose Self-supervised Transfer (PST) and FrequencyDe-coupled Fusion module (FreDF)<n>PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models.<n>FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches.
arXiv Detail & Related papers (2025-03-25T15:04:53Z)
The Devil is in the Details: Boosting Guided Depth Super-Resolution via Rethinking Cross-Modal Alignment and Aggregation [41.12790340577986]
Guided depth super-resolution (GDSR) involves restoring missing depth details using the high-resolution RGB image of the same scene. Previous approaches have struggled with the heterogeneity and complementarity of the multi-modal inputs, and neglected the issues of modal misalignment, geometrical misalignment, and feature selection.
arXiv Detail & Related papers (2024-01-16T05:37:08Z)
Learning Parallax for Stereo Event-based Motion Deblurring [8.201943408103995]
Existing approaches rely on the perfect pixel-wise alignment between intensity images and events, which is not always fulfilled in the real world. We propose a novel coarse-to-fine framework, named NETwork of Event-based motion Deblurring with STereo event and intensity cameras (St-EDNet) We build a new dataset with STereo Event and Intensity Cameras (StEIC), containing real-world events, intensity images, and dense disparity maps.
arXiv Detail & Related papers (2023-09-18T06:51:41Z)
DynaMoN: Motion-Aware Fast and Robust Camera Localization for Dynamic Neural Radiance Fields [71.94156412354054]
We propose Dynamic Motion-Aware Fast and Robust Camera Localization for Dynamic Neural Radiance Fields (DynaMoN)<n>DynaMoN handles dynamic content for initial camera pose estimation and statics-focused ray sampling for fast and accurate novel-view synthesis.<n>We extensively evaluate our approach on two real-world dynamic datasets, the TUM RGB-D dataset and the BONN RGB-D Dynamic dataset.
arXiv Detail & Related papers (2023-09-16T08:46:59Z)
Video Frame Interpolation with Stereo Event and Intensity Camera [40.07341828127157]
We propose a novel Stereo Event-based VFI network (SE-VFI-Net) to generate high-quality intermediate frames. We exploit the fused features accomplishing accurate optical flow and disparity estimation. Our proposed SEVFI-Net outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-07-17T04:02:00Z)
Dual Memory Aggregation Network for Event-Based Object Detection with Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture brightness change of every pixel in an asynchronous manner. Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as 3D tensor representation. Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
arXiv Detail & Related papers (2023-03-17T12:12:41Z)
Consistent Direct Time-of-Flight Video Depth Super-Resolution [9.173767380836852]
Direct time-of-flight (dToF) sensors are promising for next-generation on-device 3D sensing. We propose the first multi-frame fusion scheme to mitigate the spatial ambiguity resulting from the low-resolution dToF imaging. We introduce DyDToF, the first synthetic RGB-dToF video dataset that features dynamic objects and a realistic dToF simulator.
arXiv Detail & Related papers (2022-11-16T04:16:20Z)
Event-based Image Deblurring with Dynamic Motion Awareness [10.81953574179206]
We introduce the first dataset containing pairs of real RGB blur images and related events during the exposure time. Our results show better robustness overall when using events, with improvements in PSNR by up to 1.57dB on synthetic data and 1.08 dB on real event data.
arXiv Detail & Related papers (2022-08-24T09:39:55Z)
DeepRM: Deep Recurrent Matching for 6D Pose Refinement [77.34726150561087]
DeepRM is a novel recurrent network architecture for 6D pose refinement. The architecture incorporates LSTM units to propagate information through each refinement step. DeepRM achieves state-of-the-art performance on two widely accepted challenging datasets.
arXiv Detail & Related papers (2022-05-28T16:18:08Z)
Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD. Two components are designed to implement the effective cross-modality interaction. Our network outperforms $15$ state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.