Related papers: Rethinking RGB-Event Semantic Segmentation with a Novel Bidirectional Motion-enhanced Event Representation

Rethinking RGB-Event Semantic Segmentation with a Novel Bidirectional Motion-enhanced Event Representation

URL: http://arxiv.org/abs/2505.01548v1
Date: Fri, 02 May 2025 19:19:58 GMT
Title: Rethinking RGB-Event Semantic Segmentation with a Novel Bidirectional Motion-enhanced Event Representation
Authors: Zhen Yao, Xiaowen Ying, Mooi Choo Chuah,
Abstract summary: Event cameras capture motion dynamics, offering a unique modality with great potential in various computer vision tasks.<n> RGB-Event fusion faces three misalignments: (i) temporal, (ii) temporal, and (iii) modal misalignment.<n>We propose Motion-enhanced Event (MET), which transforms sparse event voxels into a dense and temporally coherent form.
Score: 8.76832497215149
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Event cameras capture motion dynamics, offering a unique modality with great potential in various computer vision tasks. However, RGB-Event fusion faces three intrinsic misalignments: (i) temporal, (ii) spatial, and (iii) modal misalignment. Existing voxel grid representations neglect temporal correlations between consecutive event windows, and their formulation with simple accumulation of asynchronous and sparse events is incompatible with the synchronous and dense nature of RGB modality. To tackle these challenges, we propose a novel event representation, Motion-enhanced Event Tensor (MET), which transforms sparse event voxels into a dense and temporally coherent form by leveraging dense optical flows and event temporal features. In addition, we introduce a Frequency-aware Bidirectional Flow Aggregation Module (BFAM) and a Temporal Fusion Module (TFM). BFAM leverages the frequency domain and MET to mitigate modal misalignment, while bidirectional flow aggregation and temporal fusion mechanisms resolve spatiotemporal misalignment. Experimental results on two large-scale datasets demonstrate that our framework significantly outperforms state-of-the-art RGB-Event semantic segmentation approaches. Our code is available at: https://github.com/zyaocoder/BRENet.

Related papers

Spatially-guided Temporal Aggregation for Robust Event-RGB Optical Flow Estimation [47.75348821902489]
Current optical flow methods exploit the stable appearance of frame (or RGB) data to establish robust correspondences across time.<n>Event cameras, on the other hand, provide high-temporal-resolution motion cues and excel in challenging scenarios.<n>This study introduces a novel approach that uses a spatially dense modality to guide the aggregation of the temporally dense event modality.
arXiv Detail & Related papers (2025-01-01T13:40:09Z)
Dynamic Subframe Splitting and Spatio-Temporal Motion Entangled Sparse Attention for RGB-E Tracking [32.86991031493605]
Event-based bionic camera captures dynamic scenes with high temporal resolution and high dynamic range. We propose a dynamic event subframe splitting strategy to split the event stream into more fine-grained event clusters. Based on this, we design an event-based sparse attention mechanism to enhance the interaction of event features in temporal and spatial dimensions.
arXiv Detail & Related papers (2024-09-26T06:12:08Z)
MambaPupil: Bidirectional Selective Recurrent model for Event-based Eye tracking [50.26836546224782]
Event-based eye tracking has shown great promise with the high temporal resolution and low redundancy. The diversity and abruptness of eye movement patterns, including blinking, fixating, saccades, and smooth pursuit, pose significant challenges for eye localization. This paper proposes a bidirectional long-term sequence modeling and time-varying state selection mechanism to fully utilize contextual temporal information.
arXiv Detail & Related papers (2024-04-18T11:09:25Z)
Revisiting Event-based Video Frame Interpolation [49.27404719898305]
Dynamic vision sensors or event cameras provide rich complementary information for video frame. estimating optical flow from events is arguably more difficult than from RGB information. We propose a divide-and-conquer strategy in which event-based intermediate frame synthesis happens incrementally in multiple simplified stages.
arXiv Detail & Related papers (2023-07-24T06:51:07Z)
Learning Spatial-Temporal Implicit Neural Representations for Event-Guided Video Super-Resolution [9.431635577890745]
Event cameras sense the intensity changes asynchronously and produce event streams with high dynamic range and low latency. This has inspired research endeavors utilizing events to guide the challenging video superresolution (VSR) task. We make the first attempt to address a novel problem of achieving VSR at random scales by taking advantages of the high temporal resolution property of events.
arXiv Detail & Related papers (2023-03-24T02:42:16Z)
Dual Memory Aggregation Network for Event-Based Object Detection with Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture brightness change of every pixel in an asynchronous manner. Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as 3D tensor representation. Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
arXiv Detail & Related papers (2023-03-17T12:12:41Z)
RGB-Event Fusion for Moving Object Detection in Autonomous Driving [3.5397758597664306]
Moving Object Detection (MOD) is a critical vision task for successfully achieving safe autonomous driving. Recent advances in sensor technologies, especially the Event camera, can naturally complement the conventional camera approach to better model moving objects. We propose RENet, a novel RGB-Event fusion Network, that jointly exploits the two complementary modalities to achieve more robust MOD.
arXiv Detail & Related papers (2022-09-17T12:59:08Z)
Event Transformer [43.193463048148374]
Event camera's low power consumption and ability to capture microsecond brightness make it attractive for various computer vision tasks. Existing event representation methods typically convert events into frames, voxel grids, or spikes for deep neural networks (DNNs) This work introduces a novel token-based event representation, where each event is considered a fundamental processing unit termed an event-token.
arXiv Detail & Related papers (2022-04-11T15:05:06Z)
ProgressiveMotionSeg: Mutually Reinforced Framework for Event-Based Motion Segmentation [101.19290845597918]
This paper presents a Motion Estimation (ME) module and an Event Denoising (ED) module jointly optimized in a mutually reinforced manner. Taking temporal correlation as guidance, ED module calculates the confidence that each event belongs to real activity events, and transmits it to ME module to update energy function of motion segmentation for noise suppression.
arXiv Detail & Related papers (2022-03-22T13:40:26Z)
Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition [62.46544616232238]
Previous motion recognition methods have achieved promising performance through the tightly coupled multi-temporal representation. We propose to decouple and recouple caused caused representation for RGB-D-based motion recognition.
arXiv Detail & Related papers (2021-12-16T18:59:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.