Spatially-guided Temporal Aggregation for Robust Event-RGB Optical Flow Estimation
- URL: http://arxiv.org/abs/2501.00838v1
- Date: Wed, 01 Jan 2025 13:40:09 GMT
- Title: Spatially-guided Temporal Aggregation for Robust Event-RGB Optical Flow Estimation
- Authors: Qianang Zhou, Junhui Hou, Meiyi Yang, Yongjian Deng, Youfu Li, Junlin Xiong
- Abstract summary: Current optical flow methods exploit the stable appearance of frame (or RGB) data to establish robust correspondences across time.
Event cameras, on the other hand, provide high-temporal-resolution motion cues and excel in challenging scenarios.
This study introduces a novel approach that uses a spatially dense modality to guide the aggregation of the temporally dense event modality.
- Score: 47.75348821902489
- Abstract: Current optical flow methods exploit the stable appearance of frame (or RGB) data to establish robust correspondences across time. Event cameras, on the other hand, provide high-temporal-resolution motion cues and excel in challenging scenarios. These complementary characteristics underscore the potential of integrating frame and event data for optical flow estimation. However, most cross-modal approaches fail to fully utilize the complementary advantages, relying instead on simply stacking information. This study introduces a novel approach that uses a spatially dense modality to guide the aggregation of the temporally dense event modality, achieving effective cross-modal fusion. Specifically, we propose an event-enhanced frame representation that preserves the rich texture of frames and the basic structure of events. We use the enhanced representation as the guiding modality and employ events to capture temporally dense motion information. The robust motion features derived from the guiding modality direct the aggregation of motion information from events. To further enhance fusion, we propose a transformer-based module that complements sparse event motion features with spatially rich frame information and enhances global information propagation. Additionally, a mix-fusion encoder is designed to extract comprehensive spatiotemporal contextual features from both modalities. Extensive experiments on the MVSEC and DSEC-Flow datasets demonstrate the effectiveness of our framework. Leveraging the complementary strengths of frames and events, our method achieves leading performance on the DSEC-Flow dataset. Compared to the event-only model, frame guidance improves accuracy by 10%. Furthermore, it outperforms the state-of-the-art fusion-based method with a 4% accuracy gain and a 45% reduction in inference time.
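The abstract's core mechanism, using motion features from the spatially dense guiding modality to direct the aggregation of temporally dense event motion features, maps naturally onto a cross-attention block: queries come from the frame-guided features, while keys and values come from the event features. The sketch below is a minimal illustration of that pattern in PyTorch, not the authors' implementation; the module name, the feature shapes, and the use of `nn.MultiheadAttention` as a stand-in for the paper's transformer-based fusion module are all assumptions.

```python
# Minimal sketch of spatially-guided temporal aggregation via cross-attention.
# Illustrative only: names, shapes, and the single-attention design are
# assumptions, not taken from the paper's released code.
import torch
import torch.nn as nn

class GuidedTemporalAggregation(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        # Queries come from the guiding (event-enhanced frame) modality;
        # keys/values come from the event modality, so stable frame structure
        # steers which temporally dense event motion cues get aggregated.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, guide_feat: torch.Tensor, event_feats: torch.Tensor) -> torch.Tensor:
        # guide_feat:  (B, N, C)   motion features from the guiding modality
        # event_feats: (B, T*N, C) motion features from T temporal event slices
        fused, _ = self.attn(query=guide_feat, key=event_feats, value=event_feats)
        return self.norm(guide_feat + fused)  # residual fusion with the guide

if __name__ == "__main__":
    b, n, t, c = 2, 1024, 5, 128
    agg = GuidedTemporalAggregation(dim=c)
    out = agg(torch.randn(b, n, c), torch.randn(b, t * n, c))
    print(out.shape)  # torch.Size([2, 1024, 128])
```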
Related papers
- Event-based Motion Deblurring via Multi-Temporal Granularity Fusion [5.58706910566768]
The event camera, a bio-inspired sensor offering continuous visual information, could enhance deblurring performance.
Existing event-based image deblurring methods usually utilize voxel-based event representations.
We introduce a point cloud-based event representation into the image deblurring task and propose a Multi-Temporal Granularity Network (MTGNet).
It combines the spatially dense but temporally coarse-grained voxel-based event representation and the temporally fine-grained but spatially sparse point cloud-based event representation.
arXiv Detail & Related papers (2024-12-16T15:20:54Z)
- Event-Based Tracking Any Point with Motion-Augmented Temporal Consistency [58.719310295870024]
This paper presents an event-based framework for tracking any point.
It tackles the challenges posed by spatial sparsity and motion sensitivity in events.
It achieves 150% faster processing with a comparable number of model parameters.
arXiv Detail & Related papers (2024-12-02T09:13:29Z)
- Dynamic Subframe Splitting and Spatio-Temporal Motion Entangled Sparse Attention for RGB-E Tracking [32.86991031493605]
The event-based bionic camera captures dynamic scenes with high temporal resolution and high dynamic range.
We propose a dynamic event subframe splitting strategy to split the event stream into more fine-grained event clusters.
Based on this, we design an event-based sparse attention mechanism to enhance the interaction of event features in temporal and spatial dimensions.
arXiv Detail & Related papers (2024-09-26T06:12:08Z)
- Secrets of Edge-Informed Contrast Maximization for Event-Based Vision [6.735928398631445]
Event cameras capture the motion of intensity gradients (edges) in the image plane in the form of rapid asynchronous events.
Contrast maximization (CM) is an optimization framework that can reverse this effect and produce sharp spatial structures.
We propose a novel hybrid approach that extends CM from uni-modal (events only) to bi-modal (events and edges).
arXiv Detail & Related papers (2024-09-22T22:22:26Z)
- Event-based Video Frame Interpolation with Edge Guided Motion Refinement [28.331148083668857]
We introduce an end-to-end E-VFI learning method to efficiently utilize edge features from event signals for motion flow and warping enhancement.
Our method incorporates an Edge Guided Attentive (EGA) module, which rectifies estimated video motion through attentive aggregation.
Experiments on both synthetic and real datasets show the effectiveness of the proposed approach.
arXiv Detail & Related papers (2024-04-28T12:13:34Z)
- Revisiting Event-based Video Frame Interpolation [49.27404719898305]
Dynamic vision sensors or event cameras provide rich complementary information for video frame interpolation.
Estimating optical flow from events is arguably more difficult than from RGB information.
We propose a divide-and-conquer strategy in which event-based intermediate frame synthesis happens incrementally in multiple simplified stages.
arXiv Detail & Related papers (2023-07-24T06:51:07Z)
- A Unified Framework for Event-based Frame Interpolation with Ad-hoc Deblurring in the Wild [72.0226493284814]
We propose a unified framework for event-based frame interpolation that performs ad-hoc deblurring.
Our network consistently outperforms previous state-of-the-art methods on frame interpolation, single-image deblurring, and the joint task of both.
arXiv Detail & Related papers (2023-01-12T18:19:00Z)
- Asynchronous Optimisation for Event-based Visual Odometry [53.59879499700895]
Event cameras open up new possibilities for robotic perception due to their low latency and high dynamic range.
We focus on event-based visual odometry (VO).
We propose an asynchronous structure-from-motion optimisation back-end.
arXiv Detail & Related papers (2022-03-02T11:28:47Z)
- TimeLens: Event-based Video Frame Interpolation [54.28139783383213]
We introduce Time Lens, a novel method that leverages the advantages of both synthesis-based and flow-based approaches.
We show an up to 5.21 dB improvement in terms of PSNR over state-of-the-art frame-based and event-based methods.
arXiv Detail & Related papers (2021-06-14T10:33:47Z)