Bridge Frame and Event: Common Spatiotemporal Fusion for High-Dynamic Scene Optical Flow
- URL: http://arxiv.org/abs/2503.06992v2
- Date: Tue, 11 Mar 2025 11:32:11 GMT
- Title: Bridge Frame and Event: Common Spatiotemporal Fusion for High-Dynamic Scene Optical Flow
- Authors: Hanyu Zhou, Haonan Wang, Haoyue Liu, Yuxing Duan, Yi Chang, Luxin Yan
- Abstract summary: We propose a novel common spatiotemporal fusion between frame and event modalities for high-dynamic scene optical flow. In motion fusion, we discover that the frame-based motion possesses spatially dense but temporally discontinuous correlation, while the event-based motion has spatially sparse but temporally continuous correlation.
- Score: 21.821959971338767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-dynamic scene optical flow is a challenging task, which suffers from spatial blur and temporally discontinuous motion due to large displacement in frame imaging, thus deteriorating the spatiotemporal features of optical flow. Typically, existing methods mainly introduce the event camera to directly fuse the spatiotemporal features between the two modalities. However, this direct fusion is ineffective, since there exists a large gap due to the heterogeneous data representation between frame and event modalities. To address this issue, we explore a common-latent space as an intermediate bridge to mitigate the modality gap. In this work, we propose a novel common spatiotemporal fusion between frame and event modalities for high-dynamic scene optical flow, including visual boundary localization and motion correlation fusion. Specifically, in visual boundary localization, we figure out that frame and event share similar spatiotemporal gradients, whose similarity distribution is consistent with the extracted boundary distribution. This motivates us to design the common spatiotemporal gradient to constrain the reference boundary localization. In motion correlation fusion, we discover that the frame-based motion possesses spatially dense but temporally discontinuous correlation, while the event-based motion has spatially sparse but temporally continuous correlation. This inspires us to use the reference boundary to guide the complementary motion knowledge fusion between the two modalities. Moreover, the common spatiotemporal fusion can not only relieve the cross-modal feature discrepancy, but also make the fusion process interpretable for dense and continuous optical flow. Extensive experiments have been performed to verify the superiority of the proposed method.
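To make the two ideas in the abstract concrete, below is a minimal, hypothetical sketch (not the authors' implementation): it estimates a common boundary map as the cosine similarity between the spatiotemporal gradients of frame and event features, then uses that map to blend a spatially dense frame correlation with a temporally continuous event correlation. All tensor names, shapes, and the specific weighting rule are illustrative assumptions.

```python
# Minimal sketch of the two ideas in the abstract, under assumed tensor shapes.
import torch
import torch.nn.functional as F

def spatiotemporal_gradient(x):
    """Finite-difference gradients over (t, y, x) for a tensor of shape (T, C, H, W)."""
    gt = x[1:] - x[:-1]                      # temporal gradient, (T-1, C, H, W)
    gy = x[..., 1:, :] - x[..., :-1, :]      # vertical gradient, (T, C, H-1, W)
    gx = x[..., :, 1:] - x[..., :, :-1]      # horizontal gradient, (T, C, H, W-1)
    # Pad each gradient back to the input shape so the three can be stacked.
    gt = F.pad(gt, (0, 0, 0, 0, 0, 0, 0, 1))
    gy = F.pad(gy, (0, 0, 0, 1))
    gx = F.pad(gx, (0, 1))
    return torch.cat([gt, gy, gx], dim=1)    # (T, 3C, H, W)

def common_boundary_map(frame_feat, event_feat):
    """Cosine similarity between the modalities' spatiotemporal gradients.
    High similarity is read as evidence of a shared visual boundary."""
    g_f = spatiotemporal_gradient(frame_feat)
    g_e = spatiotemporal_gradient(event_feat)
    return F.cosine_similarity(g_f, g_e, dim=1).clamp(min=0)   # (T, H, W) in [0, 1]

def boundary_guided_fusion(corr_frame, corr_event, boundary):
    """Blend the spatially dense frame correlation with the temporally continuous
    event correlation, weighting the event branch more near shared boundaries."""
    w = boundary.unsqueeze(1)                                  # (T, 1, H, W)
    return (1.0 - w) * corr_frame + w * corr_event

# Hypothetical inputs: 5 time steps, 16-channel features, 64x64 resolution.
frame_feat = torch.randn(5, 16, 64, 64)
event_feat = torch.randn(5, 16, 64, 64)
corr_frame = torch.randn(5, 8, 64, 64)   # stand-in for a frame-based correlation volume
corr_event = torch.randn(5, 8, 64, 64)   # stand-in for an event-based correlation volume

boundary = common_boundary_map(frame_feat, event_feat)
fused = boundary_guided_fusion(corr_frame, corr_event, boundary)
print(fused.shape)   # torch.Size([5, 8, 64, 64])
```

The weighting choice here (emphasizing the event correlation near shared boundaries) is one plausible reading of boundary-guided complementary fusion, not a specification of the paper's actual network.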
Related papers
- Characterizing Motion Encoding in Video Diffusion Timesteps [50.13907856401258]
We study how motion is encoded in video diffusion timesteps by the trade-off between appearance editing and motion preservation. We identify an early, motion-dominant regime and a later, appearance-dominant regime, yielding an operational motion-appearance boundary in timestep space.
arXiv Detail & Related papers (2025-12-18T21:20:54Z) - RainDiff: End-to-end Precipitation Nowcasting Via Token-wise Attention Diffusion [64.49056527678606]
We propose a Token-wise Attention mechanism integrated into not only the U-Net diffusion model but also the radar-temporal encoder. Unlike prior approaches, our method integrates attention into the architecture without incurring the high resource cost typical of pixel-space diffusion. Our experiments and evaluations demonstrate that the proposed method significantly outperforms state-of-the-art approaches in robustness, local fidelity, and generalization, and is superior in complex precipitation forecasting scenarios.
arXiv Detail & Related papers (2025-10-16T17:59:13Z) - Injecting Frame-Event Complementary Fusion into Diffusion for Optical Flow in Challenging Scenes [41.822043262920296]
In degraded scenes, the frame camera provides dense appearance saturation but sparse boundary completeness due to its long imaging time and low dynamic range. In contrast, the event camera offers sparse appearance saturation, while its short imaging time and high dynamic range give rise to dense boundary completeness. We propose a novel optical flow estimation framework, Diff-ABFlow, based on diffusion models with frame-event appearance-boundary fusion.
arXiv Detail & Related papers (2025-10-12T12:52:31Z) - STD-GS: Exploring Frame-Event Interaction for SpatioTemporal-Disentangled Gaussian Splatting to Reconstruct High-Dynamic Scene [54.418259038624406]
Existing methods adopt a unified representation model (e.g., Gaussian) to directly match the scene from the frame camera. However, this unified paradigm fails to capture the potential temporal features of objects, due to the limitations of frame imaging and the discontinuous spatial features between background and objects. In this work, we introduce an event camera to compensate for the frame camera, and propose a spatiotemporal-disentangled Gaussian splatting framework for high-dynamic scene reconstruction.
arXiv Detail & Related papers (2025-06-29T09:32:06Z) - LLaFEA: Frame-Event Complementary Fusion for Fine-Grained Spatiotemporal Understanding in LMMs [55.81291976637705]
Large multimodal models (LMMs) excel in scene understanding but struggle with fine-grained spatiotemporal reasoning due to weak alignment between linguistic and visual representations.
Existing methods map textual positions and durations into the visual space from frame-based videos, but suffer from temporal sparsity that limits temporal coordination.
We introduce LLaFEA, which leverages event cameras for temporally dense perception and frame-event fusion.
arXiv Detail & Related papers (2025-03-10T05:30:30Z) - Learning segmentation from point trajectories [79.02153797465326]
We present a way to train a segmentation network using long-term point trajectories as a supervisory signal to complement optical flow. Our method outperforms the prior art on motion-based segmentation.
arXiv Detail & Related papers (2025-01-21T18:59:53Z) - Motion-Aware Generative Frame Interpolation [23.380470636851022]
Flow-based frame interpolation methods ensure motion stability through estimated intermediate flow but often introduce severe artifacts in complex motion regions. Recent generative approaches, boosted by large-scale pre-trained video generation models, show promise in handling intricate scenes. We propose Motion-aware Generative frame interpolation (MoG), which synergizes intermediate flow guidance with generative capacities to enhance fidelity.
arXiv Detail & Related papers (2025-01-07T11:03:43Z) - SSRFlow: Semantic-aware Fusion with Spatial Temporal Re-embedding for Real-world Scene Flow [6.995663556921384]
Scene flow provides the 3D motion field of the first frame from two consecutive point clouds.
We propose a novel approach called Dual Cross Attentive (DCA) for latent fusion and alignment between two frames based on semantic contexts.
We leverage novel domain adaptive losses to effectively bridge the gap of motion inference from synthetic to real-world data.
arXiv Detail & Related papers (2024-07-31T02:28:40Z) - Motion-aware Latent Diffusion Models for Video Frame Interpolation [51.78737270917301]
Motion estimation between neighboring frames plays a crucial role in avoiding motion ambiguity.
We propose a novel diffusion framework, motion-aware latent diffusion models (MADiff).
Our method achieves state-of-the-art performance, significantly outperforming existing approaches.
arXiv Detail & Related papers (2024-04-21T05:09:56Z) - Bring Event into RGB and LiDAR: Hierarchical Visual-Motion Fusion for Scene Flow [17.23190429955172]
A single RGB or LiDAR sensor is the mainstream choice for the challenging scene flow task.
Existing methods adopt a fusion strategy to directly fuse the cross-modal complementary knowledge in motion space.
We propose a novel hierarchical visual-motion fusion framework for scene flow.
arXiv Detail & Related papers (2024-03-12T09:15:19Z) - Motion-Aware Video Frame Interpolation [49.49668436390514]
We introduce a Motion-Aware Video Frame Interpolation (MA-VFI) network, which directly estimates intermediate optical flow from consecutive frames.
It not only extracts global semantic relationships and spatial details from input frames with different receptive fields, but also effectively reduces the required computational cost and complexity.
arXiv Detail & Related papers (2024-02-05T11:00:14Z) - Exploring the Common Appearance-Boundary Adaptation for Nighttime Optical Flow [17.416185015412175]
We propose a novel appearance-boundary adaptation framework for nighttime optical flow.
In appearance adaptation, we embed the auxiliary daytime image and the nighttime image into a reflectance-aligned common space.
We find that the motions of the two reflectance maps are very similar, which benefits us in consistently transferring motion appearance knowledge from the daytime to the nighttime domain.
arXiv Detail & Related papers (2024-01-31T07:51:52Z) - Forward Flow for Novel View Synthesis of Dynamic Scenes [97.97012116793964]
We propose a neural radiance field (NeRF) approach for novel view synthesis of dynamic scenes using forward warping.
Our method outperforms existing methods in both novel view rendering and motion modeling.
arXiv Detail & Related papers (2023-09-29T16:51:06Z) - Video Interpolation by Event-driven Anisotropic Adjustment of Optical Flow [11.914613556594725]
We propose an end-to-end training method, A2OF, for video frame interpolation with event-driven Anisotropic Adjustment of Optical Flows.
Specifically, we use events to generate optical flow distribution masks for the intermediate optical flow, which can model the complicated motion between two frames.
arXiv Detail & Related papers (2022-08-19T02:31:33Z) - TimeLens: Event-based Video Frame Interpolation [54.28139783383213]
We introduce Time Lens, a novel method that leverages the advantages of both synthesis-based and flow-based approaches.
We show an up to 5.21 dB improvement in terms of PSNR over state-of-the-art frame-based and event-based methods.
arXiv Detail & Related papers (2021-06-14T10:33:47Z)