EVDI++: Event-based Video Deblurring and Interpolation via Self-Supervised Learning
- URL: http://arxiv.org/abs/2509.08260v1
- Date: Wed, 10 Sep 2025 03:36:24 GMT
- Title: EVDI++: Event-based Video Deblurring and Interpolation via Self-Supervised Learning
- Authors: Chi Zhang, Xiang Zhang, Chenxu Jiang, Gui-Song Xia, Lei Yu
- Abstract summary: We introduce EVDI++, a self-supervised framework for Event-based Video Deblurring and Interpolation. We use the high temporal resolution of event cameras to mitigate motion blur and enable intermediate frame prediction. A self-supervised learning framework is proposed to enable network training with real-world blurry videos and events.
- Score: 36.86635176661841
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Frame-based cameras with extended exposure times often produce perceptible visual blurring and information loss between frames, significantly degrading video quality. To address this challenge, we introduce EVDI++, a unified self-supervised framework for Event-based Video Deblurring and Interpolation that leverages the high temporal resolution of event cameras to mitigate motion blur and enable intermediate frame prediction. Specifically, the Learnable Double Integral (LDI) network is designed to estimate the mapping relation between reference frames and sharp latent images. Then, we refine the coarse results and optimize overall training efficiency by introducing a learning-based division reconstruction module, enabling images to be converted with varying exposure intervals. We devise an adaptive parameter-free fusion strategy to obtain the final results, utilizing the confidence embedded in the LDI outputs of concurrent events. A self-supervised learning framework is proposed to enable network training with real-world blurry videos and events by exploring the mutual constraints among blurry frames, latent images, and event streams. We further construct a dataset with real-world blurry images and events using a DAVIS346c camera, demonstrating the generalizability of the proposed EVDI++ in real-world scenarios. Extensive experiments on both synthetic and real-world datasets show that our method achieves state-of-the-art performance in video deblurring and interpolation tasks.
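The LDI network described in the abstract learns the mapping between a blurry reference frame and sharp latent images. In the classical physical model this mapping is a double integral of the event stream: a blurry pixel value B equals the latent intensity L(f) at reference time f scaled by the time-averaged exponential of the integrated events. The sketch below illustrates that relation per pixel; the contrast threshold `c`, the uniform discretization, and the function name `event_double_integral` are illustrative assumptions, not the EVDI++ implementation (which replaces this hand-derived integral with a learned network).

```python
import numpy as np

def event_double_integral(event_times, event_pols, t_s, t_e, f, c=0.2, n_steps=200):
    """Discretized double integral J(f) for a single pixel.

    J(f) = (1/T) * integral_{t_s}^{t_e} exp(c * integral_f^t e(s) ds) dt,
    where e(s) is the signed event stream over the exposure [t_s, t_e].
    The sharp latent intensity at reference time f then satisfies
    L(f) ~= B / J(f), with B the observed blurry pixel value.
    """
    ts = np.linspace(t_s, t_e, n_steps)
    vals = np.empty(n_steps)
    for i, t in enumerate(ts):
        lo, hi = (f, t) if t >= f else (t, f)
        mask = (event_times >= lo) & (event_times < hi)
        s = event_pols[mask].sum()
        if t < f:
            s = -s  # integrating backwards in time flips the sign
        vals[i] = np.exp(c * s)
    return vals.mean()  # uniform grid: mean approximates (1/T) * integral

# Usage: recover a latent pixel value from a blurry observation
# (three illustrative events for one pixel during exposure [0, 1])
times = np.array([0.1, 0.4, 0.7])
pols = np.array([1.0, 1.0, -1.0])
J = event_double_integral(times, pols, 0.0, 1.0, f=0.5)
latent = 0.6 / J  # B = 0.6 is an illustrative blurry value
```

Evaluating J(f) at many reference times f yields a whole sequence of latent frames from one blurry frame, which is what makes joint deblurring and interpolation possible; the confidence-based fusion in EVDI++ then combines estimates anchored at different reference frames.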
Related papers
- UCM: Unifying Camera Control and Memory with Time-aware Positional Encoding Warping for World Models [54.564740558030245]
We present UCM, a novel framework that unifies long-term memory and precise camera control via a time-aware positional encoding warping mechanism. We also introduce a scalable data curation strategy utilizing point-cloud-based rendering to simulate scene revisiting.
arXiv Detail & Related papers (2026-02-26T12:54:46Z) - UniE2F: A Unified Diffusion Framework for Event-to-Frame Reconstruction with Video Foundation Models [67.24086328473437]
Event cameras excel at recording relative intensity changes rather than absolute intensity. The resulting data streams suffer from a significant loss of spatial information and static texture details. We address this limitation by leveraging a pre-trained video diffusion model to reconstruct high-fidelity video frames from sparse event data.
arXiv Detail & Related papers (2026-02-22T14:06:49Z) - EvDiff: High Quality Video with an Event Camera [77.07279880903009]
Reconstructing intensity images from events is a highly ill-posed task due to the inherent ambiguity of absolute brightness. We propose EvDiff, an event-based diffusion model that follows a surrogate training framework to produce high-quality videos.
arXiv Detail & Related papers (2025-11-21T18:49:18Z) - FrameMind: Frame-Interleaved Video Reasoning via Reinforcement Learning [65.42201665046505]
Current video understanding models rely on fixed frame sampling strategies, processing predetermined visual inputs regardless of the specific reasoning requirements of each question. This static approach limits their ability to adaptively gather visual evidence, leading to suboptimal performance on tasks that require broad temporal coverage or fine-grained spatial detail. We introduce FrameMind, an end-to-end framework trained with reinforcement learning that enables models to dynamically request visual information during reasoning through Frame-Interleaved Chain-of-Thought (FiCOT). Unlike traditional approaches, FrameMind operates in multiple turns where the model alternates between textual reasoning and active visual perception, using tools to extract
arXiv Detail & Related papers (2025-09-28T17:59:43Z) - CMTA: Cross-Modal Temporal Alignment for Event-guided Video Deblurring [44.30048301161034]
Video deblurring aims to enhance the quality of restored results in motion-blurred videos by gathering information from adjacent video frames.
We propose two modules: 1) Intra-frame feature enhancement operates within the exposure time of a single blurred frame, and 2) Inter-frame temporal feature alignment gathers valuable long-range temporal information to target frames.
We demonstrate that our proposed methods outperform state-of-the-art frame-based and event-based motion deblurring methods through extensive experiments conducted on both synthetic and real-world deblurring datasets.
arXiv Detail & Related papers (2024-08-27T10:09:17Z) - Event-based Video Frame Interpolation with Edge Guided Motion Refinement [28.331148083668857]
We introduce an end-to-end E-VFI learning method to efficiently utilize edge features from event signals for motion flow and warping enhancement.
Our method incorporates an Edge Guided Attentive (EGA) module, which rectifies estimated video motion through attentive aggregation.
Experiments on both synthetic and real datasets show the effectiveness of the proposed approach.
arXiv Detail & Related papers (2024-04-28T12:13:34Z) - Video Dynamics Prior: An Internal Learning Approach for Robust Video Enhancements [83.5820690348833]
We present a framework for low-level vision tasks that does not require any external training data corpus.
Our approach learns neural modules by optimizing over a corrupted sequence, leveraging the spatio-temporal coherence and internal statistics of the video.
arXiv Detail & Related papers (2023-12-13T01:57:11Z) - Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and Events [63.984927609545856]
Event-based Inter/intra-frame Compensator (E-IC) is proposed to predict the per-pixel dynamic between arbitrary time intervals.
We show that the proposed method achieves state-of-the-art and shows remarkable performance for event-based RS2GS inversion in real-world scenarios.
arXiv Detail & Related papers (2023-04-14T05:30:02Z) - Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time [101.91824315554682]
In this work, we aim ambitiously for a more realistic and challenging task - joint video multi-frame interpolation and deblurring under unknown exposure time.
We first adopt a variant of supervised contrastive learning to construct an exposure-aware representation from input blurred frames.
We then build our video reconstruction network upon the exposure and motion representation by progressive exposure-adaptive convolution and motion refinement.
arXiv Detail & Related papers (2023-03-27T09:43:42Z) - A Unified Framework for Event-based Frame Interpolation with Ad-hoc Deblurring in the Wild [72.0226493284814]
We propose a unified framework for event-based frame interpolation that performs deblurring ad-hoc. Our network consistently outperforms previous state-of-the-art methods on frame interpolation, single-image deblurring, and the joint task of both.
arXiv Detail & Related papers (2023-01-12T18:19:00Z) - Unifying Motion Deblurring and Frame Interpolation with Events [11.173687810873433]
Slow shutter speed and long exposure time of frame-based cameras often cause visual blur and loss of inter-frame information, degrading the overall quality of captured videos.
We present a unified framework of event-based motion deblurring and frame interpolation for blurry video enhancement, where the extremely low latency of events is leveraged to alleviate motion blur and facilitate intermediate frame prediction.
By exploring the mutual constraints among blurry frames, latent images, and event streams, we further propose a self-supervised learning framework to enable network training with real-world blurry videos and events.
arXiv Detail & Related papers (2022-03-23T03:43:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.