Related papers: EvDiff: High Quality Video with an Event Camera

URL: http://arxiv.org/abs/2511.17492v1
Date: Fri, 21 Nov 2025 18:49:18 GMT
Title: EvDiff: High Quality Video with an Event Camera
Authors: Weilun Li, Lei Sun, Ruixi Gao, Qi Jiang, Yuqin Ma, Kaiwei Wang, Ming-Hsuan Yang, Luc Van Gool, Danda Pani Paudel
Abstract summary: Reconstructing intensity images from events is a highly ill-posed task due to the inherent ambiguity of absolute brightness. We propose EvDiff, an event-based diffusion model that follows a surrogate training framework to produce high-quality videos.
Score: 77.07279880903009
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As neuromorphic sensors, event cameras asynchronously record changes in brightness as streams of sparse events with the advantages of high temporal resolution and high dynamic range. Reconstructing intensity images from events is a highly ill-posed task due to the inherent ambiguity of absolute brightness. Early methods generally follow an end-to-end regression paradigm, directly mapping events to intensity frames in a deterministic manner. While effective to some extent, these approaches often yield perceptually inferior results and struggle to scale up in model capacity and training data. In this work, we propose EvDiff, an event-based diffusion model that follows a surrogate training framework to produce high-quality videos. To reduce the heavy computational cost of high-frame-rate video generation, we design an event-based diffusion model that performs only a single forward diffusion step, equipped with a temporally consistent EvEncoder. Furthermore, our novel Surrogate Training Framework eliminates the dependence on paired event-image datasets, allowing the model to leverage large-scale image datasets for higher capacity. The proposed EvDiff is capable of generating high-quality colorful videos solely from monochromatic event streams. Experiments on real-world datasets demonstrate that our method strikes a sweet spot between fidelity and realism, outperforming existing approaches on both pixel-level and perceptual metrics.

Related papers

UniE2F: A Unified Diffusion Framework for Event-to-Frame Reconstruction with Video Foundation Models [67.24086328473437]
Event cameras excel at recording relative intensity changes rather than absolute intensity. The resulting data streams suffer from a significant loss of spatial information and static texture details. We address this limitation by leveraging a pre-trained video diffusion model to reconstruct high-fidelity video frames from sparse event data.
arXiv Detail & Related papers (2026-02-22T14:06:49Z)
FideDiff: Efficient Diffusion Model for High-Fidelity Image Motion Deblurring [33.809728459395785]
We introduce FideDiff, a novel single-step diffusion model designed for high-fidelity deblurring. We reformulate motion deblurring as a diffusion-like process where each timestep represents a progressively blurred image. By reconstructing training data with matched blur trajectories, the model learns temporal consistency, enabling accurate one-step deblurring.
arXiv Detail & Related papers (2025-10-02T03:44:45Z)
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion [67.94300151774085]
We introduce Self Forcing, a novel training paradigm for autoregressive video diffusion models. It addresses the longstanding issue of exposure bias, where models trained on ground-truth context must generate sequences conditioned on their own imperfect outputs.
arXiv Detail & Related papers (2025-06-09T17:59:55Z)
EGVD: Event-Guided Video Diffusion Model for Physically Realistic Large-Motion Frame Interpolation [16.22243283808375]
Event-Guided Video Diffusion Model (EGVD) is a novel framework that leverages the powerful priors of pre-trained stable video diffusion models. Our approach features a Multi-modal Motion Condition Generator (MMCG) that effectively integrates RGB frames and event signals to guide the diffusion process. Experiments on both real and simulated datasets demonstrate that EGVD significantly outperforms existing methods in handling large motion.
arXiv Detail & Related papers (2025-03-26T06:33:32Z)
EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation [58.41979933166173]
EvAnimate is the first method leveraging event streams as robust and precise motion cues for conditional human image animation. High-quality and temporally coherent animations are achieved through a dual-branch architecture. Experiment results show EvAnimate achieves high temporal fidelity and robust performance in scenarios where traditional video-derived cues fall short.
arXiv Detail & Related papers (2025-03-24T11:05:41Z)
One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z)
EventSplat: 3D Gaussian Splatting from Moving Event Cameras for Real-time Rendering [7.392798832833857]
Event cameras offer exceptional temporal resolution and a high dynamic range. We introduce a method for using event camera data in novel view synthesis via Gaussian Splatting.
arXiv Detail & Related papers (2024-12-10T08:23:58Z)
E2VIDiff: Perceptual Events-to-Video Reconstruction using Diffusion Priors [44.430588804079555]
We introduce diffusion models to events-to-video reconstruction, achieving colorful, realistic, and perceptually superior video generation from achromatic events. Our approach can produce diverse, realistic frames with faithfulness to the given events.
arXiv Detail & Related papers (2024-07-11T07:10:58Z)
Event-based Continuous Color Video Decompression from Single Frames [36.4263932473053]
We present ContinuityCam, a novel approach to generate a continuous video from a single static RGB image and an event camera stream. Our approach combines continuous long-range motion modeling with a neural synthesis model, enabling frame prediction at arbitrary times within the events.
arXiv Detail & Related papers (2023-11-30T18:59:23Z)
EventNeRF: Neural Radiance Fields from a Single Colour Event Camera [81.19234142730326]
This paper proposes the first approach for 3D-consistent, dense and novel view synthesis using just a single colour event stream as input. At its core is a neural radiance field trained entirely in a self-supervised manner from events while preserving the original resolution of the colour event channels. We evaluate our method qualitatively and numerically on several challenging synthetic and real scenes and show that it produces significantly denser and more visually appealing renderings.
arXiv Detail & Related papers (2022-06-23T17:59:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.