E2HQV: High-Quality Video Generation from Event Camera via
Theory-Inspired Model-Aided Deep Learning
- URL: http://arxiv.org/abs/2401.08117v1
- Date: Tue, 16 Jan 2024 05:10:50 GMT
- Title: E2HQV: High-Quality Video Generation from Event Camera via
Theory-Inspired Model-Aided Deep Learning
- Authors: Qiang Qu, Yiran Shen, Xiaoming Chen, Yuk Ying Chung, and Tongliang Liu
- Abstract summary: Bio-inspired event cameras, or dynamic vision sensors, are capable of capturing per-pixel brightness changes (called event-streams) with high temporal resolution and high dynamic range.
This calls for events-to-video (E2V) solutions, which take event-streams as input and generate high-quality video frames for intuitive visualization.
We propose E2HQV, a novel E2V paradigm designed to produce high-quality video frames from events.
- Score: 53.63364311738552
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bio-inspired event cameras, or dynamic vision sensors, asynchronously capture per-pixel brightness changes (called event-streams) with high temporal resolution and high dynamic range. However, the unstructured spatio-temporal event-streams make it difficult to provide intuitive visualization with rich semantic information for human vision. This calls for events-to-video (E2V) solutions, which take event-streams as input and generate high-quality video frames for intuitive visualization. Current solutions, however, are predominantly data-driven and ignore prior knowledge of the underlying statistics relating event-streams to video frames. They rely heavily on the non-linearity and generalization capability of deep neural networks and therefore struggle to reconstruct detailed textures when scenes are complex. In this work, we propose E2HQV, a novel E2V paradigm designed to produce high-quality video frames from events. The approach leverages a model-aided deep learning framework underpinned by a theory-inspired E2V model, which is meticulously derived from the fundamental imaging principles of event cameras. To deal with the issue of state-reset in the recurrent components of E2HQV, we also design a temporal shift embedding module to further improve the quality of the video frames. Comprehensive evaluations on real-world event camera datasets validate our approach, with E2HQV notably outperforming state-of-the-art approaches, e.g., surpassing the second best by over 40% on some evaluation metrics.
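The "theory-inspired E2V model" referenced above is derived from the imaging principle of event cameras: ideally, a pixel emits an event whenever its log intensity changes by a fixed contrast threshold. The abstract does not spell the model out, so the snippet below is only a minimal sketch of that generic event-integration relation; the function name, the threshold value c, and the event tuple layout are illustrative assumptions, not the E2HQV formulation.

```python
import numpy as np

def frame_from_events(events, resolution, c=0.2, log_i0=None):
    """Idealized event-camera relation (generic sketch, not E2HQV): each event at
    (x, y) with polarity p in {-1, +1} indicates the log intensity at that pixel
    changed by roughly p * c, where c is the contrast threshold. Accumulating
    polarities therefore yields a crude, model-based brightness estimate that
    learned E2V networks can refine."""
    h, w = resolution
    log_i = np.zeros((h, w)) if log_i0 is None else log_i0.astype(np.float64).copy()
    for x, y, _t, p in events:
        log_i[int(y), int(x)] += c * float(p)  # one event ~ one threshold crossing
    return np.exp(log_i)

# Example: two positive events at pixel (1, 1) brighten it relative to the rest.
frame = frame_from_events([(1, 1, 0.01, +1), (1, 1, 0.02, +1)], resolution=(4, 4))
```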
Related papers
- LaSe-E2V: Towards Language-guided Semantic-Aware Event-to-Video Reconstruction [8.163356555241322]
We propose a novel framework, called LaSe-E2V, that can achieve semantic-aware high-quality E2V reconstruction.
We first propose an Event-guided Spatiotemporal Attention (ESA) module to effectively condition the denoising pipeline on the event data.
We then introduce an event-aware mask loss to ensure temporal coherence and a noise strategy to enhance spatial consistency.
arXiv Detail & Related papers (2024-07-08T01:40:32Z)
- Neuromorphic Synergy for Video Binarization [54.195375576583864]
Bimodal objects serve as a visual form to embed information that can be easily recognized by vision systems.
Neuromorphic cameras offer new capabilities for alleviating motion blur, but it is non-trivial to first de-blur and then binarize the images in a real-time manner.
We propose an event-based binary reconstruction method that leverages the prior knowledge of the bimodal target's properties to perform inference independently in both event space and image space.
We also develop an efficient integration method to propagate this binary image to high frame rate binary video.
arXiv Detail & Related papers (2024-02-20T01:43:51Z)
- Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution [19.748048455806305]
We propose an efficient diffusion-based text-to-video super-resolution (SR) tuning approach.
We investigate different tuning approaches based on our inflated architecture and report trade-offs between computational costs and super-resolution quality.
arXiv Detail & Related papers (2024-01-18T22:25:16Z)
- EventAid: Benchmarking Event-aided Image/Video Enhancement Algorithms with Real-captured Hybrid Dataset [55.12137324648253]
Event cameras are an emerging imaging technology that offers advantages over conventional frame-based imaging sensors in dynamic range and sensing speed.
This paper focuses on five event-aided image and video enhancement tasks.
arXiv Detail & Related papers (2023-12-13T15:42:04Z)
- HyperE2VID: Improving Event-Based Video Reconstruction via Hypernetworks [16.432164340779266]
We propose HyperE2VID, a dynamic neural network architecture for event-based video reconstruction.
Our approach uses hypernetworks to generate per-pixel adaptive filters guided by a context fusion module (a generic sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-05-10T18:00:06Z)
- Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks [55.81577205593956]
Event cameras are bio-inspired sensors that capture per-pixel intensity changes asynchronously.
Deep learning (DL) has been brought to this emerging field and has inspired active research endeavors to mine its potential.
arXiv Detail & Related papers (2023-02-17T14:19:28Z)
- E2V-SDE: From Asynchronous Events to Fast and Continuous Video Reconstruction via Neural Stochastic Differential Equations [23.866475611205736]
Event cameras respond to brightness changes in the scene asynchronously and independently for every pixel.
E2V-SDE can rapidly reconstruct images at arbitrary time steps and make realistic predictions on unseen data.
In terms of image quality, the LPIPS score improves by up to 12% and the reconstruction speed is 87% higher than that of ET-Net.
arXiv Detail & Related papers (2022-06-15T15:05:10Z)
- Enhanced Quadratic Video Interpolation [56.54662568085176]
We propose an enhanced quadratic video (EQVI) model to handle more complicated scenes and motion patterns.
To further boost the performance, we devise a novel multi-scale fusion network (MS-Fusion) which can be regarded as a learnable augmentation process.
The proposed EQVI model won the first place in the AIM 2020 Video Temporal Super-Resolution Challenge.
arXiv Detail & Related papers (2020-09-10T02:31:50Z)
- Reducing the Sim-to-Real Gap for Event Cameras [64.89183456212069]
Event cameras are paradigm-shifting novel sensors that report asynchronous, per-pixel brightness changes called 'events' with unparalleled low latency.
Recent work has demonstrated impressive results using Convolutional Neural Networks (CNNs) for video reconstruction and optic flow with events.
We present strategies for improving training data for event-based CNNs that result in a 20-40% boost in the performance of existing video reconstruction networks.
arXiv Detail & Related papers (2020-03-20T02:44:29Z)
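The HyperE2VID entry above mentions hypernetwork-generated per-pixel adaptive filters. As a rough illustration of that general idea only (not the HyperE2VID architecture; the "hypernetwork" here is an untrained linear map and all names are hypothetical), a dynamic filtering layer can be sketched as follows:

```python
import numpy as np

def dynamic_filter_layer(features, context, k=3, rng=None):
    """Generic dynamic-filtering sketch: a tiny stand-in hypernetwork (a random
    linear map in place of a learned one) turns each pixel's context vector into
    a k*k kernel, which is then applied to the feature map around that pixel.
    Shapes: features (H, W) float array, context (H, W, C)."""
    rng = np.random.default_rng(0) if rng is None else rng
    h, w = features.shape
    c = context.shape[-1]
    w_hyper = rng.standard_normal((c, k * k)) * 0.1       # hypernetwork weights
    kernels = (context.reshape(h * w, c) @ w_hyper).reshape(h, w, k, k)
    pad = k // 2
    padded = np.pad(features, pad, mode="edge")
    out = np.empty_like(features, dtype=np.float64)
    for i in range(h):
        for j in range(w):
            # Per-pixel adaptive filter: weighted sum of the local k*k patch.
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernels[i, j])
    return out
```

For example, dynamic_filter_layer(np.random.rand(8, 8), np.random.rand(8, 8, 4)) returns an 8x8 map in which each output pixel is a context-dependent weighted sum of its 3x3 neighborhood.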