Learning Spatial-Temporal Implicit Neural Representations for
Event-Guided Video Super-Resolution
- URL: http://arxiv.org/abs/2303.13767v2
- Date: Wed, 29 Mar 2023 01:59:37 GMT
- Title: Learning Spatial-Temporal Implicit Neural Representations for
Event-Guided Video Super-Resolution
- Authors: Yunfan Lu, Zipeng Wang, Minjie Liu, Hongjian Wang, Lin Wang
- Abstract summary: Event cameras sense the intensity changes asynchronously and produce event streams with high dynamic range and low latency.
This has inspired research endeavors utilizing events to guide the challenging video super-resolution (VSR) task.
We make the first attempt to address a novel problem of achieving VSR at random scales by taking advantage of the high temporal resolution property of events.
- Score: 9.431635577890745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event cameras sense intensity changes asynchronously and produce event
streams with high dynamic range and low latency. This has inspired research
endeavors utilizing events to guide the challenging video super-resolution (VSR)
task. In this paper, we make the first attempt to address a novel problem of
achieving VSR at random scales by taking advantage of the high temporal
resolution property of events. This is hampered by the difficulty of
representing the spatial-temporal information of events when guiding VSR. To
this end, we propose a novel framework that incorporates the spatial-temporal
interpolation of events into VSR in a unified manner. Our key idea is to learn
implicit neural representations from queried spatial-temporal coordinates and
features from both RGB frames and events. Our method contains three parts.
Specifically, the Spatial-Temporal Fusion (STF) module first learns the 3D
features from events and RGB frames. Then, the Temporal Filter (TF) module
unlocks more explicit motion information from the events near the queried
timestamp and generates the 2D features. Lastly, the Spatial-Temporal Implicit
Representation (STIR) module recovers the SR frame at arbitrary resolutions
from the outputs of these two modules. In addition, we collect a real-world
dataset with spatially aligned events and RGB frames. Extensive experiments
show that our method significantly surpasses the prior arts and achieves VSR
with random scales, e.g., 6.5. Code and dataset are available at
https://vlis2022.github.io/cvpr23/egvsr.
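The abstract's key idea, decoding an RGB value at any queried spatial-temporal coordinate by feeding that coordinate plus local features through an MLP, can be sketched as follows. This is a minimal illustrative sketch in NumPy, not the authors' implementation; the layer sizes, feature dimension, and random stand-in features are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(in_dim, hidden, out_dim):
    """Random weights for a 2-hidden-layer MLP (stand-in for the STIR decoder)."""
    dims = [in_dim, hidden, hidden, out_dim]
    return [(rng.standard_normal((a, b)) * 0.1, np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def stir_decode(coords, feats, params):
    """Map queried (x, y, t) coordinates + local features to RGB values."""
    h = np.concatenate([coords, feats], axis=-1)
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)  # ReLU hidden layers
    W, b = params[-1]
    return h @ W + b  # one RGB value per queried coordinate

# Query an arbitrary-resolution grid: any (H, W) works because the decoder
# is a continuous function of the coordinates, not a fixed pixel grid.
H, W, feat_dim = 5, 7, 16
ys, xs = np.meshgrid(np.linspace(0, 1, H), np.linspace(0, 1, W), indexing="ij")
t = np.full((H * W, 1), 0.5)                    # query timestamp between frames
coords = np.concatenate([xs.reshape(-1, 1), ys.reshape(-1, 1), t], axis=-1)
feats = rng.standard_normal((H * W, feat_dim))  # stand-in for STF/TF features
params = init_mlp(3 + feat_dim, 64, 3)
rgb = stir_decode(coords, feats, params)
print(rgb.shape)  # (35, 3)
```

Because the decoder consumes continuous coordinates, changing `H` and `W` (or the query timestamp `t`) yields output at any scale, which is the property the paper exploits for VSR at random scales.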
Related papers
- Rethinking RGB-Event Semantic Segmentation with a Novel Bidirectional Motion-enhanced Event Representation [8.76832497215149]
Event cameras capture motion dynamics, offering a unique modality with great potential in various computer vision tasks.
RGB-Event fusion faces three misalignments: (i) spatial, (ii) temporal, and (iii) modal misalignment.
We propose the Motion-enhanced Event (MET) representation, which transforms sparse event voxels into a dense and temporally coherent form.
arXiv Detail & Related papers (2025-05-02T19:19:58Z) - Event-Enhanced Blurry Video Super-Resolution [52.894824081586776]
We tackle the task of blurry video super-resolution (BVSR), aiming to generate high-resolution (HR) videos from low-resolution (LR) and blurry inputs.
Current BVSR methods often fail to restore sharp details at high resolutions, resulting in noticeable artifacts and jitter.
We introduce event signals into BVSR and propose a novel event-enhanced network, Ev-DeVSR.
arXiv Detail & Related papers (2025-04-17T15:55:41Z) - CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework [30.734382771657312]
We propose a novel CM3AE pre-training framework for the RGB-Event perception.
This framework accepts multi-modalities/views of data as input, including RGB images, event images, and event voxels.
We construct a large-scale dataset containing 2,535,759 RGB-Event data pairs for the pre-training.
arXiv Detail & Related papers (2025-04-17T01:49:46Z) - Path-adaptive Spatio-Temporal State Space Model for Event-based Recognition with Arbitrary Duration [9.547947845734992]
Event cameras are bio-inspired sensors that capture the intensity changes asynchronously and output event streams.
We present a novel framework, dubbed PAST-Act, exhibiting superior capacity in recognizing events with arbitrary duration.
We also build a minute-level event-based recognition dataset, named ArDVS100, with arbitrary duration for the benefit of the community.
arXiv Detail & Related papers (2024-09-25T14:08:37Z) - HR-INR: Continuous Space-Time Video Super-Resolution via Event Camera [22.208120663778043]
Continuous space-time super-resolution (C-STVSR) aims to simultaneously enhance resolution and frame rate at an arbitrary scale.
We propose a novel C-STVSR framework, called HR-INR, which captures both holistic dependencies and regional motions based on implicit neural representation (INR).
We then propose a novel INR-based decoder with temporal embeddings to capture long-term dependencies with a larger temporal perception field.
arXiv Detail & Related papers (2024-05-22T06:51:32Z) - CRSOT: Cross-Resolution Object Tracking using Unaligned Frame and Event
Cameras [43.699819213559515]
Existing datasets for RGB-DVS tracking are collected with the DVS346 camera, whose resolution ($346 \times 260$) is too low for practical applications.
We build the first unaligned frame-event dataset CRSOT collected with a specially built data acquisition system.
We propose a novel unaligned object tracking framework that can realize robust tracking even using the loosely aligned RGB-Event data.
arXiv Detail & Related papers (2024-01-05T14:20:22Z) - Implicit Event-RGBD Neural SLAM [54.74363487009845]
Implicit neural SLAM has achieved remarkable progress recently.
Existing methods face significant challenges in non-ideal scenarios.
We propose EN-SLAM, the first event-RGBD implicit neural SLAM framework.
arXiv Detail & Related papers (2023-11-18T08:48:58Z) - Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and
Events [63.984927609545856]
Event-based Inter/intra-frame Compensator (E-IC) is proposed to predict the per-pixel dynamic between arbitrary time intervals.
We show that the proposed method achieves state-of-the-art and shows remarkable performance for event-based RS2GS inversion in real-world scenarios.
arXiv Detail & Related papers (2023-04-14T05:30:02Z) - Dual Memory Aggregation Network for Event-Based Object Detection with
Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture brightness change of every pixel in an asynchronous manner.
Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as 3D tensor representation.
Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
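The pillar representation described above, binning the event stream into an x-y-t grid for each polarity to form a 3D tensor, can be sketched as a simple voxelization. The function name, array layout, and timestamp normalization here are illustrative assumptions, not the paper's code.

```python
import numpy as np

def events_to_pillars(events, H, W, T):
    """Bin an event stream into an x-y-t grid per polarity.

    events: (N, 4) array of (x, y, t, p) with x in [0, W), y in [0, H),
    t normalized to [0, 1), and polarity p in {-1, +1}.
    Returns a (2, T, H, W) tensor of per-cell event counts.
    """
    grid = np.zeros((2, T, H, W), dtype=np.float32)  # [polarity, t, y, x]
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    tb = np.minimum((events[:, 2] * T).astype(int), T - 1)  # temporal bin
    pol = (events[:, 3] > 0).astype(int)  # 0 = negative, 1 = positive
    np.add.at(grid, (pol, tb, y, x), 1.0)  # accumulate events per cell
    return grid

ev = np.array([[3, 2, 0.1, +1],
               [3, 2, 0.9, -1],
               [0, 0, 0.5, +1]])
pillars = events_to_pillars(ev, H=4, W=5, T=8)
print(pillars.shape)  # (2, 8, 4, 5)
```

Each (y, x) column of such a grid is one "pillar"; downstream modules can then compute spatial-temporal correlation between pillars, as the summary describes.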
arXiv Detail & Related papers (2023-03-17T12:12:41Z) - Learning to Super-Resolve Blurry Images with Events [62.61911224564196]
Super-Resolution from a single motion Blurred image (SRB) is a severely ill-posed problem due to the joint degradation of motion blurs and low spatial resolution.
We employ events to alleviate the burden of SRB and propose an Event-enhanced SRB (E-SRB) algorithm.
We show that the proposed eSL-Net++ outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-02-27T13:46:42Z) - Learning Dynamic View Synthesis With Few RGBD Cameras [60.36357774688289]
We propose to utilize RGBD cameras to synthesize free-viewpoint videos of dynamic indoor scenes.
We generate point clouds from RGBD frames and then render them into free-viewpoint videos via a neural feature.
We introduce a simple Regional Depth-Inpainting module that adaptively inpaints missing depth values to render complete novel views.
arXiv Detail & Related papers (2022-04-22T03:17:35Z) - Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based
Motion Recognition [62.46544616232238]
Previous motion recognition methods have achieved promising performance through the tightly coupled multi-temporal representation.
We propose to decouple and recouple the spatiotemporal representation for RGB-D-based motion recognition.
arXiv Detail & Related papers (2021-12-16T18:59:47Z) - Group-based Bi-Directional Recurrent Wavelet Neural Networks for Video
Super-Resolution [4.9136996406481135]
Video super-resolution (VSR) aims to estimate a high-resolution (HR) frame from low-resolution (LR) frames.
The key challenge for VSR lies in the effective exploitation of intra-frame spatial correlation and temporal dependency between consecutive frames.
arXiv Detail & Related papers (2021-06-14T06:36:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.