SwinSF: Image Reconstruction from Spatial-Temporal Spike Streams
- URL: http://arxiv.org/abs/2407.15708v2
- Date: Wed, 24 Jul 2024 16:55:08 GMT
- Title: SwinSF: Image Reconstruction from Spatial-Temporal Spike Streams
- Authors: Liangyan Jiang, Chuang Zhu, Yanxu Chen
- Abstract summary: We introduce Swin Spikeformer (SwinSF), a novel model for dynamic scene reconstruction from spike streams.
SwinSF combines shifted window self-attention with a proposed temporal spike attention, ensuring comprehensive feature extraction.
We build a new synthesized dataset for spike image reconstruction that matches the resolution of the latest spike camera.
- Score: 2.609896297570564
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The spike camera, with its high temporal resolution, low latency, and high dynamic range, addresses high-speed imaging challenges such as motion blur. It captures photons at each pixel independently, creating binary spike streams that are rich in temporal information but challenging to reconstruct images from. Current algorithms, both traditional and deep learning-based, still fall short in exploiting this rich temporal detail and in restoring fine detail in the reconstructed image. To overcome this, we introduce Swin Spikeformer (SwinSF), a novel model for dynamic scene reconstruction from spike streams. SwinSF is composed of a Spike Feature Extraction module, a Spatial-Temporal Feature Extraction module, and a Final Reconstruction module. It combines shifted window self-attention with a proposed temporal spike attention, ensuring comprehensive feature extraction that captures both spatial and temporal dynamics and leads to a more robust and accurate reconstruction from spike streams. Furthermore, we build a new synthesized dataset for spike image reconstruction that matches the resolution of the latest spike camera, ensuring its relevance and applicability to the latest developments in spike camera imaging. Experimental results demonstrate that the proposed network SwinSF sets a new benchmark, achieving state-of-the-art performance across a series of datasets, including both real-world and synthesized data at various resolutions. Our code and proposed dataset will be available soon.
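As a rough illustration of the temporal half of the attention scheme described above, the PyTorch sketch below applies per-pixel self-attention along the temporal axis of a spike feature tensor. It is a minimal sketch under assumed shapes and layer choices, not the authors' implementation (their code is noted as forthcoming); the spatial half would be handled by shifted-window attention as in the Swin Transformer.

```python
import torch
import torch.nn as nn

class TemporalSpikeAttention(nn.Module):
    """Hypothetical sketch: treat each pixel's T-step feature history as a
    token sequence and attend over time (one possible reading of
    "temporal spike attention")."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, C, H, W) features extracted from a binary spike stream
        B, T, C, H, W = x.shape
        tokens = x.permute(0, 3, 4, 1, 2).reshape(B * H * W, T, C)
        out, _ = self.attn(tokens, tokens, tokens)  # temporal self-attention
        tokens = self.norm(tokens + out)            # residual + LayerNorm
        return tokens.reshape(B, H, W, T, C).permute(0, 3, 4, 1, 2)

# Toy input: batch of 1, 8 time steps, 32 channels, 16x16 spatial grid.
spikes = torch.randint(0, 2, (1, 8, 32, 16, 16)).float()
print(TemporalSpikeAttention(dim=32)(spikes).shape)  # torch.Size([1, 8, 32, 16, 16])
```

In a full model, blocks like this would alternate or fuse with shifted-window spatial attention so that the extracted features capture both axes, matching the abstract's description of spatial-temporal feature extraction.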
Related papers
- Spike-NeRF: Neural Radiance Field Based On Spike Camera [24.829344089740303]
We propose Spike-NeRF, the first Neural Radiance Field derived from spike data.
Instead of the simultaneous multi-view images used by NeRF, the inputs of Spike-NeRF are continuous spike streams captured by a moving spike camera over a very short time.
Our results demonstrate that Spike-NeRF produces more visually appealing results than existing methods and our proposed baseline in high-speed scenes.
arXiv Detail & Related papers (2024-03-25T04:05:23Z)
- SpikeReveal: Unlocking Temporal Sequences from Real Blurry Inputs with Spike Streams [44.02794438687478]
Spike cameras have proven effective at capturing motion features and are beneficial for solving the ill-posed motion deblurring problem.
Existing methods fall into the supervised learning paradigm, which suffers from notable performance degradation when applied to real-world scenarios.
We propose the first self-supervised framework for the task of spike-guided motion deblurring.
arXiv Detail & Related papers (2024-03-14T15:29:09Z)
- Finding Visual Saliency in Continuous Spike Stream [23.591309376586835]
In this paper, we investigate visual saliency in the continuous spike stream for the first time.
We propose a Recurrent Spiking Transformer framework, which is based on a full spiking neural network.
Our framework delivers a substantial improvement in highlighting and capturing visual saliency in the spike stream.
arXiv Detail & Related papers (2024-03-10T15:15:35Z)
- Learning to Robustly Reconstruct Low-light Dynamic Scenes from Spike Streams [28.258022350623023]
As a neuromorphic sensor, the spike camera generates continuous binary spike streams that capture per-pixel light intensity.
We propose a bidirectional recurrent-based reconstruction framework, including a Light-Robust Representation (LR-Rep) and a fusion module.
We have developed a reconstruction benchmark for high-speed low-light scenes.
arXiv Detail & Related papers (2024-01-19T03:01:07Z)
- ReconFusion: 3D Reconstruction with Diffusion Priors [104.73604630145847]
We present ReconFusion to reconstruct real-world scenes using only a few photos.
Our approach leverages a diffusion prior for novel view synthesis, trained on synthetic and multiview datasets.
Our method synthesizes realistic geometry and texture in underconstrained regions while preserving the appearance of observed regions.
arXiv Detail & Related papers (2023-12-05T18:59:58Z)
- Robust e-NeRF: NeRF from Sparse & Noisy Events under Non-Uniform Motion [67.15935067326662]
Event cameras offer low power, low latency, high temporal resolution and high dynamic range.
NeRF is seen as the leading candidate for efficient and effective scene representation.
We propose Robust e-NeRF, a novel method to directly and robustly reconstruct NeRFs from moving event cameras.
arXiv Detail & Related papers (2023-09-15T17:52:08Z)
- Recurrent Spike-based Image Restoration under General Illumination [21.630646894529065]
The spike camera is a new type of bio-inspired vision sensor that records light intensity in the form of a spike array with high temporal resolution (20,000 Hz); a sketch of how spike counts map back to intensity appears after this list.
Existing spike-based approaches typically assume that scenes have sufficient light, an assumption that often fails in real-world scenarios such as rainy days or dusk scenes.
We propose a Recurrent Spike-based Image Restoration (RSIR) network, which is the first work towards restoring clear images from spike arrays under general illumination.
arXiv Detail & Related papers (2023-08-06T04:24:28Z)
- Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and Events [63.984927609545856]
An Event-based Inter/intra-frame Compensator (E-IC) is proposed to predict the per-pixel dynamics between arbitrary time intervals.
We show that the proposed method achieves state-of-the-art results, with remarkable performance for event-based RS2GS inversion in real-world scenarios.
arXiv Detail & Related papers (2023-04-14T05:30:02Z)
- Recovering Continuous Scene Dynamics from A Single Blurry Image with Events [58.7185835546638]
An Implicit Video Function (IVF) is learned to represent a single motion blurred image with concurrent events.
A dual attention transformer is proposed to efficiently leverage merits from both modalities.
The proposed network is trained only with the supervision of ground-truth images at a limited set of reference timestamps.
arXiv Detail & Related papers (2023-04-05T18:44:17Z)
- Learning Dynamic View Synthesis With Few RGBD Cameras [60.36357774688289]
We propose to utilize RGBD cameras to synthesize free-viewpoint videos of dynamic indoor scenes.
We generate point clouds from RGBD frames and then render them into free-viewpoint videos via neural feature rendering.
We introduce a simple Regional Depth-Inpainting module that adaptively inpaints missing depth values to render complete novel views.
arXiv Detail & Related papers (2022-04-22T03:17:35Z)
- Spatio-Temporal Recurrent Networks for Event-Based Optical Flow Estimation [47.984368369734995]
We introduce a novel recurrent encoding-decoding neural network architecture for event-based optical flow estimation.
The network is end-to-end trained with self-supervised learning on the Multi-Vehicle Stereo Event Camera dataset.
It outperforms all existing state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2021-09-10T13:37:37Z)
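Several of the spike-camera papers above (RSIR, the low-light reconstruction work, and SwinSF itself) build on the same sensing principle: each pixel fires a binary spike once its accumulated photons cross a threshold, so the spike count over a short window is roughly proportional to scene brightness. The NumPy sketch below shows this classical window-averaging baseline; the function name, window size, and toy data are illustrative assumptions, not code from any of the papers.

```python
import numpy as np

def window_average_reconstruct(spikes: np.ndarray, window: int = 32) -> np.ndarray:
    """Average binary spikes over the first `window` frames.

    spikes: (T, H, W) array of 0/1 spike events. Because a pixel's firing
    rate grows with incident light, the per-pixel mean over a temporal
    window approximates normalized intensity in [0, 1].
    """
    return spikes[:window].astype(np.float32).mean(axis=0)

# Toy stream (random, not real camera data): 64 binary frames of 32x32.
rng = np.random.default_rng(0)
toy_spikes = (rng.random((64, 32, 32)) < 0.3).astype(np.uint8)
image = window_average_reconstruct(toy_spikes)
print(image.shape, float(image.min()), float(image.max()))  # (32, 32), values in [0, 1]
```

A longer window averages away noise but smears motion, which is precisely the trade-off between temporal detail and reconstruction fidelity that the learning-based reconstructors in this list aim to overcome.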