SpikeGen: Generative Framework for Visual Spike Stream Processing
- URL: http://arxiv.org/abs/2505.18049v1
- Date: Fri, 23 May 2025 15:54:11 GMT
- Title: SpikeGen: Generative Framework for Visual Spike Stream Processing
- Authors: Gaole Dai, Menghang Dong, Rongyu Zhang, Ruichuan An, Shanghang Zhang, Tiejun Huang
- Abstract summary: We introduce a robust generative processing framework named SpikeGen, designed for visual spike streams captured by spike cameras. We evaluate this framework across multiple tasks involving mixed spike-RGB modalities, including conditional image/video deblurring, dense frame reconstruction from spike streams, and high-speed scene novel-view synthesis.
- Score: 35.33197152363442
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Neuromorphic Visual Systems, such as spike cameras, have attracted considerable attention due to their ability to capture clear textures under dynamic conditions. This capability effectively mitigates issues related to motion and aperture blur. However, in contrast to conventional RGB modalities that provide dense spatial information, these systems generate binary, spatially sparse frames as a trade-off for temporally rich visual streams. In this context, generative models emerge as a promising solution to address the inherent limitations of sparse data. These models not only facilitate the conditional fusion of existing information from both spike and RGB modalities but also enable the conditional generation based on latent priors. In this study, we introduce a robust generative processing framework named SpikeGen, designed for visual spike streams captured by spike cameras. We evaluate this framework across multiple tasks involving mixed spike-RGB modalities, including conditional image/video deblurring, dense frame reconstruction from spike streams, and high-speed scene novel-view synthesis. Supported by comprehensive experimental results, we demonstrate that leveraging the latent space operation abilities of generative models allows us to effectively address the sparsity of spatial information while fully exploiting the temporal richness of spike streams, thereby promoting a synergistic enhancement of different visual modalities.
Related papers
- VDEGaussian: Video Diffusion Enhanced 4D Gaussian Splatting for Dynamic Urban Scenes Modeling [68.65587507038539]
We present a novel video diffusion-enhanced 4D Gaussian Splatting framework for dynamic urban scene modeling. Our key insight is to distill robust, temporally consistent priors from a test-time adapted video diffusion model. Our method significantly enhances dynamic modeling, especially for fast-moving objects, achieving an approximate PSNR gain of 2 dB.
arXiv Detail & Related papers (2025-08-04T07:24:05Z)
- THYME: Temporal Hierarchical-Cyclic Interactivity Modeling for Video Scene Graphs in Aerial Footage [11.587822611656648]
We introduce the Temporal Hierarchical Cyclic Scene Graph (THYME) approach, which integrates hierarchical feature aggregation with cyclic temporal refinement to address limitations. THYME effectively models multi-scale spatial context and enforces temporal consistency across frames, yielding more accurate and coherent scene graphs. In addition, we present AeroEye-v1.0, a novel aerial video dataset enriched with five types of interactivity that overcomes the constraints of existing datasets.
arXiv Detail & Related papers (2025-07-12T08:43:38Z)
- Exploring Representation-Aligned Latent Space for Better Generation [86.45670422239317]
We introduce ReaLS, which integrates semantic priors to improve generation performance. We show that fundamental DiT and SiT trained on ReaLS can achieve a 15% improvement in FID metric. The enhanced semantic latent space enables more perceptual downstream tasks, such as segmentation and depth estimation.
arXiv Detail & Related papers (2025-02-01T07:42:12Z)
- RepVideo: Rethinking Cross-Layer Representation for Video Generation [53.701548524818534]
We propose RepVideo, an enhanced representation framework for text-to-video diffusion models. By accumulating features from neighboring layers to form enriched representations, this approach captures more stable semantic information. Our experiments demonstrate that our RepVideo not only significantly enhances the ability to generate accurate spatial appearances, but also improves temporal consistency in video generation.
arXiv Detail & Related papers (2025-01-15T18:20:37Z)
- Rethinking High-speed Image Reconstruction Framework with Spike Camera [48.627095354244204]
Spike cameras generate continuous spike streams to capture high-speed scenes with lower bandwidth and higher dynamic range than traditional RGB cameras. We introduce a novel spike-to-image reconstruction framework SpikeCLIP that goes beyond traditional training paradigms. Our experiments on real-world low-light datasets demonstrate that SpikeCLIP significantly enhances texture details and the luminance balance of recovered images.
arXiv Detail & Related papers (2025-01-08T13:00:17Z)
- Low-Light Video Enhancement via Spatial-Temporal Consistent Decomposition [52.89441679581216]
Low-Light Video Enhancement (LLVE) seeks to restore dynamic or static scenes plagued by severe invisibility and noise. We present an innovative video decomposition strategy that incorporates view-independent and view-dependent components. Our framework consistently outperforms existing methods, establishing a new SOTA performance.
arXiv Detail & Related papers (2024-05-24T15:56:40Z)
- SpikeReveal: Unlocking Temporal Sequences from Real Blurry Inputs with Spike Streams [44.02794438687478]
Spike cameras have proven effective in capturing motion features and beneficial for solving this ill-posed problem.
Existing methods fall into the supervised learning paradigm, which suffers from notable performance degradation when applied to real-world scenarios.
We propose the first self-supervised framework for the task of spike-guided motion deblurring.
arXiv Detail & Related papers (2024-03-14T15:29:09Z)
- Finding Visual Saliency in Continuous Spike Stream [23.591309376586835]
In this paper, we investigate the visual saliency in the continuous spike stream for the first time.
We propose a Recurrent Spiking Transformer framework, which is based on a full spiking neural network.
Our framework exhibits a substantial margin of improvement in highlighting and capturing visual saliency in the spike stream.
arXiv Detail & Related papers (2024-03-10T15:15:35Z)
- TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models [75.20168902300166]
We propose TrackDiffusion, a novel video generation framework affording fine-grained trajectory-conditioned motion control.
A pivotal component of TrackDiffusion is the instance enhancer, which explicitly ensures inter-frame consistency of multiple objects.
Generated video sequences by our TrackDiffusion can be used as training data for visual perception models.
arXiv Detail & Related papers (2023-12-01T15:24:38Z)
- TimeLens: Event-based Video Frame Interpolation [54.28139783383213]
We introduce Time Lens, a novel method that leverages the advantages of both synthesis-based and flow-based approaches.
We show an up to 5.21 dB improvement in terms of PSNR over state-of-the-art frame-based and event-based methods.
arXiv Detail & Related papers (2021-06-14T10:33:47Z)