Finding Visual Saliency in Continuous Spike Stream
- URL: http://arxiv.org/abs/2403.06233v1
- Date: Sun, 10 Mar 2024 15:15:35 GMT
- Title: Finding Visual Saliency in Continuous Spike Stream
- Authors: Lin Zhu, Xianzhang Chen, Xiao Wang, Hua Huang
- Abstract summary: In this paper, we investigate the visual saliency in the continuous spike stream for the first time.
We propose a Recurrent Spiking Transformer framework, which is based on a full spiking neural network.
Our framework exhibits a substantial margin of improvement in highlighting and capturing visual saliency in the spike stream.
- Score: 23.591309376586835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a bio-inspired vision sensor, the spike camera emulates the operational
principles of the fovea, a compact retinal region, by employing spike
discharges to encode the accumulation of per-pixel luminance intensity.
Leveraging its high temporal resolution and bio-inspired neuromorphic design,
the spike camera holds significant promise for advancing computer vision
applications. Saliency detection mimics the behavior of human beings and
captures the most salient region from the scenes. In this paper, we investigate
the visual saliency in the continuous spike stream for the first time. To
effectively process the binary spike stream, we propose a Recurrent Spiking
Transformer (RST) framework, which is based on a full spiking neural network.
Our framework enables the extraction of spatio-temporal features from the
continuous spatio-temporal spike stream while maintaining low power
consumption. To facilitate the training and validation of our proposed model,
we build a comprehensive real-world spike-based visual saliency dataset,
enriched with numerous light conditions. Extensive experiments demonstrate the
superior performance of our Recurrent Spiking Transformer framework in
comparison to other spike neural network-based methods. Our framework exhibits
a substantial margin of improvement in capturing and highlighting visual
saliency in the spike stream, which not only provides a new perspective for
spike-based saliency segmentation but also shows a new paradigm for full
SNN-based transformer models. The code and dataset are available at
\url{https://github.com/BIT-Vision/SVS}.
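To make the encoding principle described in the abstract concrete, below is a minimal NumPy sketch of the accumulate-and-fire model that spike cameras follow: each pixel integrates incident luminance and emits a binary spike whenever its accumulator crosses a threshold, then resets by subtraction. This is not the authors' code; the function name, threshold, and step count are illustrative assumptions, and a real sensor samples far faster (on the order of 20,000 Hz).

import numpy as np

def spike_camera_encode(luminance, n_steps=256, threshold=1.0):
    # Hypothetical sketch: `luminance` is an (H, W) array of normalized
    # per-pixel intensities in [0, 1]; returns a binary spike stream of
    # shape (n_steps, H, W).
    acc = np.zeros_like(luminance, dtype=np.float32)
    spikes = np.zeros((n_steps,) + luminance.shape, dtype=np.uint8)
    for t in range(n_steps):
        acc += luminance              # per-pixel accumulation of incoming light
        fired = acc >= threshold      # pixels whose accumulator crossed the threshold
        spikes[t][fired] = 1          # emit a binary spike at this time step
        acc[fired] -= threshold       # reset-by-subtraction for pixels that fired
    return spikes

# Brighter pixels cross the threshold more often, so their firing rate is higher;
# the mean spike rate of the stream roughly tracks the mean luminance.
frame = np.random.rand(64, 64).astype(np.float32)
stream = spike_camera_encode(frame)
print(stream.shape, stream.mean())

A binary spatio-temporal tensor of this form is what spike-stream models such as the proposed Recurrent Spiking Transformer take as input; per-pixel luminance can be recovered from the spike count or inter-spike interval over a time window.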
Related papers
- D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video [53.83936023443193]
This paper contributes to the field by introducing a new method for dynamic novel view synthesis from monocular video, such as smartphone captures.
Our approach represents the scene as a dynamic neural point cloud, an implicit time-conditioned point cloud that encodes local geometry and appearance in separate hash-encoded neural feature grids.
arXiv Detail & Related papers (2024-06-14T14:35:44Z) - SpikeNeRF: Learning Neural Radiance Fields from Continuous Spike Stream [26.165424006344267]
Spike cameras offer distinct advantages over standard cameras.
Existing approaches reliant on spike cameras often assume optimal illumination.
We introduce SpikeNeRF, the first work that derives a NeRF-based volumetric scene representation from spike camera data.
arXiv Detail & Related papers (2024-03-17T13:51:25Z) - SpikeReveal: Unlocking Temporal Sequences from Real Blurry Inputs with Spike Streams [44.02794438687478]
Spike cameras have proven effective at capturing motion features and beneficial for solving the ill-posed problem of motion deblurring.
Existing methods fall into the supervised learning paradigm, which suffers from notable performance degradation when applied to real-world scenarios.
We propose the first self-supervised framework for the task of spike-guided motion deblurring.
arXiv Detail & Related papers (2024-03-14T15:29:09Z) - Gaze-Vector Estimation in the Dark with Temporally Encoded Event-driven Neural Networks [2.762909189433944]
In this paper, we address the intricate challenge of gaze vector prediction, a pivotal task with applications ranging from human-computer interaction to driver monitoring systems.
Our innovative approach is designed for the demanding setting of extremely low-light conditions, leveraging a novel temporal event encoding scheme, and a dedicated neural network architecture.
Our research underscores the ability of our neural network to work with temporally consecutive encoded images for precise gaze vector predictions in challenging low-light videos, contributing to the advancement of gaze prediction technologies.
arXiv Detail & Related papers (2024-03-05T12:18:12Z) - LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry [52.131996528655094]
We present the Long-term Effective Any Point Tracking (LEAP) module.
LEAP innovatively combines visual, inter-track, and temporal cues with mindfully selected anchors for dynamic track estimation.
Based on these traits, we develop LEAP-VO, a robust visual odometry system adept at handling occlusions and dynamic scenes.
arXiv Detail & Related papers (2024-01-03T18:57:27Z) - Video Dynamics Prior: An Internal Learning Approach for Robust Video Enhancements [83.5820690348833]
We present a framework for low-level vision tasks that does not require any external training data corpus.
Our approach learns neural modules by optimizing over the corrupted sequence itself, leveraging its spatio-temporal coherence and internal statistics.
arXiv Detail & Related papers (2023-12-13T01:57:11Z) - Recurrent Spike-based Image Restoration under General Illumination [21.630646894529065]
The spike camera is a new type of bio-inspired vision sensor that records light intensity in the form of a spike array with high temporal resolution (20,000 Hz).
Existing spike-based approaches typically assume that the scenes are with sufficient light intensity, which is usually unavailable in many real-world scenarios such as rainy days or dusk scenes.
We propose a Recurrent Spike-based Image Restoration (RSIR) network, which is the first work towards restoring clear images from spike arrays under general illumination.
arXiv Detail & Related papers (2023-08-06T04:24:28Z) - Deep Multi-Threshold Spiking-UNet for Image Processing [51.88730892920031]
This paper introduces the novel concept of Spiking-UNet for image processing, which combines the power of Spiking Neural Networks (SNNs) with the U-Net architecture.
To achieve an efficient Spiking-UNet, we face two primary challenges: ensuring high-fidelity information propagation through the network via spikes and formulating an effective training strategy.
Experimental results show that, on image segmentation and denoising, our Spiking-UNet achieves comparable performance to its non-spiking counterpart.
arXiv Detail & Related papers (2023-07-20T16:00:19Z) - SCFlow: Optical Flow Estimation for Spiking Camera [50.770803466875364]
The spiking camera has enormous potential in real applications, especially for motion estimation in high-speed scenes.
Optical flow estimation has achieved remarkable success in image-based and event-based vision, but existing methods cannot be directly applied to the spike stream from a spiking camera.
This paper presents SCFlow, a novel deep learning pipeline for optical flow estimation for the spiking camera.
arXiv Detail & Related papers (2021-10-08T06:16:45Z) - Cascaded Deep Video Deblurring Using Temporal Sharpness Prior [88.98348546566675]
The proposed algorithm mainly consists of optical flow estimation from intermediate latent frames and latent frame restoration steps.
It first develops a deep CNN model to estimate optical flow from intermediate latent frames and then restores the latent frames based on the estimated optical flow.
We show that exploring the domain knowledge of video deblurring is able to make the deep CNN model more compact and efficient.
arXiv Detail & Related papers (2020-04-06T09:13:49Z)