Temporal-Aware Spiking Transformer Hashing Based on 3D-DWT
- URL: http://arxiv.org/abs/2501.06786v1
- Date: Sun, 12 Jan 2025 11:48:19 GMT
- Title: Temporal-Aware Spiking Transformer Hashing Based on 3D-DWT
- Authors: Zihao Mei, Jianhao Li, Bolin Zhang, Chong Wang, Lijun Guo, Guoqi Li, Jiangbo Qian
- Abstract summary: Based on the binary characteristics of spiking neural networks (SNNs), we propose a novel supervised hashing method named Spikinghash with a hierarchical lightweight structure.
Experiments on multiple datasets demonstrate that Spikinghash can achieve state-of-the-art results with low energy consumption and fewer parameters.
- Score: 21.43756642033915
- Abstract: With the rapid growth of dynamic vision sensor (DVS) data, constructing a low-energy, efficient data retrieval system has become an urgent task. Hash learning is one of the most important retrieval technologies which can keep the distance between hash codes consistent with the distance between DVS data. As spiking neural networks (SNNs) can encode information through spikes, they demonstrate great potential in promoting energy efficiency. Based on the binary characteristics of SNNs, we first propose a novel supervised hashing method named Spikinghash with a hierarchical lightweight structure. Spiking WaveMixer (SWM) is deployed in shallow layers, utilizing a multilevel 3D discrete wavelet transform (3D-DWT) to decouple spatiotemporal features into various low-frequency and high-frequency components, and then employing efficient spectral feature fusion. SWM can effectively capture the temporal dependencies and local spatial features. Spiking Self-Attention (SSA) is deployed in deeper layers to further extract global spatiotemporal information. We also design a hash layer utilizing the binary characteristics of SNNs, which integrates information over multiple time steps to generate final hash codes. Furthermore, we propose a new dynamic soft similarity loss for SNNs, which utilizes membrane potentials to construct a learnable similarity matrix as soft labels to fully capture the similarity differences between classes and compensate for information loss in SNNs, thereby improving retrieval performance. Experiments on multiple datasets demonstrate that Spikinghash can achieve state-of-the-art results with low energy consumption and fewer parameters.
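The frequency decoupling the abstract attributes to the SWM rests on a standard multilevel 3D-DWT. The following is a minimal NumPy sketch of one separable Haar level over a (time, height, width) volume, splitting it into one low-frequency band and seven high-frequency bands; this is an illustration of the transform itself, not the paper's implementation.

```python
import numpy as np

def haar_dwt_1d(x, axis):
    """One Haar DWT level along `axis`: returns (low, high) half-length bands."""
    x = np.moveaxis(x, axis, 0)
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2)
    high = (even - odd) / np.sqrt(2)
    return np.moveaxis(low, 0, axis), np.moveaxis(high, 0, axis)

def haar_dwt_3d(x):
    """One separable 3D Haar DWT level over (time, height, width).
    Returns the low-frequency 'lll' band and the 7 high-frequency bands."""
    bands = {"": x}
    for axis in range(3):                      # transform T, then H, then W
        nxt = {}
        for name, band in bands.items():
            lo, hi = haar_dwt_1d(band, axis)
            nxt[name + "l"], nxt[name + "h"] = lo, hi
        bands = nxt
    low = bands.pop("lll")
    return low, bands

x = np.random.default_rng(0).standard_normal((8, 32, 32))  # toy DVS-like volume
low, highs = haar_dwt_3d(x)
print(low.shape, len(highs))   # (4, 16, 16) 7
```

Because the Haar filters are orthonormal, the low band plus the seven detail bands carry exactly the energy of the input, which is what makes the decomposition lossless and invertible; a multilevel transform simply recurses on the low band.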
Related papers
- Spiking Neural Network Accelerator Architecture for Differential-Time Representation using Learned Encoding [0.3749861135832073]
Spiking Neural Networks (SNNs) have garnered attention over recent years due to their increased energy efficiency.
Two important questions when implementing SNNs are how to best encode existing data into spike trains and how to efficiently process these spike trains in hardware.
This paper addresses both of these problems by incorporating the encoding into the learning process, thus allowing the network to learn the spike encoding alongside the weights.
arXiv Detail & Related papers (2025-01-14T09:09:08Z) - Efficient Spatio-Temporal Signal Recognition on Edge Devices Using PointLCA-Net [0.45609532372046985]
This paper presents an approach that combines PointNet's feature extraction with the in-memory computing capabilities and energy efficiency of neuromorphic systems for spatio-temporal signal recognition.
PointLCA-Net achieves high accuracy and a significantly lower energy burden during both inference and training than comparable approaches.
arXiv Detail & Related papers (2024-11-21T20:48:40Z) - Spiking Tucker Fusion Transformer for Audio-Visual Zero-Shot Learning [30.51005522218133]
We introduce a novel Spiking Tucker Fusion Transformer (STFT) for audio-visual zero-shot learning (ZSL).
The STFT leverages the temporal and semantic information from different time steps to generate robust representations.
We propose a global-local pooling (GLP) which combines the max and average pooling operations.
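The summary above only states that GLP combines max and average pooling. A hypothetical sketch of such a combination, with a global average branch and a windowed local max branch (illustrative only, not the STFT paper's exact formulation), might look like:

```python
import numpy as np

def global_local_pool(x, local_window=2):
    """Hypothetical max+average pooling combination in the spirit of GLP
    (illustrative; not the STFT paper's exact operator).
    x: (channels, length) feature map -> (channels,) pooled vector."""
    # Global branch: average pooling over the whole sequence.
    g = x.mean(axis=-1)
    # Local branch: max pooling over non-overlapping windows, then averaged.
    c, n = x.shape
    trimmed = x[:, : n - n % local_window]
    local = trimmed.reshape(c, -1, local_window).max(axis=-1)
    return g + local.mean(axis=-1)

feat = np.arange(8, dtype=float)[None, :]   # one channel: 0..7
print(global_local_pool(feat))              # -> [7.5]
```

The intuition is that the average branch summarizes global context while the max branch preserves salient local peaks, and summing the two keeps both signals in the pooled representation.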
arXiv Detail & Related papers (2024-07-11T02:01:26Z) - Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
arXiv Detail & Related papers (2023-12-04T01:32:42Z) - Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with attention mechanism, we can effectively boost performance without huge computational overhead.
We show our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z) - Can LSH (Locality-Sensitive Hashing) Be Replaced by Neural Network? [9.940726521176499]
Recent progress shows that neural networks can partly replace traditional data structures.
We propose a novel learned locality-sensitive hashing method, called LLSH, to map high-dimensional data to a low-dimensional space.
The proposed LLSH demonstrates the feasibility of replacing the hash index with learning-based neural networks.
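For context on what LLSH aims to replace: classical LSH for cosine similarity hashes a vector by the signs of its projections onto random hyperplanes, so that nearby vectors tend to agree on more bits. A minimal sketch (generic random-hyperplane LSH, not the LLSH method itself):

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_hash(X, planes):
    """Sign-of-projection (random hyperplane) LSH: one bit per plane."""
    return (X @ planes.T > 0).astype(np.uint8)

d, n_bits = 64, 16
planes = rng.standard_normal((n_bits, d))   # random hyperplane normals

a = rng.standard_normal(d)
b = a + 0.05 * rng.standard_normal(d)       # near-duplicate of a
c = rng.standard_normal(d)                  # unrelated vector

ha, hb, hc = (lsh_hash(v[None, :], planes)[0] for v in (a, b, c))
# Hamming distance between codes of similar vectors tends to be smaller.
print(np.sum(ha != hb), np.sum(ha != hc))
```

The probability that a single bit differs equals the angle between the two vectors divided by pi, which is what makes Hamming distance on these codes a proxy for angular distance.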
arXiv Detail & Related papers (2023-10-15T11:41:54Z) - Deep Multi-Threshold Spiking-UNet for Image Processing [51.88730892920031]
This paper introduces the novel concept of Spiking-UNet for image processing, which combines the power of Spiking Neural Networks (SNNs) with the U-Net architecture.
To achieve an efficient Spiking-UNet, we face two primary challenges: ensuring high-fidelity information propagation through the network via spikes and formulating an effective training strategy.
Experimental results show that, on image segmentation and denoising, our Spiking-UNet achieves comparable performance to its non-spiking counterpart.
arXiv Detail & Related papers (2023-07-20T16:00:19Z) - NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction [79.13750275141139]
This paper proposes a novel and fast self-supervised solution for sparse-view CBCT reconstruction.
The desired attenuation coefficients are represented as a continuous function of 3D spatial coordinates, parameterized by a fully-connected deep neural network.
A learning-based encoder entailing hash coding is adopted to help the network capture high-frequency details.
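The "hash coding" mentioned above refers to the family of spatial hash-grid encoders, where a 3D cell index is hashed into a table of learnable feature vectors. A minimal single-level sketch in that spirit (illustrative; the prime constants and table size are generic choices, not NAF's code):

```python
import numpy as np

def hash_encode(coords, table, resolution):
    """Single-level spatial hash encoding sketch.
    coords: (N, 3) points in [0, 1)^3; table: (T, F) learnable feature table."""
    primes = np.array([1, 2654435761, 805459861], dtype=np.uint64)
    cells = np.floor(coords * resolution).astype(np.uint64)   # voxel indices
    # XOR of per-axis products is a cheap, well-spread spatial hash.
    h = np.bitwise_xor.reduce(cells * primes, axis=1) % np.uint64(table.shape[0])
    return table[h]

rng = np.random.default_rng(0)
table = rng.standard_normal((2 ** 14, 2))   # 16384 entries, 2 features each
pts = rng.random((5, 3))
feats = hash_encode(pts, table, resolution=64)
print(feats.shape)   # (5, 2)
```

In a full encoder, several such levels at increasing resolutions are concatenated and the table entries are trained end to end, which is what lets a small MLP recover high-frequency detail.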
arXiv Detail & Related papers (2022-09-29T04:06:00Z) - Learning to Hash with Graph Neural Networks for Recommender Systems [103.82479899868191]
Graph representation learning has attracted much attention in supporting high quality candidate search at scale.
Despite its effectiveness in learning embedding vectors for objects in the user-item interaction network, the computational costs to infer users' preferences in continuous embedding space are tremendous.
We propose a simple yet effective discrete representation learning framework to jointly learn continuous and discrete codes.
arXiv Detail & Related papers (2020-03-04T06:59:56Z) - Spatial-Spectral Residual Network for Hyperspectral Image Super-Resolution [82.1739023587565]
We propose a novel spectral-spatial residual network for hyperspectral image super-resolution (SSRNet).
Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information.
In each unit, we employ spatial and spectral separable 3D convolution to extract spatial and spectral information, which not only reduces unaffordable memory usage and high computational cost, but also makes the network easier to train.
arXiv Detail & Related papers (2020-01-14T03:34:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.