Temporal Information Reconstruction and Non-Aligned Residual in Spiking Neural Networks for Speech Classification
- URL: http://arxiv.org/abs/2501.00348v1
- Date: Tue, 31 Dec 2024 08:52:40 GMT
- Title: Temporal Information Reconstruction and Non-Aligned Residual in Spiking Neural Networks for Speech Classification
- Authors: Qi Zhang, Huamin Wang, Hangchi Shen, Shukai Duan, Shiping Wen, Tingwen Huang
- Abstract summary: Most models based on spiking neural networks (SNNs) use only a single temporal resolution for speech classification.
We propose a novel method named Temporal Reconstruction (TR), inspired by the hierarchical way the human brain processes speech.
We also propose the Non-Aligned Residual (NAR) method, derived from an analysis of the audio data, which allows residual connections to be used between two audio data with different time lengths.
- Score: 45.30468752468433
- License:
- Abstract: Most models based on spiking neural networks (SNNs) use only a single temporal resolution for speech classification, so they cannot learn information from the input data at different temporal scales. Additionally, because the data before and after the sub-modules of many models have different time lengths, effective residual connections cannot be applied to optimize the training of these models. To solve these problems, on the one hand, we reconstruct the temporal dimension of the audio spectrum and propose a novel method named Temporal Reconstruction (TR), inspired by the hierarchical way the human brain processes speech. Because TR enables the network to learn from the input at different temporal resolutions, the reconstructed SNN model can capture more comprehensive semantic information from audio data. On the other hand, by analyzing the audio data, we propose the Non-Aligned Residual (NAR) method, which allows residual connections to be used between two audio data with different time lengths. We have conducted extensive experiments on the Spiking Speech Commands (SSC), Spiking Heidelberg Digits (SHD), and Google Speech Commands v0.02 (GSC) datasets. According to the experimental results, we achieve a state-of-the-art (SOTA) test classification accuracy of 81.02% on SSC among all SNN models, and a SOTA classification accuracy of 96.04% on SHD among all models.
Related papers
- Zero-Shot Temporal Resolution Domain Adaptation for Spiking Neural Networks [3.2366933261812076]
Spiking Neural Networks (SNNs) are biologically-inspired deep neural networks that efficiently extract temporal information.
SNN model parameters are sensitive to temporal resolution, leading to significant performance drops when the temporal resolution of target data at the edge is not the same.
We propose three novel domain adaptation methods that adapt neuron parameters to a change in time resolution without re-training at the target time resolution.
arXiv Detail & Related papers (2024-11-07T14:58:51Z)
- Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
arXiv Detail & Related papers (2023-12-04T01:32:42Z)
- Few-shot Learning using Data Augmentation and Time-Frequency Transformation for Time Series Classification [6.830148185797109]
We propose a novel few-shot learning framework through data augmentation.
We also develop a sequence-spectrogram neural network (SSNN).
Our methodology demonstrates its applicability to few-shot problems in time series classification.
arXiv Detail & Related papers (2023-11-06T15:32:50Z)
- Continuous time recurrent neural networks: overview and application to forecasting blood glucose in the intensive care unit [56.801856519460465]
Continuous time autoregressive recurrent neural networks (CTRNNs) are deep learning models that account for irregular observations.
We demonstrate the application of these models to probabilistic forecasting of blood glucose in a critical care setting.
arXiv Detail & Related papers (2023-04-14T09:39:06Z)
- Neural ODEs with Irregular and Noisy Data [8.349349605334316]
We discuss a methodology to learn differential equation(s) from noisy and irregularly sampled measurements.
The main innovation of our methodology is the integration of deep neural networks with the neural ordinary differential equations (ODEs) approach.
The proposed framework to learn a model describing the vector field is highly effective under noisy measurements.
arXiv Detail & Related papers (2022-05-19T11:24:41Z)
- Visualising and Explaining Deep Learning Models for Speech Quality Prediction [0.0]
The non-intrusive speech quality prediction model NISQA is analyzed in this paper.
It is composed of a convolutional neural network (CNN) and a recurrent neural network (RNN).
arXiv Detail & Related papers (2021-12-12T12:50:03Z)
- CARRNN: A Continuous Autoregressive Recurrent Neural Network for Deep Representation Learning from Sporadic Temporal Data [1.8352113484137622]
In this paper, a novel deep learning-based model is developed for modeling multiple temporal features in sporadic data.
The proposed model, called CARRNN, uses a generalized discrete-time autoregressive model that is trainable end-to-end using neural networks modulated by time lags.
It is applied to multivariate time-series regression tasks using data provided for Alzheimer's disease progression modeling and intensive care unit (ICU) mortality rate prediction.
arXiv Detail & Related papers (2021-04-08T12:43:44Z)
- Deep Cellular Recurrent Network for Efficient Analysis of Time-Series Data with Spatial Information [52.635997570873194]
This work proposes a novel deep cellular recurrent neural network (DCRNN) architecture to process complex multi-dimensional time series data with spatial information.
The proposed architecture achieves state-of-the-art performance while using substantially fewer trainable parameters than comparable methods in the literature.
arXiv Detail & Related papers (2021-01-12T20:08:18Z)
- Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning [60.20150317299749]
This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning.
To make full use of the training data, we propose a full data learning method for speech enhancement.
arXiv Detail & Related papers (2020-11-11T06:32:37Z)
- Multi-Tones' Phase Coding (MTPC) of Interaural Time Difference by Spiking Neural Network [68.43026108936029]
We propose a pure spiking neural network (SNN) based computational model for precise sound localization in noisy real-world environments.
We implement this algorithm in a real-time robotic system with a microphone array.
The experimental results show a mean azimuth error of 13 degrees, which surpasses the accuracy of the other biologically plausible neuromorphic approach for sound source localization.
arXiv Detail & Related papers (2020-07-07T08:22:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.