Corticomorphic Hybrid CNN-SNN Architecture for EEG-based Low-footprint
Low-latency Auditory Attention Detection
- URL: http://arxiv.org/abs/2307.08501v1
- Date: Thu, 13 Jul 2023 20:33:39 GMT
- Title: Corticomorphic Hybrid CNN-SNN Architecture for EEG-based Low-footprint
Low-latency Auditory Attention Detection
- Authors: Richard Gall, Deniz Kocanaogullari, Murat Akcakaya, Deniz Erdogmus,
Rajkumar Kubendran
- Abstract summary: In a multi-speaker "cocktail party" scenario, a listener can selectively attend to a speaker of interest.
Current trends in EEG-based auditory attention detection using artificial neural networks (ANN) are not practical for edge-computing platforms.
We propose a hybrid convolutional neural network-spiking neural network (CNN-SNN) architecture, inspired by the auditory cortex.
- Score: 8.549433398954738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In a multi-speaker "cocktail party" scenario, a listener can selectively
attend to a speaker of interest. Studies into the human auditory attention
network demonstrate cortical entrainment to speech envelopes resulting in
highly correlated Electroencephalography (EEG) measurements. Current trends in
EEG-based auditory attention detection (AAD) using artificial neural networks
(ANN) are not practical for edge-computing platforms due to longer decision
windows using several EEG channels, with higher power consumption and larger
memory footprint requirements. Nor are ANNs capable of accurately modeling the
brain's top-down attention network since the cortical organization is complex
and layered. In this paper, we propose a hybrid convolutional neural
network-spiking neural network (CNN-SNN) corticomorphic architecture, inspired
by the auditory cortex, which uses EEG data along with multi-speaker speech
envelopes to successfully decode auditory attention with low latency down to 1
second, using only 8 EEG electrodes strategically placed close to the auditory
cortex, at a significantly higher accuracy of 91.03%, compared to the
state-of-the-art. Simultaneously, when compared to a traditional CNN reference
model, our model uses ~15% fewer parameters at a lower bit precision resulting
in ~57% memory footprint reduction. The results show great promise for
edge-computing in brain-embedded devices, like smart hearing aids.
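The spiking layers at the heart of such a hybrid CNN-SNN can be illustrated with a minimal leaky integrate-and-fire (LIF) neuron: inputs are integrated into a leaky membrane potential, and a binary spike is emitted on each threshold crossing. This is a generic sketch of spiking dynamics; the leak, threshold, and input values are illustrative assumptions, not parameters from the paper.

```python
# Minimal leaky integrate-and-fire (LIF) neuron, the basic unit of an
# SNN layer. All constants here are illustrative, not from the paper.

def lif_simulate(inputs, leak=0.9, threshold=1.0, v_reset=0.0):
    """Simulate one LIF neuron over a sequence of input currents.

    Returns the binary spike train: 1 where the membrane potential
    crossed the threshold, 0 elsewhere.
    """
    v = v_reset
    spikes = []
    for i in inputs:
        v = leak * v + i          # leaky integration of the input current
        if v >= threshold:        # threshold crossing emits a spike
            spikes.append(1)
            v = v_reset           # hard reset after spiking
        else:
            spikes.append(0)
    return spikes

# A constant drive of 0.3 accumulates until the threshold is crossed,
# then the potential resets and integration restarts.
train = lif_simulate([0.3] * 10)  # -> [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
```

Because each neuron communicates only sparse binary events instead of dense activations, layers built from such units can run at low power on neuromorphic or edge hardware.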
Related papers
- sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection
with Spiking Neural Networks [51.516451451719654]
Spiking Neural Networks (SNNs) are known to be biologically plausible and power-efficient.
This paper introduces a novel SNN-based Voice Activity Detection model, referred to as sVAD.
It provides effective auditory feature representation through SincNet and 1D convolution, and improves noise robustness with attention mechanisms.
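A SincNet-style front-end, as used above, constrains each first-layer convolutional kernel to a band-pass sinc filter so that only the two cutoff frequencies are learned. A minimal sketch of building one such kernel (filter length, cutoffs, and the Hamming window are illustrative assumptions):

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def sincnet_bandpass(f1, f2, length=101):
    """Band-pass FIR kernel parameterized only by its cutoff frequencies
    (normalized to [0, 0.5]): the difference of two low-pass sinc
    filters, tapered by a Hamming window."""
    assert 0 <= f1 < f2 <= 0.5
    mid = length // 2
    kernel = []
    for n in range(length):
        t = n - mid
        # Difference of two ideal low-pass filters gives a band-pass.
        h = 2 * f2 * sinc(2 * f2 * t) - 2 * f1 * sinc(2 * f1 * t)
        # Hamming window tapers the truncated sinc to reduce ripple.
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
        kernel.append(h * w)
    return kernel

# The center tap equals 2 * (f2 - f1); here 2 * (0.2 - 0.1) = 0.2.
k = sincnet_bandpass(0.1, 0.2)
```

Training then adjusts only `f1` and `f2` per kernel, which is why such front-ends need far fewer parameters than free-form convolutions.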
arXiv Detail & Related papers (2024-03-09T02:55:44Z)
- Spiking-LEAF: A Learnable Auditory front-end for Spiking Neural Networks [53.31894108974566]
Spiking-LEAF is a learnable auditory front-end meticulously designed for SNN-based speech processing.
On keyword spotting and speaker identification tasks, the proposed Spiking-LEAF outperforms SOTA spiking auditory front-ends.
arXiv Detail & Related papers (2023-09-18T04:03:05Z)
- ElectrodeNet -- A Deep Learning Based Sound Coding Strategy for Cochlear Implants [9.468136300919062]
ElectrodeNet is a deep learning based sound coding strategy for the cochlear implant (CI).
The extended ElectrodeNet-CS strategy further incorporates channel selection (CS).
Deep neural network (DNN), convolutional neural network (CNN), and long short-term memory (LSTM) models were trained using the Fast Fourier Transform (FFT) bins and channel envelopes obtained from processing clean speech with the ACE strategy.
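The FFT bins feeding such models are just the magnitudes of the discrete Fourier transform of each audio frame. A minimal sketch (a naive O(N^2) DFT for illustration; a real front-end would use an FFT; the frame length and test tone are illustrative assumptions):

```python
import cmath
import math

def dft_bins(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame --
    the kind of spectral input a DNN/CNN/LSTM sound coder consumes."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

# A pure cosine at bin 3 of a 16-sample frame concentrates its energy
# in that bin (magnitude N/2 = 8); all other bins are near zero.
frame = [math.cos(2 * math.pi * 3 * t / 16) for t in range(16)]
mags = dft_bins(frame)
```

Per-channel envelopes are then typically obtained by grouping these bin magnitudes into the implant's frequency bands.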
arXiv Detail & Related papers (2023-05-26T09:06:04Z)
- Mental arithmetic task classification with convolutional neural network based on spectral-temporal features from EEG [0.47248250311484113]
Deep neural networks (DNN) show significant advantages in computer vision applications.
We present a shallow neural network that uses mainly two convolutional layers, with relatively few parameters, to quickly learn spectral-temporal features from EEG.
Experimental results showed that the shallow CNN model outperformed all the other models and achieved the highest classification accuracy of 90.68%.
arXiv Detail & Related papers (2022-09-26T02:15:22Z)
- EEG-BBNet: a Hybrid Framework for Brain Biometric using Graph Connectivity [1.1498015270151059]
We present EEG-BBNet, a hybrid network which integrates convolutional neural networks (CNN) with graph convolutional neural networks (GCNN)
Our models outperform all baselines in the event-related potential (ERP) task with average correct recognition rates of up to 99.26% using intra-session data.
arXiv Detail & Related papers (2022-08-17T10:18:22Z)
- Convolutional Spiking Neural Networks for Detecting Anticipatory Brain Potentials Using Electroencephalogram [0.21847754147782888]
Spiking neural networks (SNNs) are receiving increased attention because they mimic synaptic connections in biological systems and produce spike trains.
Recently, convolutional layers have been added to combine the feature extraction power of convolutional networks with the computational efficiency of SNNs.
This paper studies the feasibility of using a convolutional spiking neural network (CSNN) to detect anticipatory slow cortical potentials.
arXiv Detail & Related papers (2022-08-14T19:04:15Z)
- Braille Letter Reading: A Benchmark for Spatio-Temporal Pattern Recognition on Neuromorphic Hardware [50.380319968947035]
Recent deep learning approaches have reached high accuracy in such tasks, but their implementation on conventional embedded solutions is still computationally and energy expensive.
We propose a new benchmark for computing tactile pattern recognition at the edge through letters reading.
We trained and compared feed-forward and recurrent spiking neural networks (SNNs) offline using back-propagation through time with surrogate gradients, then deployed them on the Intel Loihi neuromorphic chip for efficient inference.
Our results show that the LSTM outperforms the recurrent SNN in terms of accuracy by 14%. However, the recurrent SNN on Loihi is 237 times more energy efficient.
arXiv Detail & Related papers (2022-05-30T14:30:45Z)
- Extracting the Locus of Attention at a Cocktail Party from Single-Trial EEG using a Joint CNN-LSTM Model [0.1529342790344802]
The human brain performs remarkably well in segregating a particular speaker from interfering speakers in a multi-speaker scenario.
We present a joint convolutional neural network (CNN) - long short-term memory (LSTM) model to infer the auditory attention.
arXiv Detail & Related papers (2021-02-08T01:06:48Z)
- Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks [61.76338096980383]
A range of neural architecture search (NAS) techniques are used to automatically learn two types of hyperparameters of state-of-the-art factored time delay neural networks (TDNNs)
These include the DARTS method integrating architecture selection with lattice-free MMI (LF-MMI) TDNN training.
Experiments conducted on a 300-hour Switchboard corpus suggest the auto-configured systems consistently outperform the baseline LF-MMI TDNN systems.
arXiv Detail & Related papers (2020-07-17T08:32:11Z)
- AutoSpeech: Neural Architecture Search for Speaker Recognition [108.69505815793028]
We propose the first neural architecture search approach for speaker recognition tasks, named AutoSpeech.
Our algorithm first identifies the optimal operation combination in a neural cell and then derives a CNN model by stacking the neural cell multiple times.
Results demonstrate that the derived CNN architectures significantly outperform current speaker recognition systems based on VGG-M, ResNet-18, and ResNet-34 back-bones, while enjoying lower model complexity.
arXiv Detail & Related papers (2020-05-07T02:53:47Z)
- Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions.
Speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at two goals: a) improving the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reducing system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.