Fast frequency discrimination and phoneme recognition using a biomimetic
membrane coupled to a neural network
- URL: http://arxiv.org/abs/2004.04459v1
- Date: Thu, 9 Apr 2020 10:07:12 GMT
- Title: Fast frequency discrimination and phoneme recognition using a biomimetic
membrane coupled to a neural network
- Authors: Woo Seok Lee, Hyunjae Kim, Andrew N. Cleland, and Kang-Hun Ahn
- Abstract summary: In the human ear, the basilar membrane plays a central role in sound recognition.
Inspired by this structure, we designed and fabricated an artificial membrane that produces a spatial displacement pattern in response to an audible signal.
When trained with single frequency tones, this system can unambiguously distinguish tones closely spaced in frequency.
- Score: 2.314552275307609
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the human ear, the basilar membrane plays a central role in sound
recognition. When excited by sound, this membrane responds with a
frequency-dependent displacement pattern that is detected and identified by the
auditory hair cells combined with the human neural system. Inspired by this
structure, we designed and fabricated an artificial membrane that produces a
spatial displacement pattern in response to an audible signal, which we used to
train a convolutional neural network (CNN). When trained with single frequency
tones, this system can unambiguously distinguish tones closely spaced in
frequency. When instead trained to recognize spoken vowels, this system
outperforms existing methods for phoneme recognition, including the discrete
Fourier transform (DFT), zoom FFT and chirp z-transform, especially when tested
in short time windows. This sound recognition scheme therefore promises
significant benefits in fast and accurate sound identification compared to
existing methods.
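A minimal sketch of the kind of classifier described in the abstract: a small 1-D CNN reading a spatial displacement pattern from the membrane. It assumes PyTorch; the channel count, window, and class count are illustrative, since the abstract does not specify the architecture.

```python
# Hypothetical sketch: a small CNN classifying 1-D spatial displacement
# patterns read from an artificial basilar membrane. All dimensions are
# illustrative assumptions, not taken from the paper.
import torch
import torch.nn as nn

N_CHANNELS = 32   # assumed number of membrane readout positions
N_CLASSES = 8     # assumed number of vowel classes

class MembraneCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2),  # scan along the membrane axis
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(4),
        )
        self.classifier = nn.Linear(32 * 4, N_CLASSES)

    def forward(self, x):            # x: (batch, 1, N_CHANNELS)
        h = self.features(x)
        return self.classifier(h.flatten(1))

model = MembraneCNN()
pattern = torch.randn(1, 1, N_CHANNELS)  # stand-in displacement pattern
logits = model(pattern)
```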
Related papers
- DeepSpeech models show Human-like Performance and Processing of Cochlear Implant Inputs [12.234206036041218]
We use the deep neural network (DNN) DeepSpeech2 as a paradigm to investigate how natural and cochlear implant-based inputs are processed over time.
We generate naturalistic and cochlear implant-like inputs from spoken sentences and test the similarity of model performance to human performance.
We find that dynamics over time in each layer are affected by context as well as input type.
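As a rough illustration of how cochlear-implant-like inputs are commonly generated (a noise vocoder; the paper's exact pipeline may differ), assuming NumPy/SciPy with illustrative band counts and edges:

```python
# Hedged sketch of a noise vocoder: bandpass filterbank -> envelope ->
# envelope-modulated noise carrier, a standard CI simulation recipe.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def ci_like(signal, sr, n_bands=8, f_lo=100.0, f_hi=7000.0):
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)  # log-spaced band edges
    out = np.zeros_like(signal, dtype=float)
    rng = np.random.default_rng(0)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
        band = sosfiltfilt(sos, signal)
        env = np.abs(hilbert(band))               # slowly varying envelope
        carrier = sosfiltfilt(sos, rng.standard_normal(len(signal)))
        out += env * carrier                      # envelope-modulated noise
    return out

tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
sim = ci_like(tone, 16000)
```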
arXiv Detail & Related papers (2024-07-30T04:32:27Z)
- Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks [59.38765771221084]
We present a physiologically inspired speech recognition architecture that is compatible with and scalable to deep learning frameworks.
We show end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network.
Our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance.
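A minimal sketch of spike frequency adaptation, one of the inhibitory mechanisms named above; the neuron model and constants are illustrative, not taken from the paper:

```python
# Leaky integrate-and-fire neuron with spike frequency adaptation (SFA):
# each spike raises an adaptation variable that lifts the firing threshold.
import numpy as np

def adaptive_lif(inputs, dt=1e-3, tau_m=20e-3, tau_a=200e-3,
                 v_th=1.0, beta=0.2):
    v, a = 0.0, 0.0
    spikes = []
    for i in inputs:
        v += dt / tau_m * (-v + i)       # leaky membrane integration
        a += dt / tau_a * (-a)           # adaptation variable decays
        if v > v_th + beta * a:          # threshold rises with adaptation
            spikes.append(1)
            v = 0.0                       # reset membrane potential
            a += 1.0                      # each spike raises the threshold
        else:
            spikes.append(0)
    return np.array(spikes)

spikes = adaptive_lif(np.full(1000, 2.0))  # constant drive
# The spike rate decays over time as the adaptive threshold builds up.
```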
arXiv Detail & Related papers (2024-04-22T09:40:07Z)
- Surrogate Gradient Spiking Neural Networks as Encoders for Large Vocabulary Continuous Speech Recognition [91.39701446828144]
We show that spiking neural networks can be trained like standard recurrent neural networks using the surrogate gradient method.
They have shown promising results on speech command recognition tasks.
In contrast to their recurrent non-spiking counterparts, they show robustness to exploding gradient problems without the need to use gates.
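A minimal sketch of the surrogate gradient method itself, assuming PyTorch: a hard threshold in the forward pass and a smooth surrogate derivative in the backward pass. The fast-sigmoid surrogate is one common choice, not necessarily this paper's.

```python
# Surrogate gradient: the spike nonlinearity is a step function forward,
# but backpropagation uses a smooth stand-in derivative, so the spiking
# network can be trained like a standard recurrent network.
import torch

class SurrogateSpike(torch.autograd.Function):
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()            # hard threshold: emits 0/1 spikes

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        # Fast-sigmoid surrogate derivative (one common choice).
        surrogate = 1.0 / (1.0 + 10.0 * v.abs()) ** 2
        return grad_out * surrogate

spike = SurrogateSpike.apply
v = torch.randn(4, requires_grad=True)
spike(v).sum().backward()  # gradients flow despite the step function
```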
arXiv Detail & Related papers (2022-12-01T12:36:26Z)
- Classification of multi-frequency RF signals by extreme learning, using magnetic tunnel junctions as neurons and synapses [46.000685134136525]
We show that magnetic tunnel junctions can process RF inputs with multiple frequencies in parallel.
Using a backpropagation-free method called extreme learning, we classify noisy images encoded by RF signals.
These results are a key step for embedded radiofrequency artificial intelligence.
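A minimal sketch of extreme learning in NumPy: a fixed random nonlinear projection (the role played by the magnetic tunnel junctions) followed by a closed-form linear readout, with no backpropagation. All dimensions and data are illustrative.

```python
# Extreme learning machine: random hidden layer is never trained; only the
# linear readout is fit, in closed form, by least squares.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))             # stand-in input features
y = rng.integers(0, 4, 200)                    # stand-in class labels
Y = np.eye(4)[y]                               # one-hot targets

W_in = rng.standard_normal((16, 128))          # random, fixed projection
H = np.tanh(X @ W_in)                          # nonlinear hidden responses
W_out, *_ = np.linalg.lstsq(H, Y, rcond=None)  # least-squares readout
pred = (H @ W_out).argmax(axis=1)
```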
arXiv Detail & Related papers (2022-11-02T14:09:42Z)
- Deep Metric Learning with Locality Sensitive Angular Loss for Self-Correcting Source Separation of Neural Spiking Signals [77.34726150561087]
We propose a methodology based on deep metric learning to address the need for automated post-hoc cleaning and robust separation filters.
We validate this method with an artificially corrupted label set based on source-separated high-density surface electromyography recordings.
This approach enables a neural network to learn to accurately decode neurophysiological time series using any imperfect method of labelling the signal.
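A hedged sketch of an angular metric-learning objective in PyTorch; this is a generic cosine-margin pair loss, not the paper's locality sensitive formulation, and the margin is illustrative.

```python
# Angular metric learning: embeddings of same-source pairs are pulled
# together in angle, different-source pairs pushed apart past a margin.
import torch
import torch.nn.functional as F

def angular_pair_loss(z1, z2, same_source, margin=0.5):
    cos = F.cosine_similarity(z1, z2, dim=-1)
    pos = 1.0 - cos                            # shrink angle for matching pairs
    neg = torch.clamp(cos - margin, min=0.0)   # enforce a margin otherwise
    return torch.where(same_source, pos, neg).mean()

z1, z2 = torch.randn(8, 32), torch.randn(8, 32)
labels = torch.tensor([True, False] * 4)
loss = angular_pair_loss(z1, z2, labels)
```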
arXiv Detail & Related papers (2021-10-13T21:51:56Z)
- DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding [71.73405116189531]
We propose a neural vocoder that extracts F0 and timbre/aperiodicity encodings from the input speech, emulating those defined in conventional vocoders.
As the deep neural analyzer is learnable, it is expected to be more accurate for signal reconstruction and manipulation, and generalizable from speech to singing.
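For context, a conventional F0 estimate of the kind such vocoders emulate can be computed with the probabilistic YIN tracker in librosa; this is a standard baseline, not the paper's neural analyzer.

```python
# Conventional F0 tracking with probabilistic YIN, on a 220 Hz test tone.
import numpy as np
import librosa

sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 220.0 * t).astype(np.float32)

f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)
print(np.nanmedian(f0))  # close to 220 Hz on voiced frames
```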
arXiv Detail & Related papers (2021-10-13T01:39:57Z)
- Time-Frequency Analysis based Deep Interference Classification for Frequency Hopping System [2.8123846032806035]
Interference classification plays an important role in protecting authorized communication systems.
In this paper, the interference classification problem for the frequency hopping communication system is discussed.
Considering the possible presence of multiple interferences in the frequency hopping system, a composite time-frequency analysis method based on linear and bilinear transforms is adopted.
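A minimal sketch of the linear half of such a pipeline, assuming SciPy: an STFT spectrogram of a toy frequency-hopping signal that a classifier would then consume. The hop schedule and window size are illustrative.

```python
# STFT spectrogram of a synthetic frequency-hopping signal; the resulting
# time-frequency image is what a downstream classifier would see.
import numpy as np
from scipy.signal import stft

fs = 8000
hops = np.repeat([500, 1500, 1000, 2500], fs // 4)  # toy hop schedule (Hz)
x = np.sin(2 * np.pi * np.cumsum(hops) / fs)        # phase = 2*pi * integral of f

f, tt, Z = stft(x, fs=fs, nperseg=256)
spectrogram = np.abs(Z)  # time-frequency image for the classifier
```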
arXiv Detail & Related papers (2021-07-21T14:22:40Z)
- Voice Activity Detection for Transient Noisy Environment Based on Diffusion Nets [13.558688470594674]
We address voice activity detection in acoustic environments of transients and stationary noises.
We exploit unique spatial patterns of speech and non-speech audio frames by independently learning their underlying geometric structure.
A deep neural network is trained to separate speech from non-speech frames.
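A hedged sketch of the diffusion-maps idea underlying diffusion nets, in NumPy: embed audio frames with eigenvectors of a row-normalized affinity matrix, making their geometric structure explicit. Kernel scale and dimensions are illustrative.

```python
# Diffusion-maps embedding: Gaussian affinities between frames, row
# normalization into a transition matrix, then the leading non-trivial
# eigenvectors as coordinates where speech/non-speech frames separate.
import numpy as np

def diffusion_embedding(frames, eps=1.0, dim=2):
    d2 = ((frames[:, None, :] - frames[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / eps)                  # Gaussian affinity between frames
    P = K / K.sum(axis=1, keepdims=True)   # row-normalized transition matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    return vecs[:, order[1:dim + 1]].real  # skip the trivial eigenvector

frames = np.random.default_rng(0).standard_normal((50, 8))
coords = diffusion_embedding(frames)       # (50, 2) diffusion coordinates
```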
arXiv Detail & Related papers (2021-06-25T17:05:26Z)
- PILOT: Introducing Transformers for Probabilistic Sound Event Localization [107.78964411642401]
This paper introduces a novel transformer-based sound event localization framework, where temporal dependencies in the received multi-channel audio signals are captured via self-attention mechanisms.
The framework is evaluated on three publicly available multi-source sound event localization datasets and compared against state-of-the-art methods in terms of localization error and event detection accuracy.
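A minimal sketch of the core mechanism, assuming PyTorch: self-attention over a sequence of audio features so each time step attends to temporal context. Feature size and sequence length are illustrative, and the localization heads are omitted.

```python
# Self-attention over audio-feature frames (Q = K = V), the building block
# behind the transformer-based localization framework.
import torch
import torch.nn as nn

d_model, seq_len = 64, 100
attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=4, batch_first=True)

feats = torch.randn(1, seq_len, d_model)   # stand-in multi-channel features
out, weights = attn(feats, feats, feats)   # each step attends to all others
# `out` would feed localization heads (e.g., direction-of-arrival regression).
```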
arXiv Detail & Related papers (2021-06-07T18:29:19Z)
- Multi-Tones' Phase Coding (MTPC) of Interaural Time Difference by Spiking Neural Network [68.43026108936029]
We propose a pure spiking neural network (SNN) based computational model for precise sound localization in the noisy real-world environment.
We implement this algorithm in a real-time robotic system with a microphone array.
The experimental results show a mean azimuth error of 13 degrees, surpassing the accuracy of other biologically plausible neuromorphic approaches to sound source localization.
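For comparison, the classical signal-processing counterpart of ITD-based localization: estimate the interaural time difference by cross-correlation and convert it to azimuth via ITD = d*sin(theta)/c. The microphone spacing and delay below are illustrative.

```python
# Cross-correlation ITD estimate for a two-microphone array.
import numpy as np

fs, d, c = 48000, 0.2, 343.0            # sample rate, mic spacing (m), speed of sound (m/s)
rng = np.random.default_rng(0)
left = rng.standard_normal(4800)
delay = 10                              # true interaural lag in samples
right = np.roll(left, delay)

corr = np.correlate(right, left, mode="full")
lag = np.argmax(corr) - (len(left) - 1) # recovered lag in samples
itd = lag / fs
theta = np.degrees(np.arcsin(np.clip(itd * c / d, -1, 1)))  # azimuth estimate
```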
arXiv Detail & Related papers (2020-07-07T08:22:56Z)
- Robust Multi-channel Speech Recognition using Frequency Aligned Network [23.397670239950187]
We use a frequency aligned network for robust automatic speech recognition.
We show that our multi-channel acoustic model with a frequency aligned network shows up to 18% relative reduction in word error rate.
arXiv Detail & Related papers (2020-02-06T21:47:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.