ElectrodeNet -- A Deep Learning Based Sound Coding Strategy for Cochlear
Implants
- URL: http://arxiv.org/abs/2305.16753v1
- Date: Fri, 26 May 2023 09:06:04 GMT
- Authors: Enoch Hsin-Ho Huang, Rong Chao, Yu Tsao, Chao-Min Wu
- Abstract summary: ElectrodeNet is a deep learning based sound coding strategy for the cochlear implant (CI).
The extended ElectrodeNet-CS strategy further incorporates channel selection (CS).
Network models of deep neural network (DNN), convolutional neural network (CNN), and long short-term memory (LSTM) were trained using the fast Fourier transform (FFT) bins and channel envelopes obtained from the processing of clean speech by the ACE strategy.
- Score: 9.468136300919062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: ElectrodeNet, a deep learning based sound coding strategy for the cochlear
implant (CI), is proposed to emulate the advanced combination encoder (ACE)
strategy by replacing the conventional envelope detection using various
artificial neural networks. The extended ElectrodeNet-CS strategy further
incorporates channel selection (CS). Network models of deep neural network
(DNN), convolutional neural network (CNN), and long short-term memory (LSTM)
were trained using the fast Fourier transform (FFT) bins and channel envelopes
obtained from the processing of clean speech by the ACE strategy. Objective
speech understanding using short-time objective intelligibility (STOI) and
normalized covariance metric (NCM) was estimated for ElectrodeNet using CI
simulations. Sentence recognition tests for vocoded Mandarin speech were
conducted with normal-hearing listeners. DNN, CNN, and LSTM based ElectrodeNets
exhibited strong correlations with ACE in objective and subjective scores
using mean squared error (MSE), linear correlation coefficient (LCC), and
Spearman's
rank correlation coefficient (SRCC). The ElectrodeNet-CS strategy was capable
of producing N-of-M compatible electrode patterns using a modified DNN network
to embed maxima selection, and performed on par with or even slightly above
ACE in average STOI and sentence recognition scores. The methods and
findings demonstrated the feasibility and potential of using deep learning in
CI coding strategies.
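The two core ideas in the abstract, N-of-M maxima selection and the correlation metrics used to compare strategies, can be sketched in plain Python. This is an illustrative sketch only: the paper embeds maxima selection inside a modified DNN (ElectrodeNet-CS), whereas here it is shown as a simple post-hoc step on one envelope frame, and the helper names (`select_maxima`, `lcc`) are hypothetical, not from the paper.

```python
def select_maxima(envelopes, n=8):
    """N-of-M maxima selection: keep the n largest channel envelopes in
    a frame and zero the rest, yielding an N-of-M compatible electrode
    pattern (hypothetical post-hoc helper, not the paper's in-network
    implementation)."""
    if n >= len(envelopes):
        return list(envelopes)
    # Indices of the n largest envelope values in this frame.
    keep = set(sorted(range(len(envelopes)),
                      key=lambda i: envelopes[i], reverse=True)[:n])
    return [e if i in keep else 0.0 for i, e in enumerate(envelopes)]

def lcc(x, y):
    """Linear correlation coefficient (Pearson r), one of the metrics
    (alongside MSE and SRCC) used to compare ElectrodeNet with ACE."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# One 22-channel envelope frame; keep the 8 largest (8-of-22, ACE-style).
frame = [0.02, 0.40, 0.11, 0.90, 0.05, 0.33, 0.70, 0.01,
         0.25, 0.60, 0.08, 0.15, 0.45, 0.03, 0.80, 0.12,
         0.55, 0.07, 0.20, 0.65, 0.09, 0.30]
pattern = select_maxima(frame, n=8)
print(sum(1 for v in pattern if v > 0.0))  # 8 active electrodes
```

In the actual ACE strategy the selected envelopes are additionally compressed and mapped to stimulation current levels; the sketch stops at channel selection, which is the part ElectrodeNet-CS learns to emulate.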
Related papers
- Multiscale fusion enhanced spiking neural network for invasive BCI neural signal decoding [13.108613110379961]
This paper presents a novel approach utilizing a Multiscale Fusion Spiking Neural Network (MFSNN).
MFSNN emulates the parallel processing and multiscale feature fusion seen in human visual perception to enable real-time, efficient, and energy-conserving neural signal decoding.
MFSNN surpasses traditional artificial neural network methods, such as enhanced GRU, in both accuracy and computational efficiency.
arXiv Detail & Related papers (2024-09-14T09:53:30Z)
- Spiking Neural Network Decision Feedback Equalization [70.3497683558609]
We propose an SNN-based equalizer with a feedback structure akin to the decision feedback equalizer (DFE).
We show that our approach clearly outperforms conventional linear equalizers for three different exemplary channels.
The proposed SNN with a decision feedback structure enables the path to competitive energy-efficient transceivers.
arXiv Detail & Related papers (2022-11-09T09:19:15Z)
- Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs), represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers, are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z)
- Convolutional Spiking Neural Networks for Detecting Anticipatory Brain Potentials Using Electroencephalogram [0.21847754147782888]
Spiking neural networks (SNNs) are receiving increased attention because they mimic synaptic connections in biological systems and produce spike trains.
Recently, convolutional layers have been added to combine the feature extraction power of convolutional networks with the computational efficiency of SNNs.
This paper studies the feasibility of using a convolutional spiking neural network (CSNN) to detect anticipatory slow cortical potentials.
arXiv Detail & Related papers (2022-08-14T19:04:15Z)
- Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM).
Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full precision baseline RNNLMs.
arXiv Detail & Related papers (2021-11-29T09:30:06Z)
- BioLCNet: Reward-modulated Locally Connected Spiking Neural Networks [0.6193838300896449]
We propose a spiking neural network (SNN) trained using spike-timing-dependent plasticity (STDP) and its reward-modulated variant (R-STDP) learning rules.
Our network consists of a rate-coded input layer followed by a locally connected hidden layer and a decoding output layer.
We used the MNIST dataset to obtain image classification accuracy and to assess the robustness of our rewarding system to varying target responses.
arXiv Detail & Related papers (2021-09-12T15:28:48Z)
- Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition [101.69873988328808]
We build upon a quantum convolutional neural network (QCNN) composed of a quantum circuit encoder for feature extraction.
The input speech is first up-streamed to a quantum computing server to extract a Mel-spectrogram.
The corresponding convolutional features are encoded using a quantum circuit algorithm with random parameters.
The encoded features are then down-streamed to the local RNN model for the final recognition.
arXiv Detail & Related papers (2020-10-26T03:36:01Z)
- Attention Driven Fusion for Multi-Modal Emotion Recognition [39.295892047505816]
We present a deep learning-based approach to exploit and fuse text and acoustic data for emotion classification.
We use a SincNet layer, based on parameterized sinc functions with band-pass filters, to extract acoustic features from raw audio followed by a DCNN.
For text processing, we use two branches in parallel (a DCNN, and a bidirectional RNN followed by a DCNN), where cross attention is introduced to infer the N-gram level correlations.
arXiv Detail & Related papers (2020-09-23T08:07:58Z)
- Multi-Tones' Phase Coding (MTPC) of Interaural Time Difference by Spiking Neural Network [68.43026108936029]
We propose a pure spiking neural network (SNN) based computational model for precise sound localization in the noisy real-world environment.
We implement this algorithm in a real-time robotic system with a microphone array.
The experimental results show a mean azimuth error of 13 degrees, surpassing the accuracy of other biologically plausible neuromorphic approaches for sound source localization.
arXiv Detail & Related papers (2020-07-07T08:22:56Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
- Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition [39.497407288772386]
The recurrent neural network transducer (RNN-T) architecture has become an emerging trend in end-to-end automatic speech recognition research.
In this work, we leverage external alignments to seed the RNN-T model.
Two different pre-training solutions are explored, referred to as encoder pre-training and whole-network pre-training, respectively.
arXiv Detail & Related papers (2020-05-01T19:00:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.