Event Based Time-Vectors for auditory features extraction: a
neuromorphic approach for low power audio recognition
- URL: http://arxiv.org/abs/2112.07011v1
- Date: Mon, 13 Dec 2021 21:08:04 GMT
- Title: Event Based Time-Vectors for auditory features extraction: a
neuromorphic approach for low power audio recognition
- Authors: Marco Rasetto, Juan P. Dominguez-Morales, Angel Jimenez-Fernandez and
Ryad Benosman
- Abstract summary: We present a neuromorphic architecture, capable of unsupervised auditory feature recognition.
We then validate the network on a subset of Google's Speech Commands dataset.
- Score: 4.206844212918807
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, tremendous efforts have been made to advance the
state of the art in Natural Language Processing (NLP) and audio recognition.
However, these efforts have often translated into increased power consumption
and memory requirements for bigger and more complex models. Such solutions fail
to meet the constraints of IoT devices, which require low-power,
memory-efficient computation, and therefore fall short of the growing demand
for efficient edge computing. Neuromorphic systems have proved to be excellent
candidates for low-power, low-latency computation in a multitude of
applications. For this reason, we present a neuromorphic architecture capable
of unsupervised auditory
feature recognition. We then validate the network on a subset of Google's
Speech Commands dataset.
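The abstract does not spell out how raw audio becomes events, so the sketch below only illustrates a generic event-based front-end in that spirit: band energies computed from framed audio and turned into spike events by a send-on-delta threshold. The band count, hop length, and threshold are illustrative assumptions, not the paper's actual Time-Vector pipeline.

```python
import numpy as np

def band_energies(audio, n_fft=512, hop=160, n_bands=32):
    """Frame the signal and compute log energy in n_bands equal-width
    frequency bands (a crude stand-in for a cochlear filterbank)."""
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(audio) - n_fft + 1, hop):
        spec = np.abs(np.fft.rfft(audio[start:start + n_fft] * window)) ** 2
        bands = np.array_split(spec, n_bands)        # equal-width bands
        frames.append([np.log(b.sum() + 1e-10) for b in bands])
    return np.asarray(frames)                         # (n_frames, n_bands)

def send_on_delta_events(energies, delta=1.0, hop_s=0.01):
    """Emit (time, band, polarity) events whenever a band's log energy has
    moved by more than `delta` since that band's last event (send-on-delta)."""
    ref = energies[0].copy()
    events = []
    for t, row in enumerate(energies[1:], start=1):
        change = row - ref
        for band in np.where(np.abs(change) > delta)[0]:
            events.append((t * hop_s, int(band), int(np.sign(change[band]))))
            ref[band] = row[band]
    return events

if __name__ == "__main__":
    sr = 16000
    freq = np.linspace(100.0, 4000.0, sr)             # 1 s linear frequency sweep
    sweep = np.sin(2 * np.pi * np.cumsum(freq) / sr)
    events = send_on_delta_events(band_energies(sweep))
    print(f"{len(events)} events, first few: {events[:5]}")
```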
Related papers
- sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection
with Spiking Neural Networks [51.516451451719654]
Spiking Neural Networks (SNNs) are known to be biologically plausible and power-efficient.
This paper introduces a novel SNN-based Voice Activity Detection model, referred to as sVAD.
It provides effective auditory feature representation through SincNet and 1D convolution, and improves noise robustness with attention mechanisms.
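As a rough, non-spiking sketch of the pipeline shape described above (a learnable filter front-end, 1D convolution, and attention pooling feeding a speech/non-speech head), the PyTorch module below substitutes a plain Conv1d for SincNet and omits the SNN dynamics; all layer sizes are assumptions, not sVAD's actual configuration.

```python
import torch
import torch.nn as nn

class TinyVAD(nn.Module):
    """Conv1d front-end + attention pooling + speech/non-speech head.
    A non-spiking stand-in for the SincNet/attention pipeline in sVAD."""
    def __init__(self, n_filters=40, hidden=64):
        super().__init__()
        # plain Conv1d here; sVAD uses learnable SincNet band-pass filters
        self.frontend = nn.Sequential(
            nn.Conv1d(1, n_filters, kernel_size=251, stride=80, padding=125),
            nn.ReLU(),
            nn.Conv1d(n_filters, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.attn = nn.Linear(hidden, 1)   # per-frame attention scores
        self.head = nn.Linear(hidden, 2)   # speech vs. non-speech

    def forward(self, wav):                          # wav: (batch, samples)
        x = self.frontend(wav.unsqueeze(1))          # (batch, hidden, frames)
        x = x.transpose(1, 2)                        # (batch, frames, hidden)
        w = torch.softmax(self.attn(x), dim=1)       # attention over frames
        pooled = (w * x).sum(dim=1)                  # weighted temporal pooling
        return self.head(pooled)

if __name__ == "__main__":
    model = TinyVAD()
    logits = model(torch.randn(4, 16000))            # four 1-second clips @ 16 kHz
    print(logits.shape)                              # torch.Size([4, 2])
```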
arXiv Detail & Related papers (2024-03-09T02:55:44Z)
- Deep Photonic Reservoir Computer for Speech Recognition [49.1574468325115]
Speech recognition is a critical task in the field of artificial intelligence and has witnessed remarkable advancements.
Deep reservoir computing is energy efficient but exhibits limitations in performance when compared to more resource-intensive machine learning algorithms.
We propose a photonic-based deep reservoir computer and evaluate its effectiveness on different speech recognition tasks.
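The photonic hardware itself cannot be reproduced in a few lines, but the computation it implements is that of a reservoir computer: a fixed random recurrent network whose states are read out by a trained linear layer. The sketch below shows a conventional echo-state-network version with assumed sizes, leak rate, and spectral radius.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed random reservoir: only the linear readout is trained,
# which is what makes reservoir computing cheap to train.
n_in, n_res = 13, 300                        # e.g. 13 MFCC-like input features
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.normal(0, 1, (n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))    # keep spectral radius below 1

def reservoir_states(inputs, leak=0.3):
    """Run an input sequence (T, n_in) through the leaky reservoir."""
    x = np.zeros(n_res)
    states = []
    for u in inputs:
        x = (1 - leak) * x + leak * np.tanh(W_in @ u + W @ x)
        states.append(x.copy())
    return np.asarray(states)                # (T, n_res)

def fit_readout(sequences, labels, n_classes, ridge=1e-2):
    """Ridge-regression readout trained on the final reservoir state."""
    X = np.stack([reservoir_states(s)[-1] for s in sequences])
    Y = np.eye(n_classes)[labels]            # one-hot targets
    A = X.T @ X + ridge * np.eye(n_res)
    return np.linalg.solve(A, X.T @ Y)       # readout weights (n_res, n_classes)

if __name__ == "__main__":
    seqs = [rng.normal(size=(50, n_in)) for _ in range(20)]
    labels = rng.integers(0, 3, size=20)
    W_out = fit_readout(seqs, labels, n_classes=3)
    pred = np.argmax(reservoir_states(seqs[0])[-1] @ W_out)
    print("predicted class:", pred)
```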
arXiv Detail & Related papers (2023-12-11T17:43:58Z)
- Model Blending for Text Classification [0.15229257192293197]
We try reducing the complexity of state-of-the-art LSTM models for natural language tasks such as text classification, by distilling their knowledge into CNN-based models, thus reducing the inference time (or latency) during testing.
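A minimal sketch of the soft-label distillation loss such a teacher-student setup typically uses (Hinton-style temperature-scaled KL plus hard-label cross-entropy); the temperature and mixing weight are assumed hyperparameters, not values from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with KL divergence to the teacher's
    softened predictions (standard soft-label knowledge distillation)."""
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2                     # rescale gradient magnitude
    return alpha * hard + (1 - alpha) * soft

if __name__ == "__main__":
    student = torch.randn(8, 4, requires_grad=True)   # CNN student logits
    teacher = torch.randn(8, 4)                       # frozen LSTM teacher logits
    labels = torch.randint(0, 4, (8,))
    loss = distillation_loss(student, teacher, labels)
    loss.backward()
    print(float(loss))
```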
arXiv Detail & Related papers (2022-08-05T05:07:45Z)
- Braille Letter Reading: A Benchmark for Spatio-Temporal Pattern Recognition on Neuromorphic Hardware [50.380319968947035]
Recent deep learning approaches have reached high accuracy in such tasks, but their implementation on conventional embedded solutions is still computationally and energy expensive.
We propose a new benchmark for computing tactile pattern recognition at the edge through letters reading.
We trained and compared feed-forward and recurrent spiking neural networks (SNNs) offline using back-propagation through time with surrogate gradients, then we deployed them on the Intel Loihi neuromorphic chip for efficient inference.
Our results show that the LSTM outperforms the recurrent SNN in terms of accuracy by 14%. However, the recurrent SNN on Loihi is 237 times more energy efficient.
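A minimal sketch of the surrogate-gradient trick mentioned above: a hard spike in the forward pass and a smooth pseudo-derivative in the backward pass, applied to a single leaky integrate-and-fire layer unrolled through time. The neuron constants and surrogate shape are illustrative, not the benchmark's exact settings.

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike forward; fast-sigmoid surrogate gradient backward."""
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        surrogate = 1.0 / (1.0 + 10.0 * v.abs()) ** 2    # smooth pseudo-derivative
        return grad_out * surrogate

def lif_layer(inputs, weight, beta=0.9, threshold=1.0):
    """Run a leaky integrate-and-fire layer over a (T, batch, n_in) sequence."""
    T = inputs.shape[0]
    v = torch.zeros(inputs.shape[1], weight.shape[0])
    spikes = []
    for t in range(T):
        v = beta * v + inputs[t] @ weight.t()            # leak + integrate
        s = SurrogateSpike.apply(v - threshold)          # fire
        v = v - s * threshold                            # soft reset
        spikes.append(s)
    return torch.stack(spikes)                           # (T, batch, n_out)

if __name__ == "__main__":
    w = torch.randn(20, 12, requires_grad=True)
    out = lif_layer(torch.rand(50, 4, 12), w)            # 50 time steps
    out.sum().backward()                                 # BPTT through the surrogate
    print(out.shape, w.grad.abs().mean().item())
```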
arXiv Detail & Related papers (2022-05-30T14:30:45Z)
- Neural Architecture Search for Energy Efficient Always-on Audio Models [1.3846912186423144]
We present several changes to neural architecture search (NAS) that improve the chance of success in practical situations.
We benchmark the performance of our search on real hardware, but since running thousands of tests with real hardware is difficult, we use a random forest model to roughly predict the energy usage of a candidate network.
Our search, evaluated on a sound-event classification dataset based upon AudioSet, results in an order of magnitude less energy per inference and a much smaller memory footprint.
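A hedged sketch of the energy-predictor idea described above: fit a random forest on architectures whose energy has already been measured on hardware, then score new candidates with the model instead of the device. The architecture encoding and the synthetic "measurements" below are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def arch_features(arch):
    """Encode a candidate architecture as a small feature vector:
    (n_layers, mean width, rough parameter count) - an assumed encoding."""
    widths = np.array(arch["widths"])
    params = np.sum(widths[:-1] * widths[1:])
    return np.array([len(widths), widths.mean(), params])

# Pretend we measured energy (mJ/inference) for a few hundred architectures
# on the target hardware; in practice these come from on-device benchmarks.
measured = []
for _ in range(300):
    widths = rng.integers(16, 128, size=rng.integers(3, 8)).tolist()
    feats = arch_features({"widths": widths})
    energy = 0.002 * feats[2] + rng.normal(0, 0.5)       # synthetic ground truth
    measured.append((feats, energy))

X = np.stack([f for f, _ in measured])
y = np.array([e for _, e in measured])
predictor = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# During the search, score new candidates without touching the hardware.
candidate = {"widths": [32, 64, 64, 32]}
print("predicted energy (mJ):", predictor.predict(arch_features(candidate)[None])[0])
```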
arXiv Detail & Related papers (2022-02-09T06:10:18Z)
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
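A loose sketch of the one-shot, weight-sharing idea behind such searches: a supernet holds several candidate operations per layer, and sub-architectures are sampled by choosing one operation per layer, so candidates are evaluated without training each from scratch. The operations and sizes below are assumptions and do not reproduce MS-RANAS's multi-scale design.

```python
import random
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    """One supernet layer holding several candidate operations that share
    training; a sampled sub-network uses exactly one of them."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),                           # skip-connection candidate
        ])

    def forward(self, x, choice):
        return self.ops[choice](x)

class SuperNet(nn.Module):
    def __init__(self, channels=16, depth=4, n_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.layers = nn.ModuleList(MixedOp(channels) for _ in range(depth))
        self.head = nn.Linear(channels, n_classes)

    def forward(self, x, choices):
        x = self.stem(x)
        for layer, c in zip(self.layers, choices):
            x = torch.relu(layer(x, c))
        return self.head(x.mean(dim=(2, 3)))         # global average pool

if __name__ == "__main__":
    net = SuperNet()
    # each step samples one sub-architecture; all of them share the same weights
    choices = [random.randrange(3) for _ in net.layers]
    logits = net(torch.randn(2, 3, 32, 32), choices)
    print(choices, logits.shape)
```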
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
- TinySpeech: Attention Condensers for Deep Speech Recognition Neural Networks on Edge Devices [71.68436132514542]
We introduce the concept of attention condensers for building low-footprint, highly-efficient deep neural networks for on-device speech recognition on the edge.
To illustrate its efficacy, we introduce TinySpeech, low-precision deep neural networks tailored for on-device speech recognition.
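The block below is only a loose reading of the attention-condenser idea (condense to a low-resolution embedding, compute cheap attention there, expand it back, and gate the input); it is not the exact module published with TinySpeech, and the pooling factor and depthwise convolution are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedAttentionCondenser(nn.Module):
    """Loose sketch of an attention condenser: compute attention in a
    condensed (downsampled) space, expand it back, and gate the input."""
    def __init__(self, channels, reduction=2):
        super().__init__()
        self.condense = nn.MaxPool2d(reduction)              # condensation layer
        self.embed = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, x):
        a = self.condense(x)                                  # low-resolution embedding
        a = self.embed(a)                                     # cheap depthwise attention
        a = F.interpolate(a, size=x.shape[-2:], mode="nearest")  # expansion
        return x * torch.sigmoid(self.scale * a) + x          # selective attention + residual

if __name__ == "__main__":
    block = SimplifiedAttentionCondenser(channels=8)
    y = block(torch.randn(1, 8, 40, 100))    # e.g. (batch, channels, mel bins, frames)
    print(y.shape)
```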
arXiv Detail & Related papers (2020-08-10T16:34:52Z)
- Resource-Efficient Speech Mask Estimation for Multi-Channel Speech Enhancement [15.361841669377776]
We provide a resource-efficient approach for multi-channel speech enhancement based on Deep Neural Networks (DNNs).
In particular, we use reduced-precision DNNs for estimating a speech mask from noisy, multi-channel microphone observations.
In the extreme case of binary weights and reduced precision activations, a significant reduction of execution time and memory footprint is possible.
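A generic sketch of the extreme case mentioned above, binary weights trained with a straight-through estimator; it illustrates the technique, not the paper's specific mask-estimation network, and the layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class BinarizeSTE(torch.autograd.Function):
    """sign() in the forward pass; straight-through gradient in backward,
    clipped to the [-1, 1] range where the approximation is reasonable."""
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        return grad_out * (w.abs() <= 1).float()

class BinaryLinear(nn.Linear):
    """Linear layer that uses binarized weights (scaled by their mean
    magnitude) in the forward pass while keeping full-precision latents."""
    def forward(self, x):
        scale = self.weight.abs().mean()
        w_bin = BinarizeSTE.apply(self.weight) * scale
        return nn.functional.linear(x, w_bin, self.bias)

if __name__ == "__main__":
    # toy mask estimator: noisy spectral features in, per-frequency mask out
    net = nn.Sequential(BinaryLinear(257, 128), nn.ReLU(),
                        BinaryLinear(128, 257), nn.Sigmoid())
    mask = net(torch.randn(10, 257))          # 10 frames of a 257-bin spectrogram
    mask.mean().backward()
    print(mask.shape, net[0].weight.grad is not None)
```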
arXiv Detail & Related papers (2020-07-22T14:58:29Z)
- Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions.
Speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at achieving two goals: a) improve the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reduce the system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.