TinyML for Speech Recognition
- URL: http://arxiv.org/abs/2504.16213v1
- Date: Tue, 22 Apr 2025 19:00:40 GMT
- Title: TinyML for Speech Recognition
- Authors: Andrew Barovic, Armin Moin,
- Abstract summary: We train and deploy a quantized 1D convolutional neural network model to conduct speech recognition on an IoT edge device.<n>This can be useful in various Internet of Things (IoT) applications, such as smart homes and ambient assisted living for the elderly and people with disabilities.
- Score: 3.9134031118910264
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We train and deploy a quantized 1D convolutional neural network model to conduct speech recognition on a highly resource-constrained IoT edge device. This can be useful in various Internet of Things (IoT) applications, such as smart homes and ambient assisted living for the elderly and people with disabilities, just to name a few examples. In this paper, we first create a new dataset with over one hour of audio data that enables our research and will be useful to future studies in this field. Second, we utilize the technologies provided by Edge Impulse to enhance our model's performance and achieve a high Accuracy of up to 97% on our dataset. For the validation, we implement our prototype using the Arduino Nano 33 BLE Sense microcontroller board. This microcontroller board is specifically designed for IoT and AI applications, making it an ideal choice for our target use case scenarios. While most existing research focuses on a limited set of keywords, our model can process 23 different keywords, enabling complex commands.
Related papers
- Towards Speaker Identification with Minimal Dataset and Constrained Resources using 1D-Convolution Neural Network [0.0]
This paper presents a lightweight 1D-Convolutional Neural Network (1D-CNN) designed to perform speaker identification on minimal datasets.
Our approach achieves a validation accuracy of 97.87%, leveraging data augmentation techniques to handle background noise and limited training samples.
arXiv Detail & Related papers (2024-11-22T17:18:08Z) - Leveraging Foundation Models for Zero-Shot IoT Sensing [5.319176383069102]
Deep learning models are increasingly deployed on edge Internet of Things (IoT) devices.
ZSL aims to classify data of unseen classes with the help of semantic information.
In this work, we align the IoT data embeddings with the semantic embeddings generated by an FM's text encoder for zero-shot IoT sensing.
arXiv Detail & Related papers (2024-07-29T11:16:48Z) - IoT-LM: Large Multisensory Language Models for the Internet of Things [70.74131118309967]
IoT ecosystem provides rich source of real-world modalities such as motion, thermal, geolocation, imaging, depth, sensors, and audio.
Machine learning presents a rich opportunity to automatically process IoT data at scale.
We introduce IoT-LM, an open-source large multisensory language model tailored for the IoT ecosystem.
arXiv Detail & Related papers (2024-07-13T08:20:37Z) - Realtime Person Identification via Gait Analysis [1.3260363717086592]
We propose a small CNN model with 4 layers that is very amenable for edge AI deployment and realtime gait recognition.
Our model achieves 96.7% accuracy and consumes only 5KB RAM with an inferencing time of 70 ms and 125mW power.
arXiv Detail & Related papers (2024-04-02T18:15:06Z) - MultiIoT: Benchmarking Machine Learning for the Internet of Things [70.74131118309967]
The next generation of machine learning systems must be adept at perceiving and interacting with the physical world.
sensory data from motion, thermal, geolocation, depth, wireless signals, video, and audio are increasingly used to model the states of physical environments.
Existing efforts are often specialized to a single sensory modality or prediction task.
This paper proposes MultiIoT, the most expansive and unified IoT benchmark to date, encompassing over 1.15 million samples from 12 modalities and 8 real-world tasks.
arXiv Detail & Related papers (2023-11-10T18:13:08Z) - LoRaWAN-enabled Smart Campus: The Dataset and a People Counter Use Case [9.835561936689357]
This paper presents a detailed description of the Smart Campus dataset based on LoRaWAN.
LoRaWAN is an emerging technology that enables serving hundreds of IoT devices.
arXiv Detail & Related papers (2023-04-26T08:14:56Z) - Knowledge Transfer For On-Device Speech Emotion Recognition with Neural
Structured Learning [19.220263739291685]
Speech emotion recognition (SER) has been a popular research topic in human-computer interaction (HCI)
We propose a neural structured learning (NSL) framework through building synthesized graphs.
Our experiments demonstrate that training a lightweight SER model on the target dataset with speech samples and graphs can not only produce small SER models, but also enhance the model performance.
arXiv Detail & Related papers (2022-10-26T18:38:42Z) - Braille Letter Reading: A Benchmark for Spatio-Temporal Pattern
Recognition on Neuromorphic Hardware [50.380319968947035]
Recent deep learning approaches have reached accuracy in such tasks, but their implementation on conventional embedded solutions is still computationally very and energy expensive.
We propose a new benchmark for computing tactile pattern recognition at the edge through letters reading.
We trained and compared feed-forward and recurrent spiking neural networks (SNNs) offline using back-propagation through time with surrogate gradients, then we deployed them on the Intel Loihimorphic chip for efficient inference.
Our results show that the LSTM outperforms the recurrent SNN in terms of accuracy by 14%. However, the recurrent SNN on Loihi is 237 times more energy
arXiv Detail & Related papers (2022-05-30T14:30:45Z) - Quantization and Deployment of Deep Neural Networks on Microcontrollers [0.0]
This work focuses on quantization and deployment of deep neural networks onto low-power 32-bit microcontrollers.
A new framework for end-to-end deep neural networks training, quantization and deployment is presented.
Execution using single precision 32-bit floating-point as well as fixed-point on 8- and 16-bit integers are supported.
arXiv Detail & Related papers (2021-05-27T17:39:06Z) - Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified and learning based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z) - TapNet: The Design, Training, Implementation, and Applications of a
Multi-Task Learning CNN for Off-Screen Mobile Input [75.05709030478073]
We present the design, training, implementation and applications of TapNet, a multi-task network that detects tapping on the smartphone.
TapNet can jointly learn from data across devices and simultaneously recognize multiple tap properties, including tap direction and tap location.
arXiv Detail & Related papers (2021-02-18T00:45:41Z) - Contextual-Bandit Anomaly Detection for IoT Data in Distributed
Hierarchical Edge Computing [65.78881372074983]
IoT devices can hardly afford complex deep neural networks (DNN) models, and offloading anomaly detection tasks to the cloud incurs long delay.
We propose and build a demo for an adaptive anomaly detection approach for distributed hierarchical edge computing (HEC) systems.
We show that our proposed approach significantly reduces detection delay without sacrificing accuracy, as compared to offloading detection tasks to the cloud.
arXiv Detail & Related papers (2020-04-15T06:13:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.