Luganda Speech Intent Recognition for IoT Applications
- URL: http://arxiv.org/abs/2405.19343v1
- Date: Thu, 16 May 2024 10:14:00 GMT
- Title: Luganda Speech Intent Recognition for IoT Applications
- Authors: Andrew Katumba, Sudi Murindanyi, John Trevor Kasule, Elvis Mugume
- Abstract summary: This research project aimed to develop a Luganda speech intent classification system for IoT applications.
The project uses hardware components such as Raspberry Pi, Wio Terminal, and ESP32 nodes as microcontrollers.
The ultimate objective of this work was to enable voice control using Luganda, which was accomplished through a natural language processing (NLP) model deployed on the Raspberry Pi.
- Score: 0.3374875022248865
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advent of Internet of Things (IoT) technology has generated massive interest in voice-controlled smart homes. While many voice-controlled smart home systems are designed to understand and support widely spoken languages like English, speakers of low-resource languages like Luganda may need more support. This research project aimed to develop a Luganda speech intent classification system for IoT applications to integrate local languages into smart home environments. The project uses hardware components such as Raspberry Pi, Wio Terminal, and ESP32 nodes as microcontrollers. The Raspberry Pi processes Luganda voice commands, the Wio Terminal is a display device, and the ESP32 nodes control the IoT devices. The ultimate objective of this work was to enable voice control using Luganda, which was accomplished through a natural language processing (NLP) model deployed on the Raspberry Pi. The NLP model utilized Mel Frequency Cepstral Coefficients (MFCCs) as acoustic features and a Convolutional Neural Network (Conv2D) architecture for speech intent classification. A dataset of Luganda voice commands was curated for this purpose and this has been made open-source. This work addresses the localization challenges and linguistic diversity in IoT applications by incorporating Luganda voice commands, enabling users to interact with smart home devices without English proficiency, especially in regions where local languages are predominant.
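The pipeline the abstract describes (MFCC features fed into a Conv2D network for intent classification) can be sketched in miniature with NumPy. Everything below, including the random stand-in features, the single 3x3 kernel, and the smart-home intent labels, is an illustrative assumption rather than the authors' actual architecture or training code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an MFCC matrix: 13 coefficients x 100 frames.
# In the real system these would be computed from a Luganda voice command.
mfcc = rng.standard_normal((13, 100))

# Hypothetical intent labels for a smart-home setting.
INTENTS = ["light_on", "light_off", "fan_on", "fan_off"]

def conv2d_valid(x, kernel):
    """Single-channel 2D 'valid' convolution (strictly, cross-correlation,
    as in most deep-learning frameworks)."""
    kh, kw = kernel.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def classify(features, kernel, weights, bias):
    """Conv2D -> ReLU -> global average pool -> dense softmax."""
    feat = np.maximum(conv2d_valid(features, kernel), 0.0)  # ReLU
    pooled = feat.mean()                                    # global average pool
    logits = pooled * weights + bias                        # 1-feature dense layer
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                                  # softmax over intents

kernel = rng.standard_normal((3, 3))
weights = rng.standard_normal(len(INTENTS))
bias = rng.standard_normal(len(INTENTS))

probs = classify(mfcc, kernel, weights, bias)
predicted = INTENTS[int(np.argmax(probs))]
```

In the deployed system, the predicted intent would then be forwarded from the Raspberry Pi to the ESP32 nodes that actuate the corresponding IoT device.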
Related papers
- Hello Afrika: Speech Commands in Kinyarwanda [0.0]
There is a dearth of speech command models for African languages. Hello Afrika aims to address this issue, and its first iteration is focused on the Kinyarwanda language. The model was built from a custom speech command corpus made up of general directives, numbers, and a wake word.
arXiv Detail & Related papers (2025-06-16T16:30:19Z) - Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection [49.27067541740956]
Speech Emotion Recognition (SER) is a crucial component in developing general-purpose AI agents capable of natural human-computer interaction.
Building robust multilingual SER systems remains challenging due to the scarcity of labeled data in languages other than English and Chinese.
We propose an approach to enhance SER performance in languages with scarce SER resources by leveraging data from high-resource languages.
arXiv Detail & Related papers (2024-09-17T08:36:45Z) - FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs [63.8261207950923]
FunAudioLLM is a model family designed to enhance natural voice interactions between humans and large language models (LLMs).
At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, speaking style, and speaker identity.
The models related to SenseVoice and CosyVoice have been open-sourced on Modelscope and Huggingface, along with the corresponding training, inference, and fine-tuning codes released on GitHub.
arXiv Detail & Related papers (2024-07-04T16:49:02Z) - Multilingual self-supervised speech representations improve the speech recognition of low-resource African languages with codeswitching [65.74653592668743]
Finetuning self-supervised multilingual representations reduces absolute word error rates by up to 20%.
In circumstances with limited training data, finetuning self-supervised representations is the better-performing and more viable solution.
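The word error rate (WER) reductions quoted above are based on word-level edit distance. A minimal, generic WER implementation (not the paper's evaluation code) looks like this:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over words
    (substitutions, insertions, deletions), divided by the
    number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, `wer("turn on the light", "turn on light")` is 0.25: one deletion against four reference words.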
arXiv Detail & Related papers (2023-11-25T17:05:21Z) - Plug-and-Play Multilingual Few-shot Spoken Words Recognition [3.591566487849146]
We propose PLiX, a multilingual and plug-and-play keyword spotting system.
Our few-shot deep models are learned with millions of one-second audio clips across 20 languages.
We show that PLiX can generalize to novel spoken words given as few as just one support example.
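Generalizing from a single support example is commonly done by nearest-prototype matching in an embedding space. The sketch below illustrates that idea with random stand-in embeddings and hypothetical labels; it is not PLiX's actual model, whose embeddings come from a trained deep network:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def few_shot_predict(support, query):
    """Nearest-prototype few-shot classification: average each class's
    support embeddings into a prototype, return the most similar label."""
    prototypes = {label: np.mean(vecs, axis=0) for label, vecs in support.items()}
    return max(prototypes, key=lambda label: cosine(prototypes[label], query))

rng = np.random.default_rng(1)
dim = 8
# One support example per (hypothetical) spoken word, mirroring the
# one-shot setting described above; embeddings here are random stand-ins.
anchor_yes = rng.standard_normal(dim)
anchor_no = rng.standard_normal(dim)
support = {"yes": [anchor_yes], "no": [anchor_no]}

# A query embedding close to the "yes" anchor should be labelled "yes".
query = anchor_yes + 0.05 * rng.standard_normal(dim)
label = few_shot_predict(support, query)
```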
arXiv Detail & Related papers (2023-05-03T18:58:14Z) - Implementation Of Tiny Machine Learning Models On Arduino 33 BLE For Gesture And Speech Recognition [6.8324958655038195]
For hand gesture recognition, a TinyML model is trained and deployed with the EdgeImpulse framework; the same workflow is used for speech recognition. The Arduino Nano 33 BLE, which has a built-in microphone, makes an RGB LED glow red, green, or blue depending on the keyword pronounced.
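Once the on-device classifier has recognized a keyword, the LED behaviour reduces to a lookup. The mapping below is a hypothetical sketch of that final step; the actual firmware runs the EdgeImpulse TinyML classifier on the Arduino itself:

```python
# Hypothetical keyword -> RGB mapping mirroring the behaviour described
# above; keyword recognition itself is done by the TinyML classifier.
KEYWORD_TO_RGB = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
}

def led_color(keyword: str) -> tuple:
    """Return the RGB triple for a recognized keyword, or off if unknown."""
    return KEYWORD_TO_RGB.get(keyword, (0, 0, 0))
```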
arXiv Detail & Related papers (2022-07-23T10:53:26Z) - LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between languages at the frame level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z) - Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition [71.49308685090324]
This paper investigates the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language.
We find that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
arXiv Detail & Related papers (2022-01-26T22:12:55Z) - Romanian Speech Recognition Experiments from the ROBIN Project [0.21485350418225244]
This paper presents different speech recognition experiments with deep neural networks, focusing on producing fast models (under 100 ms of latency from the network itself). Even though low latency is one of the key desired characteristics, the final deep neural network model achieves state-of-the-art results for recognizing the Romanian language.
arXiv Detail & Related papers (2021-11-23T17:35:00Z) - Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages [51.0542215642794]
We propose a novel acoustics based intent recognition system that uses discovered phonetic units for intent classification.
We present results for two language families, Indic and Romance, on two different intent recognition tasks.
arXiv Detail & Related papers (2020-11-07T00:35:31Z) - Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion [13.543705472805431]
We present a single end-to-end trained neural G2P model that shares the same encoder and decoder across multiple languages. We show a 7.2% average improvement in phoneme error rate on low-resource languages, and no degradation on high-resource ones, compared to monolingual baselines.
arXiv Detail & Related papers (2020-06-25T06:16:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.