Morse Code-Enabled Speech Recognition for Individuals with Visual and Hearing Impairments
- URL: http://arxiv.org/abs/2407.14525v1
- Date: Sun, 7 Jul 2024 09:54:29 GMT
- Title: Morse Code-Enabled Speech Recognition for Individuals with Visual and Hearing Impairments
- Authors: Ritabrata Roy Choudhury
- Abstract summary: The proposed model proposes the speech from the user, is transmitted to the speech recognition layer where it is converted into text.
The accuracy of the model is completely dependent on speech recognition, as the morse code conversion is a process.
The proposed model's WER and accuracy are both determined to be 10.18% and 89.82%, respectively.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The proposed model aims to develop a speech recognition technology for hearing, speech, or cognitively disabled people. Available speech recognition technology does not come with a communication interface for people with hearing, speech, or cognitive disabilities. In the proposed model, speech from the user is transmitted to a speech recognition layer, where it is converted into text; that text is then passed to a Morse code conversion layer, which outputs the Morse code corresponding to the speech. The accuracy of the model depends entirely on the speech recognition step, since the Morse code conversion is a deterministic, rule-based process. The model is tested with recorded audio files with different parameters. The proposed model's WER and accuracy are determined to be 10.18% and 89.82%, respectively.
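Because the Morse layer is a fixed, rule-based lookup, end-to-end accuracy is bounded by the speech-recognition layer alone. Below is a minimal sketch of that two-layer pipeline, assuming a hypothetical recognize_speech() stub in place of the unspecified ASR front end; the Morse table and function names are illustrative and not taken from the paper.
```python
# Sketch of the two-layer pipeline: ASR (stubbed) followed by a
# deterministic text-to-Morse lookup. Illustrative only.

MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}

def recognize_speech(audio_path: str) -> str:
    """Placeholder for the speech-recognition layer; the paper does not
    specify which ASR model is used, so any off-the-shelf recognizer
    could stand in here."""
    raise NotImplementedError

def text_to_morse(text: str) -> str:
    """Deterministic conversion layer: each letter/digit maps to one code,
    letters separated by spaces, words by ' / '; unknown characters are skipped."""
    words = text.upper().split()
    return " / ".join(
        " ".join(MORSE[ch] for ch in word if ch in MORSE) for word in words
    )

if __name__ == "__main__":
    print(text_to_morse("hello world"))
    # .... . .-.. .-.. --- / .-- --- .-. .-.. -..
```
Note that the reported accuracy (89.82%) is exactly 100% minus the reported WER (10.18%), consistent with the claim that the error budget lies entirely in the recognition step.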
Related papers
- Towards General-Purpose Text-Instruction-Guided Voice Conversion [84.78206348045428]
This paper introduces a novel voice conversion model, guided by text instructions such as "articulate slowly with a deep tone" or "speak in a cheerful boyish voice".
The proposed VC model is a neural language model which processes a sequence of discrete codes, resulting in the code sequence of converted speech.
arXiv Detail & Related papers (2023-09-25T17:52:09Z) - Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion [4.251500966181852]
This study consists of real human speech from eight well-known figures and their speech converted to one another using Retrieval-based Voice Conversion.
It is found that the Extreme Gradient Boosting model can achieve an average classification accuracy of 99.3% and can classify speech in real-time, at around 0.004 milliseconds given one second of speech.
arXiv Detail & Related papers (2023-08-24T12:26:15Z) - Speech Diarization and ASR with GMM [0.0]
Speech diarization involves the separation of individual speakers within an audio stream.
ASR entails the conversion of an unknown speech waveform into a corresponding written transcription.
Our primary objective typically revolves around developing a model that minimizes the Word Error Rate (WER) metric during speech transcription.
arXiv Detail & Related papers (2023-07-11T09:25:39Z) - Gammatonegram Representation for End-to-End Dysarthric Speech Processing Tasks: Speech Recognition, Speaker Identification, and Intelligibility Assessment [1.0359008237358598]
Dysarthria is a disability that causes a disturbance in the human speech system.
We introduce gammatonegram as an effective method to represent audio files with discriminative details.
We convert each speech file into an image and propose an image recognition system to classify speech in different scenarios.
arXiv Detail & Related papers (2023-07-06T21:10:50Z) - SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data [100.46303484627045]
We propose a cross-modal Speech and Language Model (SpeechLM) to align speech and text pre-training with a pre-defined unified representation.
Specifically, we introduce two alternative discrete tokenizers to bridge the speech and text modalities.
We evaluate SpeechLM on various spoken language processing tasks including speech recognition, speech translation, and universal representation evaluation framework SUPERB.
arXiv Detail & Related papers (2022-09-30T09:12:10Z) - Self-Supervised Speech Representations Preserve Speech Characteristics while Anonymizing Voices [15.136348385992047]
We train several voice conversion models using self-supervised speech representations.
Converted voices retain a low word error rate within 1% of the original voice.
Experiments on dysarthric speech data show that speech features relevant to articulation, prosody, phonation and phonology can be extracted from anonymized voices.
arXiv Detail & Related papers (2022-04-04T17:48:01Z) - Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis [67.73554826428762]
We propose a novel audio-visual speech enhancement framework for high-fidelity telecommunications in AR/VR.
Our approach leverages audio-visual speech cues to generate the codes of a neural speech codec, enabling efficient synthesis of clean, realistic speech from noisy signals.
arXiv Detail & Related papers (2022-03-31T17:57:10Z) - Wav2vec-Switch: Contrastive Learning from Original-noisy Speech Pairs for Robust Speech Recognition [52.71604809100364]
We propose wav2vec-Switch, a method to encode noise robustness into contextualized representations of speech.
Specifically, we feed original-noisy speech pairs simultaneously into the wav2vec 2.0 network.
In addition to the existing contrastive learning task, we switch the quantized representations of the original and noisy speech as additional prediction targets.
arXiv Detail & Related papers (2021-10-11T00:08:48Z) - On Prosody Modeling for ASR+TTS based Voice Conversion [82.65378387724641]
In voice conversion, an approach showing promising results in the latest voice conversion challenge (VCC) 2020 is to first use an automatic speech recognition (ASR) model to transcribe the source speech into the underlying linguistic contents.
Such a paradigm, referred to as ASR+TTS, overlooks the modeling of prosody, which plays an important role in speech naturalness and conversion similarity.
We propose to directly predict prosody from the linguistic representation in a target-speaker-dependent manner, referred to as target text prediction (TTP).
arXiv Detail & Related papers (2021-07-20T13:30:23Z) - Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion [60.808838088376675]
We propose a VC system with explicit prosodic modelling and deep speaker embedding learning.
A prosody corrector takes in phoneme embeddings to infer typical phoneme duration and pitch values.
A conversion model takes phoneme embeddings and typical prosody features as inputs to generate the converted speech.
arXiv Detail & Related papers (2020-11-03T13:08:53Z)