Conversion of Acoustic Signal (Speech) Into Text By Digital Filter using
Natural Language Processing
- URL: http://arxiv.org/abs/2209.04189v1
- Date: Fri, 9 Sep 2022 08:55:34 GMT
- Title: Conversion of Acoustic Signal (Speech) Into Text By Digital Filter using
Natural Language Processing
- Authors: Abhiram Katuri, Sindhu Salugu, Gelli Tharuni, Challa Sri Gouri
- Abstract summary: We create an interface that transforms speech and other auditory inputs into text using a digital filter.
Linguistic faults may occasionally appear, speech recognition may be unsuccessful (the voice cannot be recognized), and gender recognition may fail.
Since these are technical problems, we developed a program that acts as a mediator to prevent such software issues from arising.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the most crucial aspects of communication in daily life is speech
recognition. Speech recognition that is based on natural language processing is
one of the essential elements in the conversion of one system to another. In
this paper, we created an interface that transforms speech and other auditory
inputs into text using a digital filter. Despite the many methods available for
this conversion, linguistic faults can occasionally appear, speech recognition
can be unsuccessful (the voice cannot be recognized), and gender recognition
can fail. Because these are technical problems, we developed a program that
acts as a mediator, preventing such software issues and eliminating even this
small deviation. Its MFCC front end and HMM model operate in sync with the AI
system; as a result, technical errors are avoided.
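The abstract names MFCC features feeding an HMM but gives no implementation details. As an illustrative sketch only (frame length, hop size, filter count, and coefficient count below are conventional assumptions, not values from the paper), a minimal MFCC front end can be written in plain NumPy:

```python
# Minimal MFCC extraction sketch in NumPy. All parameters (16 kHz rate,
# 25 ms frames, 10 ms hop, 26 mel filters, 13 coefficients) are common
# defaults assumed for illustration, not taken from the paper.
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel filterbank, shape (n_filters, n_fft // 2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    """Frame the signal, take the power spectrum, apply mel filters,
    log-compress, then decorrelate with a type-II DCT."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    fb = mel_filterbank(n_filters, n_fft, sr)
    n = np.arange(n_filters)
    # Type-II DCT basis, shape (n_ceps, n_filters).
    dct = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1)
                 / (2 * n_filters))
    ceps = np.empty((n_frames, n_ceps))
    for t in range(n_frames):
        frame = signal[t * hop: t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame, n_fft)) ** 2
        log_mel = np.log(fb @ power + 1e-10)
        ceps[t] = dct @ log_mel
    return ceps
```

The resulting per-frame coefficient vectors would then serve as the observation sequence for an HMM decoder; one second of 16 kHz audio yields a (98, 13) feature matrix with these settings.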
Related papers
- Discrete Unit based Masking for Improving Disentanglement in Voice Conversion [8.337649176647645]
We introduce a novel masking mechanism in the input before speaker encoding, masking certain discrete speech units that correspond highly with phoneme classes.
Our approach improves disentanglement and conversion performance across multiple VC methods, with 44% relative improvement in objective intelligibility.
arXiv Detail & Related papers (2024-09-17T21:17:59Z) - The evaluation of a code-switched Sepedi-English automatic speech
recognition system [0.0]
We present the evaluation of the Sepedi-English code-switched automatic speech recognition system.
This end-to-end system was developed using the Sepedi Prompted Code Switching corpus and the CTC approach.
The model produced the lowest WER of 41.9%; however, it faced challenges in recognizing Sepedi-only text.
arXiv Detail & Related papers (2024-03-11T15:11:28Z) - Looking and Listening: Audio Guided Text Recognition [62.98768236858089]
Text recognition in the wild is a long-standing problem in computer vision.
Recent studies suggest vision and language processing are effective for scene text recognition.
Yet, solving edit errors such as add, delete, or replace is still the main challenge for existing approaches.
We propose the AudioOCR, a simple yet effective probabilistic audio decoder for mel spectrogram sequence prediction.
arXiv Detail & Related papers (2023-06-06T08:08:18Z) - A Vector Quantized Approach for Text to Speech Synthesis on Real-World
Spontaneous Speech [94.64927912924087]
We train TTS systems using real-world speech from YouTube and podcasts.
Our recent text-to-speech architecture is designed for multiple code generation and monotonic alignment.
We show that this architecture outperforms existing TTS systems in several objective and subjective measures.
arXiv Detail & Related papers (2023-02-08T17:34:32Z) - Speech Aware Dialog System Technology Challenge (DSTC11) [12.841429336655736]
Most research on task oriented dialog modeling is based on written text input.
We created three spoken versions of the popular written-domain MultiWoz task -- (a) TTS-Verbatim: written user inputs were converted into speech waveforms using a TTS system, (b) Human-Verbatim: humans spoke the user inputs verbatim, and (c) Human-paraphrased: humans paraphrased the user inputs.
arXiv Detail & Related papers (2022-12-16T20:30:33Z) - A unified one-shot prosody and speaker conversion system with
self-supervised discrete speech units [94.64927912924087]
Existing systems ignore the correlation between prosody and language content, leading to degradation of naturalness in converted speech.
We devise a cascaded modular system leveraging self-supervised discrete speech units as language representation.
Experiments show that our system outperforms previous approaches in naturalness, intelligibility, speaker transferability, and prosody transferability.
arXiv Detail & Related papers (2022-11-12T00:54:09Z) - DualVoice: Speech Interaction that Discriminates between Normal and
Whispered Voice Input [16.82591185507251]
In speech input, there is no easy way to distinguish between commands being issued and text to be entered.
Entering symbols and commands is also challenging because they may be misrecognized as text letters.
This study proposes a speech interaction method called DualVoice, by which commands can be input in a whispered voice and letters in a normal voice.
arXiv Detail & Related papers (2022-08-22T13:01:28Z) - Textless Speech Emotion Conversion using Decomposed and Discrete
Representations [49.55101900501656]
We decompose speech into discrete and disentangled learned representations, consisting of content units, F0, speaker, and emotion.
First, we modify the speech content by translating the content units to a target emotion, and then predict the prosodic features based on these units.
Finally, the speech waveform is generated by feeding the predicted representations into a neural vocoder.
arXiv Detail & Related papers (2021-11-14T18:16:42Z) - Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration [62.75234183218897]
We propose a one-stage context-aware framework to generate natural and coherent target speech without any training data of the speaker.
We generate the mel-spectrogram of the edited speech with a transformer-based decoder.
It outperforms a recent zero-shot TTS engine by a large margin.
arXiv Detail & Related papers (2021-09-12T04:17:53Z) - Unsupervised Domain Adaptation in Speech Recognition using Phonetic
Features [6.872447420442981]
We propose a technique to perform unsupervised gender-based domain adaptation in speech recognition using phonetic features.
Experiments are performed on the TIMIT dataset and there is a considerable decrease in the phoneme error rate using the proposed approach.
arXiv Detail & Related papers (2021-08-04T06:22:12Z) - On Prosody Modeling for ASR+TTS based Voice Conversion [82.65378387724641]
In voice conversion, an approach showing promising results in the latest voice conversion challenge (VCC) 2020 is to first use an automatic speech recognition (ASR) model to transcribe the source speech into the underlying linguistic contents.
Such a paradigm, referred to as ASR+TTS, overlooks the modeling of prosody, which plays an important role in speech naturalness and conversion similarity.
We propose to directly predict prosody from the linguistic representation in a target-speaker-dependent manner, referred to as target text prediction (TTP).
arXiv Detail & Related papers (2021-07-20T13:30:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.