A review-based study on different Text-to-Speech technologies
- URL: http://arxiv.org/abs/2312.11563v1
- Date: Sun, 17 Dec 2023 20:07:23 GMT
- Title: A review-based study on different Text-to-Speech technologies
- Authors: Md. Jalal Uddin Chowdhury, Ashab Hussan
- Abstract summary: The paper examines the different TTS technologies available, including concatenative TTS, formant synthesis TTS, and statistical parametric TTS.
The study focuses on comparing the advantages and limitations of these technologies in terms of their naturalness of voice, the level of complexity of the system, and their suitability for different applications.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This research paper presents a comprehensive review-based study on various
Text-to-Speech (TTS) technologies. TTS technology is an important aspect of
human-computer interaction, enabling machines to convert written text into
audible speech. The paper examines the different TTS technologies available,
including concatenative TTS, formant synthesis TTS, and statistical parametric
TTS. The study focuses on comparing the advantages and limitations of these
technologies in terms of their naturalness of voice, the level of complexity of
the system, and their suitability for different applications. In addition, the
paper explores the latest advancements in TTS technology, including neural TTS
and hybrid TTS. The findings of this research will provide valuable insights
for researchers, developers, and users who want to understand the different TTS
technologies and their suitability for specific applications.
Related papers
- Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey [8.476093391815766]
Text-to-speech (TTS) is a prominent research area that aims to generate natural-sounding human speech from text.
With the increasing industrial demand, TTS technologies have evolved beyond human-like speech to enabling controllable speech generation.
In this paper, we conduct a comprehensive survey of controllable TTS, covering approaches ranging from basic control techniques to methods utilizing natural language prompts.
arXiv Detail & Related papers (2024-12-09T15:50:25Z)
- Text-To-Speech Synthesis In The Wild [76.71096751337888]
Text-to-speech (TTS) systems are traditionally trained using modest databases of studio-quality, prompted or read speech collected in benign acoustic environments such as anechoic rooms.
We introduce the TTS In the Wild (TITW) dataset, the result of a fully automated pipeline, applied to the VoxCeleb1 dataset commonly used for speaker recognition.
We show that a number of recent TTS models can be trained successfully using TITW-Easy, but that it remains extremely challenging to produce similar results using TITW-Hard.
arXiv Detail & Related papers (2024-09-13T10:58:55Z)
- On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition [31.58289343561422]
We compare five different TTS decoder architectures in the scope of synthetic data generation to show the impact on CTC-based speech recognition training.
We find that, for data generation, auto-regressive decoding performs better than non-autoregressive decoding, and we propose an approach to quantify TTS generalization capabilities.
arXiv Detail & Related papers (2024-07-31T09:37:27Z)
- A Survey of Text Style Transfer: Applications and Ethical Implications [4.749824105387292]
Text style transfer (TST) aims to control selected attributes of language use, such as politeness, formality, or sentiment, without altering the style-independent content of the text.
This paper presents a comprehensive review of TST applications that have been researched over the years, using both traditional linguistic approaches and more recent deep learning methods.
arXiv Detail & Related papers (2024-07-23T17:15:23Z)
- Text to speech synthesis [0.27195102129095]
Text-to-speech synthesis (TTS) is a technology that converts written text into spoken words.
This abstract explores the key aspects of TTS synthesis, encompassing its underlying technologies, applications, and implications for various sectors.
arXiv Detail & Related papers (2024-01-25T02:13:45Z)
- Translation-Enhanced Multilingual Text-to-Image Generation [61.41730893884428]
Research on text-to-image generation (TTI) still predominantly focuses on the English language.
In this work, we thus investigate multilingual TTI and the current potential of neural machine translation (NMT) to bootstrap mTTI systems.
We propose Ensemble Adapter (EnsAd), a novel parameter-efficient approach that learns to weigh and consolidate the multilingual text knowledge within the mTTI framework.
arXiv Detail & Related papers (2023-05-30T17:03:52Z)
- A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech [94.64927912924087]
We train TTS systems using real-world speech from YouTube and podcasts.
Recent Text-to-Speech architecture is designed for multiple code generation and monotonic alignment.
We show that this Text-to-Speech architecture outperforms existing TTS systems in several objective and subjective measures.
arXiv Detail & Related papers (2023-02-08T17:34:32Z)
- On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis [102.80458458550999]
We investigate the tradeoffs between sparsity and its subsequent effects on synthetic speech.
Our findings suggest that not only are end-to-end TTS models highly prunable, but also, perhaps surprisingly, pruned TTS models can produce synthetic speech with equal or higher naturalness and intelligibility.
arXiv Detail & Related papers (2021-10-04T02:03:28Z)
- A Survey on Neural Speech Synthesis [110.39292386792555]
Text to speech (TTS) is a hot research topic in speech, language, and machine learning communities.
We conduct a comprehensive survey on neural TTS, aiming to provide a good understanding of current research and future trends.
We focus on the key components in neural TTS, including text analysis, acoustic models and vocoders, and several advanced topics, including fast TTS, low-resource TTS, robust TTS, expressive TTS, and adaptive TTS, etc.
arXiv Detail & Related papers (2021-06-29T16:50:51Z)
- GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis [79.1885389845874]
Transformer-based end-to-end text-to-speech synthesis (TTS) is one such successful implementation.
We propose a novel neural TTS model, denoted as GraphSpeech, that is formulated under graph neural network framework.
Experiments show that GraphSpeech consistently outperforms the Transformer TTS baseline in terms of spectrum and prosody rendering of utterances.
arXiv Detail & Related papers (2020-10-23T14:14:06Z)