Text to speech synthesis
- URL: http://arxiv.org/abs/2401.13891v1
- Date: Thu, 25 Jan 2024 02:13:45 GMT
- Title: Text to speech synthesis
- Authors: Harini s, Manoj G M
- Abstract summary: Text-to-speech synthesis (TTS) is a technology that converts written text into spoken words.
This abstract explores the key aspects of TTS synthesis, encompassing its underlying technologies, applications, and implications for various sectors.
- Score: 0.27195102129095
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-speech (TTS) synthesis is a technology that converts written text
into spoken words, enabling a natural and accessible means of communication.
This abstract explores the key aspects of TTS synthesis, encompassing its
underlying technologies, applications, and implications for various sectors.
The technology utilizes advanced algorithms and linguistic models to convert
textual information into lifelike speech, allowing for enhanced user
experiences in diverse contexts such as accessibility tools, navigation
systems, and virtual assistants. The abstract delves into the challenges and
advancements in TTS synthesis, including considerations for naturalness,
multilingual support, and emotional expression in synthesized speech.
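To make the flow described in the abstract concrete, the sketch below walks through the typical pipeline that the surveys listed later also describe: a text front-end that normalizes the input, an acoustic model that predicts intermediate speech features, and a vocoder that renders the waveform. It is a minimal illustration only; every function name, shape, and constant is a hypothetical placeholder, not code from this paper or any particular TTS library.

```python
# Minimal, illustrative sketch of a classic TTS pipeline:
# text front-end -> acoustic model -> vocoder.
# All names and shapes are hypothetical placeholders.

import re

def normalize_text(text: str) -> str:
    """Toy text front-end: lowercase and tidy whitespace/symbols."""
    text = text.lower().replace("&", " and ")
    return re.sub(r"\s+", " ", text).strip()

def text_to_phonemes(text: str) -> list[str]:
    """Placeholder grapheme-to-phoneme step; real systems use a G2P model or lexicon."""
    return list(text)  # characters stand in for phonemes in this sketch

def acoustic_model(phonemes: list[str]) -> list[list[float]]:
    """Placeholder acoustic model mapping symbols to mel-spectrogram frames."""
    return [[0.0] * 80 for _ in phonemes]  # one 80-bin frame per symbol

def vocoder(mel_frames: list[list[float]]) -> list[float]:
    """Placeholder vocoder turning mel frames into a waveform."""
    return [0.0] * (len(mel_frames) * 256)  # e.g. 256 samples per frame

def synthesize(text: str) -> list[float]:
    """End-to-end sketch of the three-stage pipeline."""
    phonemes = text_to_phonemes(normalize_text(text))
    return vocoder(acoustic_model(phonemes))

if __name__ == "__main__":
    audio = synthesize("Text-to-speech converts written text into spoken words.")
    print(f"Synthesized {len(audio)} audio samples")
```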
Related papers
- Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey [8.476093391815766]
Text-to-speech (TTS) is a prominent research area that aims to generate natural-sounding human speech from text.
With the increasing industrial demand, TTS technologies have evolved beyond human-like speech to enabling controllable speech generation.
In this paper, we conduct a comprehensive survey of controllable TTS, covering approaches ranging from basic control techniques to methods utilizing natural language prompts.
arXiv Detail & Related papers (2024-12-09T15:50:25Z) - Making Social Platforms Accessible: Emotion-Aware Speech Generation with Integrated Text Analysis [3.8251125989631674]
We propose an end-to-end context-aware Text-to-Speech (TTS) synthesis system.
It derives the conveyed emotion from the text input and synthesizes audio that reflects that emotion and the speaker's characteristics, producing natural and expressive speech.
Our system showcases competitive inference time performance when benchmarked against state-of-the-art TTS models.
arXiv Detail & Related papers (2024-10-24T23:18:02Z) - UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts [64.02363948840333]
UMETTS is a novel framework that leverages emotional cues from multiple modalities to generate highly expressive and emotionally resonant speech.
EP-Align employs contrastive learning to align emotional features across text, audio, and visual modalities, ensuring a coherent fusion of multimodal information.
EMI-TTS integrates the aligned emotional embeddings with state-of-the-art TTS models to synthesize speech that accurately reflects the intended emotions.
arXiv Detail & Related papers (2024-04-29T03:19:39Z) - A review-based study on different Text-to-Speech technologies [0.0]
The paper examines the different TTS technologies available, including concatenative TTS, formant synthesis TTS, and statistical parametric TTS.
The study focuses on comparing the advantages and limitations of these technologies in terms of their naturalness of voice, the level of complexity of the system, and their suitability for different applications.
arXiv Detail & Related papers (2023-12-17T20:07:23Z) - Visual-Aware Text-to-Speech [101.89332968344102]
We present a new visual-aware text-to-speech (VA-TTS) task to synthesize speech conditioned on both textual inputs and visual feedback of the listener in face-to-face communication.
We devise a baseline model to fuse phoneme linguistic information and listener visual signals for speech synthesis.
arXiv Detail & Related papers (2023-06-21T05:11:39Z) - A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech [94.64927912924087]
We train TTS systems using real-world speech from YouTube and podcasts.
The proposed Text-to-Speech architecture is designed for multiple code generation and monotonic alignment (see the vector-quantization sketch after this list).
We show that it outperforms existing TTS systems in several objective and subjective measures.
arXiv Detail & Related papers (2023-02-08T17:34:32Z) - Building African Voices [125.92214914982753]
This paper focuses on speech synthesis for low-resourced African languages.
We create a set of general-purpose instructions on building speech synthesis systems with minimum technological resources.
We release the speech data, code, and trained voices for 12 African languages to support researchers and developers.
arXiv Detail & Related papers (2022-07-01T23:28:16Z) - A Survey on Neural Speech Synthesis [110.39292386792555]
Text to speech (TTS) is a hot research topic in speech, language, and machine learning communities.
We conduct a comprehensive survey on neural TTS, aiming to provide a good understanding of current research and future trends.
We focus on the key components of neural TTS, including text analysis, acoustic models, and vocoders, as well as several advanced topics such as fast TTS, low-resource TTS, robust TTS, expressive TTS, and adaptive TTS.
arXiv Detail & Related papers (2021-06-29T16:50:51Z) - Spoken Style Learning with Multi-modal Hierarchical Context Encoding for Conversational Text-to-Speech Synthesis [59.27994987902646]
Research on learning spoken styles from historical conversations is still in its infancy.
Existing approaches consider only the transcripts of the historical conversations, neglecting the spoken styles present in the historical speech.
We propose a spoken style learning approach with multi-modal hierarchical context encoding.
arXiv Detail & Related papers (2021-06-11T08:33:52Z) - Review of end-to-end speech synthesis technology based on deep learning [10.748200013505882]
This review focuses on deep learning-based end-to-end speech synthesis technology.
It mainly consists of three modules: text front-end, acoustic model, and vocoder.
This paper summarizes the open-source speech corpus of English, Chinese and other languages that can be used for speech synthesis tasks.
arXiv Detail & Related papers (2021-04-20T14:24:05Z)
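As a companion to the vector-quantized TTS entry above, the following sketch illustrates the codebook-lookup step such systems rely on: continuous speech feature frames are replaced by the indices of their nearest codebook vectors, so the model can predict discrete codes rather than raw features. The array shapes, codebook size, and function name are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of nearest-neighbour vector quantization of speech features.
# Shapes and names are illustrative assumptions, not code from the cited paper.

import numpy as np

def quantize(frames: np.ndarray, codebook: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Map each frame (T, D) to its nearest codebook entry (K, D).

    Returns the discrete code indices (T,) and the dequantized frames (T, D).
    """
    # Pairwise squared distances between frames and codebook vectors: (T, K)
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    codes = dists.argmin(axis=1)      # index of the closest codebook vector per frame
    reconstructed = codebook[codes]   # approximation of the frames from the codebook
    return codes, reconstructed

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(100, 64))     # 100 feature frames, 64 dims each
    codebook = rng.normal(size=(256, 64))   # 256 learned code vectors
    codes, recon = quantize(frames, codebook)
    print(codes[:10], float(((frames - recon) ** 2).mean()))
```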