Teochew-Wild: The First In-the-wild Teochew Dataset with Orthographic Annotations
- URL: http://arxiv.org/abs/2505.05056v1
- Date: Thu, 08 May 2025 08:47:11 GMT
- Title: Teochew-Wild: The First In-the-wild Teochew Dataset with Orthographic Annotations
- Authors: Linrong Pan, Chenglong Jiang, Gaoze Hou, Ying Gao
- Abstract summary: This paper reports the construction of Teochew-Wild, a speech corpus of the Teochew dialect. The corpus includes 18.9 hours of in-the-wild Teochew speech data from multiple speakers. To the best of our knowledge, this is the first publicly available Teochew dataset with accurate orthographic annotations.
- Score: 2.4901756414164846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper reports the construction of Teochew-Wild, a speech corpus of the Teochew dialect. The corpus includes 18.9 hours of in-the-wild Teochew speech data from multiple speakers, covering both formal and colloquial expressions, with precise orthographic and pinyin annotations. Additionally, we provide supplementary text processing tools and resources to propel research and applications in speech tasks for this low-resource language, such as automatic speech recognition (ASR) and text-to-speech (TTS). To the best of our knowledge, this is the first publicly available Teochew dataset with accurate orthographic annotations. We conduct experiments on the corpus, and the results validate its effectiveness in ASR and TTS tasks.
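As a concrete starting point for the ASR and TTS experiments the abstract mentions, here is a minimal sketch of loading such a corpus. This is an assumption-laden illustration: the manifest file name ("manifest.tsv"), its tab-separated layout, and the column order (audio path, orthographic transcript, pinyin) are invented for the example and are not the dataset's documented format.
```python
# Minimal sketch of iterating a speech corpus with orthographic and pinyin
# annotations. ASSUMPTION: a tab-separated manifest.tsv with columns
# (audio_path, orthography, pinyin); NOT the dataset's documented layout.
import csv
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Utterance:
    audio_path: Path  # path to one in-the-wild speech clip
    orthography: str  # orthographic (character-level) transcript
    pinyin: str       # romanized pronunciation annotation

def load_manifest(root: Path) -> list[Utterance]:
    """Read every (audio, orthography, pinyin) row from the manifest."""
    utterances = []
    with open(root / "manifest.tsv", newline="", encoding="utf-8") as f:
        for audio, text, pinyin in csv.reader(f, delimiter="\t"):
            utterances.append(Utterance(root / audio, text, pinyin))
    return utterances

if __name__ == "__main__":
    data = load_manifest(Path("teochew_wild"))
    print(f"loaded {len(data)} utterances")
```
Utterances loaded this way can be paired with acoustic features for ASR training, or used directly as (text, audio) pairs for TTS.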
Related papers
- RASMALAI: Resources for Adaptive Speech Modeling in Indian Languages with Accents and Intonations [15.198945496921914]
We introduce RASMALAI, a large-scale speech dataset with rich text descriptions. We develop IndicParlerTTS, the first open-source, text-description-guided TTS for Indian languages.
arXiv Detail & Related papers (2025-05-24T09:16:14Z) - A Preliminary Analysis of Automatic Word and Syllable Prominence Detection in Non-Native Speech With Text-to-Speech Prosody Embeddings [9.764748000637082]
Automatic detection of prominence at the word and syllable levels is critical for building computer-assisted language learning systems. It has been shown that prosody embeddings learned by the current state-of-the-art (SOTA) text-to-speech (TTS) systems can generate word- and syllable-level prominence in synthesized speech as naturally as in native speech.
arXiv Detail & Related papers (2024-12-11T10:58:14Z) - Algorithms For Automatic Accentuation And Transcription Of Russian Texts In Speech Recognition Systems [0.0]
This paper presents an overview of a rule-based system for automatic accentuation and phonemic transcription of Russian texts.
Two parts of the developed system, accentuation and transcription, use different approaches to achieve correct phonemic representations of input phrases.
The developed toolkit is written in Python and is available on GitHub for any interested researcher.
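For readers unfamiliar with this family of tools, the following toy sketch shows the common dictionary-plus-fallback pattern for accent placement. The mini-dictionary, stress marker, and fallback heuristic are invented for illustration; the toolkit's actual data and rules live in its GitHub repository.
```python
# Toy dictionary-plus-fallback accentuation, in the spirit of rule-based
# systems. Entries map a word to the 1-based index of its stressed vowel.
# The dictionary and the fallback rule are INVENTED for illustration.
ACCENT_DICT = {
    "молоко": 3,  # stress on the final syllable
    "вода": 2,    # stress on the final syllable
}
VOWELS = set("аеёиоуыэюя")

def accentuate(word: str, marker: str = "+") -> str:
    """Insert a stress marker after the stressed vowel of a word."""
    stressed = ACCENT_DICT.get(word.lower(), 1)  # fallback: first vowel
    seen = 0
    out = []
    for ch in word:
        out.append(ch)
        if ch.lower() in VOWELS:
            seen += 1
            if seen == stressed:
                out.append(marker)
    return "".join(out)

print(accentuate("молоко"))  # молоко+
```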
arXiv Detail & Related papers (2024-10-03T14:43:43Z) - Speech-to-Speech Translation For A Real-world Unwritten Language [62.414304258701804]
We study speech-to-speech translation (S2ST) that translates speech from one language into another language.
We present an end-to-end solution from training data collection, modeling choices to benchmark dataset release.
arXiv Detail & Related papers (2022-11-11T20:21:38Z) - Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech [88.22544315633687]
Polyphone disambiguation aims to capture accurate pronunciation knowledge from natural text sequences for reliable text-to-speech systems.
We propose Dict-TTS, a semantic-aware generative text-to-speech model with an online website dictionary.
Experimental results in three languages show that our model outperforms several strong baseline models in terms of pronunciation accuracy.
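To make the dictionary-prior idea concrete, here is a deliberately simplified sketch of dictionary-guided polyphone disambiguation. Dict-TTS performs this matching with learned semantics inside a neural TTS model; the keyword-overlap heuristic and the dictionary entries below are invented stand-ins, not the paper's method.
```python
# Simplified dictionary-guided polyphone disambiguation: pick the candidate
# pronunciation whose dictionary cues overlap the sentence context most.
# The entries for the Mandarin character "行" are INVENTED examples.
DICTIONARY = {
    "行": [
        ("xíng", {"走", "可以", "进行"}),   # senses like "to walk / to be OK"
        ("háng", {"银行", "行业", "一行"}),  # senses like "row / profession"
    ],
}

def disambiguate(char: str, context: str) -> str:
    """Return the best-matching pronunciation for a polyphonic character."""
    candidates = DICTIONARY.get(char)
    if not candidates:
        return ""  # out of dictionary: a real system defers to a G2P model
    return max(candidates, key=lambda c: sum(cue in context for cue in c[1]))[0]

print(disambiguate("行", "他在银行工作"))  # -> háng
```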
arXiv Detail & Related papers (2022-06-05T10:50:34Z) - Unified Speech-Text Pre-training for Speech Translation and Recognition [113.31415771943162]
We describe a method to jointly pre-train speech and text in an encoder-decoder modeling framework for speech translation and recognition.
The proposed method incorporates four self-supervised and supervised subtasks for cross modality learning.
It achieves between 1.7 and 2.3 BLEU improvement above the state of the art on the MuST-C speech translation dataset.
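The four subtasks are not enumerated in this summary, but joint pre-training of this kind generally optimizes a weighted combination of per-subtask losses. The sketch below shows only that generic pattern; the subtask names and uniform weights are placeholders, not the paper's actual objectives.
```python
# Generic multi-task pre-training objective: a weighted sum of subtask losses.
# The subtask names and weights are PLACEHOLDERS, not the paper's objectives.
def total_loss(losses: dict[str, float], weights: dict[str, float]) -> float:
    return sum(weights[name] * value for name, value in losses.items())

losses = {
    "speech_to_text": 2.1,    # supervised
    "text_denoising": 1.4,    # self-supervised
    "speech_denoising": 1.7,  # self-supervised
    "text_to_text": 3.0,      # supervised
}
print(total_loss(losses, {name: 1.0 for name in losses}))  # ~8.2
```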
arXiv Detail & Related papers (2022-04-11T20:59:51Z) - Automatic Dialect Density Estimation for African American English [74.44807604000967]
We explore automatic prediction of dialect density of the African American English (AAE) dialect.
Dialect density is defined as the percentage of words in an utterance that contain characteristics of the non-standard dialect.
We show a significant correlation between our predicted and ground truth dialect density measures for AAE speech in this database.
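Because the definition above is simply a ratio, it translates directly into code. In this sketch the feature detector is a placeholder; a real system would use a trained tagger or a curated inventory of AAE features.
```python
# Dialect density as defined above: the fraction of words in an utterance
# that carry a feature of the non-standard dialect.
from typing import Callable

def dialect_density(words: list[str],
                    has_feature: Callable[[str], bool]) -> float:
    if not words:
        return 0.0
    return sum(1 for w in words if has_feature(w)) / len(words)

# Toy usage with an INVENTED feature list standing in for a real detector:
FEATURES = {"finna", "ain't"}
print(dialect_density("she finna go home".split(),
                      lambda w: w in FEATURES))  # 0.25
```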
arXiv Detail & Related papers (2022-04-03T01:34:48Z) - What shall we do with an hour of data? Speech recognition for the un- and under-served languages of Common Voice [0.20774268785384567]
This report describes the methods and results of a three-week sprint to produce deployable speech recognition models for 31 under-served languages of the Common Voice project.
arXiv Detail & Related papers (2021-05-10T21:16:28Z) - Consecutive Decoding for Speech-to-text Translation [51.155661276936044]
COnSecutive Transcription and Translation (COSTT) is an integral approach for speech-to-text translation.
The key idea is to generate source transcript and target translation text with a single decoder.
Our method is verified on three mainstream datasets.
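Per this summary, the key mechanism is one decoder that first emits the source transcript and then the target translation. Below is a minimal sketch of building and splitting such a concatenated decoder target; the special tokens are invented placeholders, not COSTT's actual vocabulary.
```python
# Consecutive-decoding target: transcript, then translation, in one sequence.
# SEP and EOS are INVENTED placeholder tokens.
SEP, EOS = "<sep>", "<eos>"

def build_target(transcript: list[str], translation: list[str]) -> list[str]:
    """Concatenate transcript and translation into a single decoder target."""
    return transcript + [SEP] + translation + [EOS]

def split_output(tokens: list[str]) -> tuple[list[str], list[str]]:
    """Recover transcript and translation from a decoded sequence."""
    sep = tokens.index(SEP)
    return tokens[:sep], tokens[sep + 1:-1]

target = build_target("wo men hen hao".split(), "we are fine".split())
print(target)
print(split_output(target))
```
Because the single decoder sees the transcript before producing the translation, the translation step can condition on the recognized words, which is the intuition behind consecutive decoding.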
arXiv Detail & Related papers (2020-09-21T10:10:45Z) - "Listen, Understand and Translate": Triple Supervision Decouples End-to-end Speech-to-text Translation [49.610188741500274]
An end-to-end speech-to-text translation (ST) system takes audio in a source language and outputs the text in a target language.
Existing methods are limited by the amount of available parallel data.
We build a system to fully utilize signals in a parallel ST corpus.
arXiv Detail & Related papers (2020-09-21T09:19:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.