Everyday Speech in the Indian Subcontinent
- URL: http://arxiv.org/abs/2410.10508v2
- Date: Fri, 21 Feb 2025 17:00:16 GMT
- Title: Everyday Speech in the Indian Subcontinent
- Authors: Utkarsh P,
- Abstract summary: India has 1369 languages, of which 22 are official. About 13 different scripts are used to represent these languages. A Common Label Set (CLS) was developed based on phonetics to address the large vocabulary of units required in the End-to-End framework for multilingual synthesis.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: India has 1369 languages, of which 22 are official. About 13 different scripts are used to represent these languages. A Common Label Set (CLS) was developed based on phonetics to address the large vocabulary of units required in the End-to-End (E2E) framework for multilingual synthesis. The Indian language text is first converted to CLS. This approach enables seamless code switching across 13 Indian languages and English in a given native speaker's voice, which corresponds to everyday speech in the Indian subcontinent, where the population is multilingual.
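To make the CLS idea concrete, here is a minimal sketch: graphemes from different Indic scripts that encode the same sound collapse to one shared label, so a single synthesis front end can consume text from any of the scripts. The mapping fragment below is hypothetical, not the published label set.

```python
# Minimal sketch of the Common Label Set (CLS) idea: graphemes from
# different Indic scripts that represent the same sound are collapsed
# to one shared label before synthesis. This mapping is a tiny
# hypothetical fragment for illustration, not the published CLS.
CLS_MAP = {
    "\u0915": "ka",  # Devanagari letter KA
    "\u0995": "ka",  # Bengali letter KA
    "\u0B95": "ka",  # Tamil letter KA
    "\u0928": "na",  # Devanagari letter NA
    "\u0BA8": "na",  # Tamil letter NA
}

def to_cls(text: str) -> list[str]:
    """Map each character to its CLS label, passing unknown characters through."""
    return [CLS_MAP.get(ch, ch) for ch in text]

print(to_cls("\u0915\u0928"))  # Devanagari input -> ['ka', 'na']
print(to_cls("\u0B95\u0BA8"))  # Tamil input      -> ['ka', 'na']
```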
Related papers
- ILID: Native Script Language Identification for Indian Languages [0.0]
The core challenge of language identification lies in distinguishing languages in noisy, short, and code-mixed environments. We release a dataset of 250K sentences covering 23 languages, English and all 22 official Indian languages, each labeled with its language identifier. Our models outperform state-of-the-art pre-trained transformer models on the language identification task.
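As an illustration only (not the ILID models), a character n-gram baseline of the kind commonly used for short, code-mixed language identification might look like the following; the training sentences are placeholders, not ILID data.

```python
# Hedged baseline sketch for sentence-level language identification,
# not the ILID models: character n-gram TF-IDF features with a linear
# classifier, which tolerates short and code-mixed inputs reasonably
# well. The tiny training set below is a placeholder, not ILID data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_sentences = ["this is an english sentence",
                   "यह एक हिंदी वाक्य है",
                   "இது ஒரு தமிழ் வாக்கியம்"]
train_labels = ["eng", "hin", "tam"]

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
clf.fit(train_sentences, train_labels)
print(clf.predict(["இது ஒரு சோதனை"]))  # expect ['tam'] with realistic training data
```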
arXiv Detail & Related papers (2025-07-16T01:39:32Z) - Kinship in Speech: Leveraging Linguistic Relatedness for Zero-Shot TTS in Indian Languages [6.74683227658822]
India has 1369 languages, of which 22 are official, written in 13 scripts. Our work focuses on zero-shot synthesis, particularly for languages whose scripts and phonotactics come from different families. Intelligible and natural speech was generated for Sanskrit, Maharashtrian and Canara Konkani, Maithili, and Kurukh.
arXiv Detail & Related papers (2025-06-04T12:22:24Z) - BhasaAnuvaad: A Speech Translation Dataset for 13 Indian Languages [27.273651323572786]
We evaluate the performance of widely used Automatic Speech Translation (AST) systems on Indian languages.
There is a striking absence of systems capable of accurately translating colloquial and informal language.
We introduce BhasaAnuvaad, the largest publicly available dataset for AST involving 13 out of 22 scheduled Indian languages and English.
arXiv Detail & Related papers (2024-11-07T13:33:34Z) - CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving [61.73180469072787]
We focus on the problem of spoken translation (ST) of code-switched speech in Indian languages to English text.
We present a new end-to-end model architecture COSTA that scaffolds on pretrained automatic speech recognition (ASR) and machine translation (MT) modules.
COSTA significantly outperforms many competitive cascaded and end-to-end multimodal baselines by up to 3.5 BLEU points.
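The interleaving idea can be sketched roughly as follows; this is an assumed illustration of aligned speech-text interleaving, not the CoSTA implementation, and the segment alignments are hypothetical.

```python
# Rough sketch (not the CoSTA code) of aligned speech-text interleaving:
# speech-encoder states and text-token embeddings that are aligned
# segment by segment are merged into one sequence that alternates
# between the two modalities before being passed downstream.
import torch

def interleave_aligned(speech_states, text_embeds, segments):
    """speech_states: (T_s, d); text_embeds: (T_t, d);
    segments: list of ((s_start, s_end), (t_start, t_end)) alignments."""
    chunks = []
    for (s0, s1), (t0, t1) in segments:
        chunks.append(speech_states[s0:s1])  # acoustic span
        chunks.append(text_embeds[t0:t1])    # its aligned transcript span
    return torch.cat(chunks, dim=0)

# Toy usage with random tensors standing in for real encoder outputs.
speech = torch.randn(100, 512)
text = torch.randn(20, 512)
segments = [((0, 50), (0, 10)), ((50, 100), (10, 20))]
print(interleave_aligned(speech, text, segments).shape)  # torch.Size([120, 512])
```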
arXiv Detail & Related papers (2024-06-16T16:10:51Z) - Wav2Gloss: Generating Interlinear Glossed Text from Speech [78.64412090339044]
We propose Wav2Gloss, a task in which four linguistic annotation components are extracted automatically from speech.
We provide various baselines to lay the groundwork for future research on Interlinear Glossed Text generation from speech.
arXiv Detail & Related papers (2024-03-19T21:45:29Z) - IndicVoices: Towards building an Inclusive Multilingual Speech Dataset
for Indian Languages [17.862027695142825]
INDICVOICES is a dataset of natural and spontaneous speech from 16237 speakers covering 145 Indian districts and 22 languages.
1639 hours have already been transcribed, with a median of 73 hours per language.
All the data, tools, guidelines, models and other materials developed as a part of this work will be made publicly available.
arXiv Detail & Related papers (2024-03-04T10:42:08Z) - Speech collage: code-switched audio generation by collaging monolingual
corpora [50.356820349870986]
Speech Collage is a method that synthesizes code-switched (CS) data from monolingual corpora by splicing audio segments.
We investigate the impact of generated data on speech recognition in two scenarios.
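A minimal sketch of the splicing step, under the assumption that segment selection and alignment happen upstream (this is not the authors' pipeline):

```python
# Minimal sketch of the splicing idea behind Speech Collage (not the
# authors' pipeline): monolingual waveform segments are concatenated in
# the order dictated by a code-switched text, with a short linear
# cross-fade to soften the joins. Segment lookup is assumed upstream.
import numpy as np

def splice(segments, sr=16000, fade_ms=10):
    """Concatenate waveform segments with a linear cross-fade at each join."""
    fade = int(sr * fade_ms / 1000)
    out = segments[0].astype(np.float32)
    for seg in segments[1:]:
        seg = seg.astype(np.float32)
        ramp = np.linspace(0.0, 1.0, fade, dtype=np.float32)
        out[-fade:] = out[-fade:] * (1 - ramp) + seg[:fade] * ramp
        out = np.concatenate([out, seg[fade:]])
    return out

# Toy usage: two fake half-second segments at 16 kHz.
a = np.random.randn(8000)
b = np.random.randn(8000)
print(splice([a, b]).shape)  # (15840,) after a 10 ms overlap
```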
arXiv Detail & Related papers (2023-09-27T14:17:53Z) - PolyVoice: Language Models for Speech to Speech Translation [50.31000706309143]
PolyVoice is a language model-based framework for speech-to-speech translation (S2ST).
We use discretized speech units, which are generated in a fully unsupervised way.
For the speech synthesis part, we adopt the existing VALL-E X approach and build a unit-based audio language model.
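The discretization step can be sketched as k-means clustering over frame-level speech features; the feature extractor and cluster count here are assumptions, not PolyVoice's exact recipe.

```python
# Hedged sketch of turning continuous speech features into discrete
# units with k-means, the general recipe behind unit-based speech LMs.
# Random features stand in for real self-supervised encoder output;
# PolyVoice's actual feature extractor and unit count are not claimed here.
import numpy as np
from sklearn.cluster import KMeans

frames = np.random.randn(500, 768)              # stand-in for SSL features (T, d)
km = KMeans(n_clusters=100, n_init=10, random_state=0).fit(frames)
units = km.predict(frames)                      # one discrete unit per frame

# Collapse consecutive duplicates, as is common before language modelling.
deduped = [int(u) for i, u in enumerate(units) if i == 0 or u != units[i - 1]]
print(units[:10], len(deduped))
```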
arXiv Detail & Related papers (2023-06-05T15:53:15Z) - Multilingual Text-to-Speech Synthesis for Turkic Languages Using
Transliteration [3.0122461286351796]
This work aims to build a multilingual text-to-speech (TTS) synthesis system for ten lower-resourced Turkic languages.
We specifically target the zero-shot learning scenario, where a TTS model trained using the data of one language is applied to synthesise speech for other, unseen languages.
An end-to-end TTS system based on the Tacotron 2 architecture was trained using only the available data of the Kazakh language.
arXiv Detail & Related papers (2023-05-25T05:57:54Z) - Scaling Speech Technology to 1,000+ Languages [66.31120979098483]
The Massively Multilingual Speech (MMS) project increases the number of supported languages by 10-40x, depending on the task.
The main ingredients include a new dataset based on readings of publicly available religious texts.
We built pre-trained wav2vec 2.0 models covering 1,406 languages, a single multilingual automatic speech recognition model for 1,107 languages, speech synthesis models for the same number of languages, and a language identification model for 4,017 languages.
arXiv Detail & Related papers (2023-05-22T22:09:41Z) - DuDe: Dual-Decoder Multilingual ASR for Indian Languages using Common
Label Set [0.0]
The Common Label Set (CLS) maps graphemes of various languages with similar sounds to common labels.
Since Indian languages are mostly phonetic, building a transliteration to convert from native script to CLS is easy.
We propose a novel architecture called Multilingual-Decoder-Decoder for building multilingual systems.
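One reason the native-script-to-CLS conversion is straightforward: the Unicode blocks for major Indic scripts are laid out in parallel (inherited from ISCII), so a character's offset within its block usually identifies the same sound across scripts. The sketch below illustrates only this idea; a real transliterator needs script-specific exception handling.

```python
# Illustration of why native-script-to-CLS transliteration is easy for
# Indic scripts: their Unicode blocks are laid out in parallel, so the
# within-block offset is a rough script-neutral sound code. This is
# only the idea; real systems handle script-specific gaps and matras.
BLOCK_START = {"devanagari": 0x0900, "bengali": 0x0980, "tamil": 0x0B80}

def to_offsets(text: str, script: str) -> list[int]:
    """Reduce characters to their within-block offsets."""
    base = BLOCK_START[script]
    return [ord(ch) - base for ch in text]

# The consonant 'ka' in three scripts maps to the same offset, 0x15 (21).
print(to_offsets("\u0915", "devanagari"))  # [21]
print(to_offsets("\u0995", "bengali"))     # [21]
print(to_offsets("\u0B95", "tamil"))       # [21]
```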
arXiv Detail & Related papers (2022-10-30T04:01:26Z) - Multilingual and code-switching ASR challenges for low resource Indian
languages [59.2906853285309]
We focus on building multilingual and code-switching ASR systems through two different subtasks related to a total of seven Indian languages.
We provide a total of 600 hours of transcribed speech data, comprising train and test sets, in these languages.
We also provide a baseline recipe for both the tasks with a WER of 30.73% and 32.45% on the test sets of multilingual and code-switching subtasks, respectively.
arXiv Detail & Related papers (2021-04-01T03:37:01Z) - MuRIL: Multilingual Representations for Indian Languages [3.529875637780551]
India is a multilingual society with 1369 rationalized languages and dialects being spoken across the country.
Despite this, today's state-of-the-art multilingual systems perform suboptimally on Indian (IN) languages.
We propose MuRIL, a multilingual language model specifically built for IN languages.
arXiv Detail & Related papers (2021-03-19T11:06:37Z) - Towards Natural Bilingual and Code-Switched Speech Synthesis Based on
Mix of Monolingual Recordings and Cross-Lingual Voice Conversion [28.830575877307176]
It is not easy to obtain a bilingual corpus from a speaker who achieves native-level fluency in both languages.
A Tacotron2-based cross-lingual voice conversion system is employed to generate the Mandarin speaker's English speech and the English speaker's Mandarin speech.
The obtained bilingual data are then augmented with code-switched utterances synthesized using a Transformer model.
arXiv Detail & Related papers (2020-10-16T03:51:00Z) - Phonological Features for 0-shot Multilingual Speech Synthesis [50.591267188664666]
We show that code-switching is possible for languages unseen during training, even within monolingual models.
We generate intelligible, code-switched speech in a new language at test time, including the approximation of sounds never seen in training.
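A toy sketch of phone-to-phonological-feature encoding, the kind of representation that lets a model approximate sounds it never saw in training; the small feature inventory below is hand-made for illustration and is not the paper's feature set.

```python
# Toy sketch of representing phones as phonological feature vectors:
# an unseen phone still decomposes into familiar features, which is
# what makes zero-shot approximation of new sounds possible. The
# inventory below is a hand-made illustration, not the paper's set.
FEATURES = ("voiced", "nasal", "labial", "coronal", "continuant")

PHONES = {
    "p": (0, 0, 1, 0, 0),
    "b": (1, 0, 1, 0, 0),
    "m": (1, 1, 1, 0, 0),
    "t": (0, 0, 0, 1, 0),
    "d": (1, 0, 0, 1, 0),
    "s": (0, 0, 0, 1, 1),
}

def encode(phone_seq):
    """Turn a phone sequence into the feature matrix a TTS model could consume."""
    return [PHONES[p] for p in phone_seq]

# An unseen voiced coronal fricative like 'z' could be approximated by
# composing known features: voiced like 'd', coronal and continuant like 's'.
print(encode(["b", "s"]))  # [(1, 0, 1, 0, 0), (0, 0, 0, 1, 1)]
```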
arXiv Detail & Related papers (2020-08-06T18:25:18Z) - A Transfer Learning End-to-End Arabic Text-To-Speech (TTS) Deep
Architecture [0.0]
Existing Arabic speech synthesis solutions are slow, of low quality, and the naturalness of their synthesized speech is inferior to that of English synthesizers.
This work describes how to generate high quality, natural, and human-like Arabic speech using an end-to-end neural deep network architecture.
arXiv Detail & Related papers (2020-07-22T17:03:18Z) - That Sounds Familiar: an Analysis of Phonetic Representations Transfer
Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z) - CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus [57.641761472372814]
CoVoST is a multilingual speech-to-text translation corpus from 11 languages into English.
It is diversified with over 11,000 speakers and over 60 accents.
CoVoST is released under the CC0 license and is free to use.
arXiv Detail & Related papers (2020-02-04T14:35:28Z)