Enabling Interactive Transcription in an Indigenous Community
- URL: http://arxiv.org/abs/2011.06198v1
- Date: Thu, 12 Nov 2020 04:41:35 GMT
- Title: Enabling Interactive Transcription in an Indigenous Community
- Authors: Éric Le Ferrand, Steven Bird, Laurent Besacier
- Abstract summary: We propose a novel transcription workflow which combines spoken term detection and human-in-the-loop.
We show that in the early stages of transcription, when the available data is insufficient to train a robust ASR system, it is possible to take advantage of the transcription of a small number of isolated words.
- Score: 23.53585157238112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel transcription workflow which combines spoken term
detection and human-in-the-loop, together with a pilot experiment. This work is
grounded in an almost zero-resource scenario where only a few terms have so far
been identified, involving two endangered languages. We show that in the early
stages of transcription, when the available data is insufficient to train a
robust ASR system, it is possible to take advantage of the transcription of a
small number of isolated words in order to bootstrap the transcription of a
speech collection.
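The workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which detection method is used, so sliding-window dynamic time warping (DTW) over toy feature frames is swapped in as a stand-in. The function names, the `confirm` callback standing in for the human transcriber, and the threshold are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Frame = Sequence[float]  # one acoustic feature vector (toy stand-in, an assumption)


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def dtw_cost(query: Sequence[Frame], segment: Sequence[Frame]) -> float:
    """Length-normalised DTW alignment cost between two frame sequences."""
    n, m = len(query), len(segment)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], segment[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def detect_term(
    query: Sequence[Frame], utterance: Sequence[Frame], threshold: float
) -> List[Tuple[int, float]]:
    """Slide a query-length window over the utterance; keep windows under threshold."""
    width = len(query)
    hits = []
    for start in range(len(utterance) - width + 1):
        c = dtw_cost(query, utterance[start:start + width])
        if c <= threshold:
            hits.append((start, c))
    return hits


def interactive_pass(
    lexicon: Dict[str, List[Sequence[Frame]]],
    utterance: Sequence[Frame],
    confirm: Callable[[str, int, int], bool],
    threshold: float,
) -> List[Tuple[int, int, str]]:
    """One round: propose candidate spans, keep only those the human confirms."""
    transcript: List[Tuple[int, int, str]] = []
    for term, exemplars in lexicon.items():
        for query in list(exemplars):  # iterate a copy: confirmed hits are appended below
            for start, _ in detect_term(query, utterance, threshold):
                end = start + len(query)
                if (start, end, term) in transcript:
                    continue
                if confirm(term, start, end):
                    transcript.append((start, end, term))
                    # A confirmed span becomes a new exemplar for later rounds.
                    exemplars.append(utterance[start:end])
    return sorted(transcript)
```

In this sketch, each confirmed span is fed back into the lexicon as an extra exemplar, which is one simple way a handful of isolated words could bootstrap a sparse transcription of a larger collection. A real system would replace the toy frames with acoustic features and `confirm` with an interface for the transcriber.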
Related papers
- Automated Tone Transcription and Clustering with Tone2Vec [31.562430412564577]
We introduce pitch-based similarity representations for tone transcription, named Tone2Vec.
Experiments on dialect clustering and variance show that Tone2Vec effectively captures fine-grained tone variation.
These algorithms are integrated into an open-sourced and easy-to-use package, ToneLab.
arXiv Detail & Related papers (2024-10-03T09:18:54Z)
- Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation [51.399695200838586]
We propose a streaming Transformer-Transducer (T-T) model able to jointly produce many-to-one and one-to-many transcription and translation using a single decoder.
Experiments on it,es,de->en prove the effectiveness of our approach, enabling the generation of one-to-many joint outputs with a single decoder for the first time.
arXiv Detail & Related papers (2023-10-23T11:00:27Z)
- Enhancing Cross-lingual Transfer via Phonemic Transcription Integration [57.109031654219294]
PhoneXL is a framework incorporating phonemic transcriptions as an additional linguistic modality for cross-lingual transfer.
Our pilot study reveals phonemic transcription provides essential information beyond the orthography to enhance cross-lingual transfer.
arXiv Detail & Related papers (2023-07-10T06:17:33Z)
- On the Copying Problem of Unsupervised NMT: A Training Schedule with a Language Discriminator Loss [120.19360680963152]
Unsupervised neural machine translation (UNMT) has achieved success in many language pairs.
The copying problem, i.e., directly copying some parts of the input sentence as the translation, is common among distant language pairs.
We propose a simple but effective training schedule that incorporates a language discriminator loss.
arXiv Detail & Related papers (2023-05-26T18:14:23Z)
- Back Translation for Speech-to-text Translation Without Transcripts [11.13240570688547]
We develop a back translation algorithm for ST (BT4ST) to synthesize pseudo ST data from monolingual target data.
To ease the challenges posed by short-to-long generation and one-to-many mapping, we introduce self-supervised discrete units.
With our synthetic ST data, we achieve an average boost of 2.3 BLEU on MuST-C En-De, En-Fr, and En-Es datasets.
arXiv Detail & Related papers (2023-05-15T15:12:40Z)
- Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity [81.51206991542242]
Cross-lingual transfer offers a compelling way to help bridge the digital divide.
Current cross-lingual algorithms have shown success in text-based tasks and speech-related tasks over some low-resource languages.
We propose a language similarity approach that can efficiently identify acoustic cross-lingual transfer pairs across hundreds of languages.
arXiv Detail & Related papers (2021-11-02T01:55:17Z)
- End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning [28.516240952627076]
We propose a semi-supervised learning method for building end-to-end rich transcription-style automatic speech recognition (RT-ASR) systems.
The key process in our learning is to convert the common transcription-style dataset into a pseudo-rich transcription-style dataset.
Our experiments on spontaneous ASR tasks showed the effectiveness of the proposed method.
arXiv Detail & Related papers (2021-07-07T12:52:49Z)
- Spoken Term Detection Methods for Sparse Transcription in Very Low-resource Settings [20.410074074340447]
Experiments on two oral languages show that a pretrained universal phone recognizer, fine-tuned with only a few minutes of target language speech, can be used for spoken term detection.
We show that representing phoneme recognition ambiguity in a graph structure can further boost the recall while maintaining high precision in the low resource spoken term detection task.
arXiv Detail & Related papers (2021-06-11T04:09:54Z)
- Textual Supervision for Visually Grounded Spoken Language Understanding [51.93744335044475]
Visually-grounded models of spoken language understanding extract semantic information directly from speech.
This is useful for low-resource languages, where transcriptions can be expensive or impossible to obtain.
Recent work showed that these models can be improved if transcriptions are available at training time.
arXiv Detail & Related papers (2020-10-06T15:16:23Z)
- Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation [125.59372403631006]
We propose a semi-supervised learning approach for multi-speaker text-to-speech (TTS).
A multi-speaker TTS model can learn from the untranscribed audio via the proposed encoder-decoder framework with discrete speech representation.
We found the model can benefit from the proposed semi-supervised learning approach even when part of the unpaired speech data is noisy.
arXiv Detail & Related papers (2020-05-16T15:47:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.