End-to-End Speech Translation for Code Switched Speech
- URL: http://arxiv.org/abs/2204.05076v1
- Date: Mon, 11 Apr 2022 13:25:30 GMT
- Title: End-to-End Speech Translation for Code Switched Speech
- Authors: Orion Weller, Matthias Sperber, Telmo Pires, Hendra Setiawan, Christian Gollan, Dominic Telaar, Matthias Paulik
- Abstract summary: Code switching (CS) refers to the phenomenon of interchangeably using words and phrases from different languages.
We focus on CS in the context of English/Spanish conversations for the task of speech translation (ST), generating and evaluating both transcript and translation.
We show that our ST architectures, and especially our bidirectional end-to-end architecture, perform well on CS speech, even when no CS training data is used.
- Score: 13.97982457879585
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code switching (CS) refers to the phenomenon of interchangeably using words
and phrases from different languages. CS can pose significant accuracy
challenges to NLP, due to the often monolingual nature of the underlying
systems. In this work, we focus on CS in the context of English/Spanish
conversations for the task of speech translation (ST), generating and
evaluating both transcript and translation. To evaluate model performance on
this task, we create a novel ST corpus derived from existing public data sets.
We explore various ST architectures across two dimensions: cascaded (transcribe
then translate) vs end-to-end (jointly transcribe and translate) and
unidirectional (source -> target) vs bidirectional (source <-> target). We show
that our ST architectures, and especially our bidirectional end-to-end
architecture, perform well on CS speech, even when no CS training data is used.
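The first dimension named in the abstract (cascaded vs end-to-end) can be illustrated with a minimal sketch. This is not the authors' implementation: `STOutput`, `cascaded_st`, `end_to_end_st`, and the toy models below are hypothetical names introduced only to show how the two interfaces differ. The cascade passes a transcript from one component to the next, while the end-to-end variant produces transcript and translation from a single model call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Audio = List[float]  # assumption: audio is represented as a list of samples


@dataclass
class STOutput:
    transcript: str
    translation: str


def cascaded_st(audio: Audio,
                asr: Callable[[Audio], str],
                mt: Callable[[str], str]) -> STOutput:
    """Cascaded: transcribe first, then translate the transcript."""
    transcript = asr(audio)
    translation = mt(transcript)  # errors in the transcript propagate here
    return STOutput(transcript, translation)


def end_to_end_st(audio: Audio,
                  joint_model: Callable[[Audio], Tuple[str, str]]) -> STOutput:
    """End-to-end: one model jointly emits transcript and translation."""
    transcript, translation = joint_model(audio)
    return STOutput(transcript, translation)


# Toy stand-ins (hypothetical) so the sketch runs without any real model.
def toy_asr(audio: Audio) -> str:
    return "hola my friend"  # a code-switched Spanish/English transcript


def toy_mt(transcript: str) -> str:
    lexicon = {"hola": "hello"}  # tiny Spanish -> English word list
    return " ".join(lexicon.get(word, word) for word in transcript.split())


def toy_joint(audio: Audio) -> Tuple[str, str]:
    return "hola my friend", "hello my friend"


if __name__ == "__main__":
    samples = [0.0, 0.1, -0.1]
    print(cascaded_st(samples, toy_asr, toy_mt))
    print(end_to_end_st(samples, toy_joint))
```

The second dimension (unidirectional vs bidirectional) is not modeled here; it concerns which translation directions a model supports, not how its components are wired together.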
Related papers
- ConCSE: Unified Contrastive Learning and Augmentation for Code-Switched Embeddings [4.68732641979009]
This paper examines the Code-Switching (CS) phenomenon where two languages intertwine within a single utterance.
We highlight that the current Equivalence Constraint (EC) theory for CS in other languages may only partially capture English-Korean CS complexities.
We introduce a novel Koglish dataset tailored for English-Korean CS scenarios to mitigate such challenges.
arXiv Detail & Related papers (2024-08-28T11:27:21Z)
- A Tale of Two Languages: Large-Vocabulary Continuous Sign Language Recognition from Spoken Language Supervision [74.972172804514]
We introduce a multi-task Transformer model, CSLR2, that is able to ingest a signing sequence and output in a joint embedding space between signed language and spoken language text.
New dataset annotations provide continuous sign-level annotations for six hours of test videos, and will be made publicly available.
Our model significantly outperforms the previous state of the art on both tasks.
arXiv Detail & Related papers (2024-05-16T17:19:06Z)
- Towards Real-World Streaming Speech Translation for Code-Switched Speech [7.81154319203032]
Code-switching (CS) is a common phenomenon in communication and can be challenging in many Natural Language Processing (NLP) settings.
We focus on two essential yet unexplored areas for real-world CS speech translation: streaming settings and translation to a third language.
arXiv Detail & Related papers (2023-10-19T11:15:02Z)
- Speech collage: code-switched audio generation by collaging monolingual corpora [50.356820349870986]
Speech Collage is a method that synthesizes CS data from monolingual corpora by splicing audio segments.
We investigate the impact of generated data on speech recognition in two scenarios.
arXiv Detail & Related papers (2023-09-27T14:17:53Z)
- The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation [13.352795145385645]
Speech translation (ST) is a good means of pretraining speech models for end-to-end spoken language understanding.
We show that our models reach higher performance over baselines on monolingual and multilingual intent classification.
We also create new benchmark datasets for speech summarization and low-resource/zero-shot transfer from English to French or Spanish.
arXiv Detail & Related papers (2023-05-16T17:53:03Z)
- ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit [61.52122386938913]
ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit.
This paper describes the overall design, example models for each task, and performance benchmarking behind ESPnet-ST-v2.
arXiv Detail & Related papers (2023-04-10T14:05:22Z)
- Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation [71.35243644890537]
End-to-end Speech Translation (ST) aims at translating the source language speech into target language text without generating the intermediate transcriptions.
Existing zero-shot methods fail to align the two modalities of speech and text into a shared semantic space.
We propose a novel Discrete Cross-Modal Alignment (DCMA) method that employs a shared discrete vocabulary space to accommodate and match both modalities of speech and text.
arXiv Detail & Related papers (2022-10-18T03:06:47Z)
- Language-agnostic Code-Switching in Sequence-To-Sequence Speech Recognition [62.997667081978825]
Code-Switching (CS) refers to the phenomenon of alternately using words and phrases from different languages.
We propose a simple yet effective data augmentation in which audio and corresponding labels of different source languages are concatenated.
We show that this augmentation can even improve the model's performance on inter-sentential language switches not seen during training by 5.03% WER.
arXiv Detail & Related papers (2022-10-17T12:15:57Z)
- KARI: KAnari/QCRI's End-to-End systems for the INTERSPEECH 2021 Indian Languages Code-Switching Challenge [7.711092265101041]
We present the Kanari/QCRI system and the modeling strategies used to participate in the Interspeech 2021 Code-switching (CS) challenge for low-resource Indian languages.
The subtask involved developing a speech recognition system for two CS datasets: Hindi-English and Bengali-English.
To tackle the CS challenges, we use transfer learning for incorporating the publicly available monolingual Hindi, Bengali, and English speech data.
arXiv Detail & Related papers (2021-06-10T16:12:51Z)
- Style Variation as a Vantage Point for Code-Switching [54.34370423151014]
Code-Switching (CS) is a common phenomenon observed in several bilingual and multilingual communities.
We present a novel vantage point on CS: style variation between the two participating languages.
We propose a two-stage generative adversarial training approach where the first stage generates competitive negative examples for CS and the second stage generates more realistic CS sentences.
arXiv Detail & Related papers (2020-05-01T15:53:16Z)