Code-Switching without Switching: Language Agnostic End-to-End Speech
Translation
- URL: http://arxiv.org/abs/2210.01512v1
- Date: Tue, 4 Oct 2022 10:34:25 GMT
- Title: Code-Switching without Switching: Language Agnostic End-to-End Speech
Translation
- Authors: Christian Huber, Enes Yavuz Ugan and Alexander Waibel
- Abstract summary: We treat speech recognition and translation as one unified end-to-end speech translation problem.
By training LAST with both input languages, we decode speech into one target language, regardless of the input language.
- Score: 68.8204255655161
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We propose a) a Language Agnostic end-to-end Speech Translation model (LAST),
and b) a data augmentation strategy to increase code-switching (CS)
performance. With increasing globalization, multiple languages are increasingly
used interchangeably during fluent speech. Such CS complicates traditional
speech recognition and translation, as we must first recognize which language
was spoken and then apply a language-dependent recognizer and a subsequent
translation component to generate the desired target-language output. Such a
pipeline introduces latency and errors. In this paper, we eliminate the need
for that pipeline by treating speech recognition and translation as one unified
end-to-end speech translation problem. By training LAST with both input
languages, we decode speech into one target language, regardless of the input
language. LAST delivers comparable recognition and speech translation accuracy
in monolingual usage, while reducing latency and error rate considerably when
CS is observed.
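To make the unified formulation concrete, below is a minimal sketch of a language-agnostic setup: one sequence-to-sequence model consumes speech features from either input language and is always supervised with tokens of the single target language, so no language-identification front end or language-dependent pipeline is needed. This is an illustrative assumption-laden sketch, not the authors' implementation; all dimensions, module choices, and the dummy batch are invented for illustration.

```python
# Hypothetical sketch of language-agnostic end-to-end speech translation:
# one seq2seq model, one target-language vocabulary, mixed-language training.
import torch
import torch.nn as nn

class LanguageAgnosticST(nn.Module):
    """Single encoder-decoder; no language ID is ever computed."""
    def __init__(self, feat_dim=80, d_model=256, vocab_size=8000):
        super().__init__()
        self.front_end = nn.Linear(feat_dim, d_model)   # stand-in feature front end
        self.seq2seq = nn.Transformer(d_model=d_model, nhead=4,
                                      num_encoder_layers=2, num_decoder_layers=2,
                                      batch_first=True)
        self.embed = nn.Embedding(vocab_size, d_model)
        self.out = nn.Linear(d_model, vocab_size)       # target-language tokens only

    def forward(self, feats, tgt_tokens):
        src = self.front_end(feats)
        tgt = self.embed(tgt_tokens)
        mask = self.seq2seq.generate_square_subsequent_mask(tgt.size(1))
        hidden = self.seq2seq(src, tgt, tgt_mask=mask)
        return self.out(hidden)

model = LanguageAgnosticST()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# One toy batch standing in for utterances drawn from *both* input languages;
# every reference transcript is in the single target language.
feats = torch.randn(4, 120, 80)          # (batch, frames, features)
refs = torch.randint(0, 8000, (4, 25))   # target-language token ids

logits = model(feats, refs[:, :-1])      # teacher forcing
loss = loss_fn(logits.reshape(-1, 8000), refs[:, 1:].reshape(-1))
loss.backward()
opt.step()
```

Because the decoder's vocabulary covers only the target language, code-switched input needs no special handling at inference: the same model decodes it directly, with no language-ID step and no pipeline hand-off.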
Related papers
- Gujarati-English Code-Switching Speech Recognition using ensemble
prediction of spoken language [29.058108207186816]
We propose two methods of introducing language-specific parameters and explainability into the multi-head attention mechanism.
Despite not reducing WER significantly, our method shows promise in predicting the correct language from spoken data alone.
arXiv Detail & Related papers (2024-03-12T18:21:20Z)
- TranSentence: Speech-to-speech Translation via Language-agnostic
Sentence-level Speech Encoding without Language-parallel Data [44.83532231917504]
TranSentence is a novel approach to speech-to-speech translation that requires no language-parallel speech data.
We train our model to generate speech based on the encoded embedding obtained from a language-agnostic sentence-level speech encoder.
We extend TranSentence to multilingual speech-to-speech translation.
arXiv Detail & Related papers (2024-01-17T11:52:40Z)
- ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text
Translation [79.66359274050885]
We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models.
Our approach has demonstrated effectiveness in end-to-end speech-to-text translation tasks.
arXiv Detail & Related papers (2023-05-24T07:42:15Z)
- Language-agnostic Code-Switching in Sequence-To-Sequence Speech
Recognition [62.997667081978825]
Code-Switching (CS) refers to the phenomenon of alternately using words and phrases from different languages.
We propose a simple yet effective data augmentation in which audio and corresponding labels of different source languages are concatenated; a minimal sketch of this idea appears after the list below.
We show that this augmentation can even improve the model's performance by 5.03% WER on inter-sentential language switches not seen during training.
arXiv Detail & Related papers (2022-10-17T12:15:57Z)
- LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between languages at the frame level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z)
- Reducing language context confusion for end-to-end code-switching
automatic speech recognition [50.89821865949395]
We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model.
By computing attention separately for each language, our method can efficiently transfer language knowledge from rich monolingual data.
arXiv Detail & Related papers (2022-01-28T14:39:29Z)
- Mandarin-English Code-switching Speech Recognition with Self-supervised
Speech Representation Models [55.82292352607321]
Code-switching (CS) is common in daily conversations where more than one language is used within a sentence.
This paper uses recently successful self-supervised learning (SSL) methods to leverage large amounts of unlabeled speech data without CS.
arXiv Detail & Related papers (2021-10-07T14:43:35Z)
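As noted in the Language-agnostic Code-Switching entry above, the concatenation-style augmentation can be sketched in a few lines: pairs of monolingual utterances from different source languages are spliced into artificial code-switched training examples. The field names, separator token, and random pairing policy below are illustrative assumptions, not the paper's exact recipe.

```python
# Hypothetical sketch of code-switching data augmentation by concatenation:
# splice utterances from different source languages into one training example.
import random
import numpy as np

def make_cs_example(utt_a, utt_b, sep_token="<sep>"):
    """Concatenate audio and labels of two monolingual utterances.

    utt_a / utt_b: dicts with "audio" (1-D numpy waveform) and "text" (str).
    The separator token is an illustrative choice, not prescribed by the paper.
    Returns an artificial inter-sentential code-switching example.
    """
    audio = np.concatenate([utt_a["audio"], utt_b["audio"]])
    text = f'{utt_a["text"]} {sep_token} {utt_b["text"]}'
    return {"audio": audio, "text": text}

def augment(corpus_lang1, corpus_lang2, n_examples, seed=0):
    """Randomly pair utterances across two monolingual corpora."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_examples):
        a = rng.choice(corpus_lang1)
        b = rng.choice(corpus_lang2)
        if rng.random() < 0.5:          # switch direction half the time
            a, b = b, a
        out.append(make_cs_example(a, b))
    return out

# Toy corpora (zeros stand in for real 16 kHz waveforms):
en = [{"audio": np.zeros(16000), "text": "good morning"}]
de = [{"audio": np.zeros(16000), "text": "guten morgen"}]
print(augment(en, de, n_examples=2)[0]["text"])
```

The appeal of this kind of augmentation is that it manufactures inter-sentential switches from abundant monolingual data, so the model sees language boundaries during training without requiring any genuinely code-switched recordings.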