Related papers: Towards Real-World Streaming Speech Translation for Code-Switched Speech

URL: http://arxiv.org/abs/2310.12648v2
Date: Mon, 23 Oct 2023 11:47:53 GMT
Title: Towards Real-World Streaming Speech Translation for Code-Switched Speech
Authors: Belen Alastruey, Matthias Sperber, Christian Gollan, Dominic Telaar, Tim Ng, Aashish Agarwal
Abstract summary: Code-switching (CS) is a common phenomenon in communication and can be challenging in many Natural Language Processing (NLP) settings. We focus on two essential yet unexplored areas for real-world CS speech translation: streaming settings and translation to a third language.
Score: 7.81154319203032
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Code-switching (CS), i.e. mixing different languages in a single sentence, is a common phenomenon in communication and can be challenging in many Natural Language Processing (NLP) settings. Previous studies on CS speech have shown promising results for end-to-end speech translation (ST), but have been limited to offline scenarios and to translation to one of the languages present in the source (monolingual transcription). In this paper, we focus on two essential yet unexplored areas for real-world CS speech translation: streaming settings, and translation to a third language (i.e., a language not included in the source). To this end, we extend the Fisher and Miami test and validation datasets to include new targets in Spanish and German. Using this data, we train a model for both offline and streaming ST and we establish baseline results for the two settings mentioned earlier.

Related papers

Multilingual Source Tracing of Speech Deepfakes: A First Benchmark [19.578741954970738]
This paper introduces the first benchmark for multilingual speech deepfake source tracing. We comparatively investigate DSP- and SSL-based modeling and examine how SSL representations fine-tuned on different languages impact cross-lingual generalization performance. Our findings offer the first comprehensive insights into the challenges of identifying speech generation models when training and inference languages differ.
arXiv Detail & Related papers (2025-08-06T07:11:36Z)
From TOWER to SPIRE: Adding the Speech Modality to a Translation-Specialist LLM [24.31773681590982]
We introduce Spire, a speech-augmented language model (LM) capable of both translating and transcribing speech input from English into 10 other languages. Spire integrates the speech modality into an existing multilingual LM via speech discretization and continued pre-training using only 42.5K hours of speech.
arXiv Detail & Related papers (2025-03-13T17:57:32Z)
CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving [61.73180469072787]
We focus on the problem of spoken translation (ST) of code-switched speech in Indian languages to English text. We present a new end-to-end model architecture COSTA that scaffolds on pretrained automatic speech recognition (ASR) and machine translation (MT) modules. COSTA significantly outperforms many competitive cascaded and end-to-end multimodal baselines by up to 3.5 BLEU points.
arXiv Detail & Related papers (2024-06-16T16:10:51Z)
End-to-End Speech Translation of Arabic to English Broadcast News [2.375764121997739]
Speech translation (ST) is the task of translating acoustic speech signals in a source language into text in a foreign language. This paper presents our efforts towards the development of the first Broadcast News end-to-end Arabic to English speech translation system.
arXiv Detail & Related papers (2022-12-11T11:35:46Z)
Speech-to-Speech Translation For A Real-world Unwritten Language [62.414304258701804]
We study speech-to-speech translation (S2ST) that translates speech from one language into another language. We present an end-to-end solution, from training data collection and modeling choices to benchmark dataset release.
arXiv Detail & Related papers (2022-11-11T20:21:38Z)
Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation [94.80029087828888]
Direct speech-to-speech translation (S2ST) is an attractive research topic with many advantages compared to cascaded S2ST. Direct S2ST suffers from the data scarcity problem because the corpora from speech of the source language to speech of the target language are very rare. We propose in this paper a Speech2S model, which is jointly pre-trained with unpaired speech and bilingual text data for direct speech-to-speech translation tasks.
arXiv Detail & Related papers (2022-10-31T02:55:51Z)
Code-Switching without Switching: Language Agnostic End-to-End Speech Translation [68.8204255655161]
We treat speech recognition and translation as one unified end-to-end speech translation problem. By training LAST with both input languages, we decode speech into one target language, regardless of the input language.
arXiv Detail & Related papers (2022-10-04T10:34:25Z)
End-to-End Speech Translation for Code Switched Speech [13.97982457879585]
Code switching (CS) refers to the phenomenon of interchangeably using words and phrases from different languages. We focus on CS in the context of English/Spanish conversations for the task of speech translation (ST), generating and evaluating both transcript and translation. We show that our ST architectures, and especially our bidirectional end-to-end architecture, perform well on CS speech, even when no CS training data is used.
arXiv Detail & Related papers (2022-04-11T13:25:30Z)
Textless Speech-to-Speech Translation on Real Data [49.134208897722246]
We present a textless speech-to-speech translation (S2ST) system that can translate speech from one language into another language. We tackle the challenge in modeling multi-speaker target speech and train the systems with real-world S2ST data.
arXiv Detail & Related papers (2021-12-15T18:56:35Z)
Consecutive Decoding for Speech-to-text Translation [51.155661276936044]
COnSecutive Transcription and Translation (COSTT) is an integral approach for speech-to-text translation. The key idea is to generate source transcript and target translation text with a single decoder. Our method is verified on three mainstream datasets.
arXiv Detail & Related papers (2020-09-21T10:10:45Z)
Style Variation as a Vantage Point for Code-Switching [54.34370423151014]
Code-Switching (CS) is a common phenomenon observed in several bilingual and multilingual communities. We present a novel vantage point of CS to be style variations between both the participating languages. We propose a two-stage generative adversarial training approach where the first stage generates competitive negative examples for CS and the second stage generates more realistic CS sentences.
arXiv Detail & Related papers (2020-05-01T15:53:16Z)

This list is automatically generated from the titles and abstracts of the papers on this site.