Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation
- URL: http://arxiv.org/abs/2406.10276v1
- Date: Wed, 12 Jun 2024 00:00:39 GMT
- Title: Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation
- Authors: Peidong Wang, Jian Xue, Jinyu Li, Junkun Chen, Aswin Shanmugam Subramanian,
- Abstract summary: Many-to-one end-to-end speech translation models can convert audio signals from different source languages into text in a target language.
In some cases, the input language can be given or estimated.
We use this language information by introducing a simple and effective linear input network.
- Score: 40.0365339798752
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language-agnostic many-to-one end-to-end speech translation models can convert audio signals from different source languages into text in a target language. These models do not need source language identification, which improves user experience. In some cases, the input language can be given or estimated. Our goal is to use this additional language information while preserving the quality of the other languages. We accomplish this by introducing a simple and effective linear input network. The linear input network is initialized as an identity matrix, which ensures that the model can perform as well as, or better than, the original model. Experimental results show that the proposed method can successfully enhance the specified language, while keeping the language-agnostic ability of the many-to-one ST models.
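The following Python sketch shows why identity initialization preserves the original model. It rests on assumptions the abstract does not state. The linear input network (LIN) is taken to be a per-language D x D matrix plus bias, applied to each acoustic feature frame before the ST encoder, and used only when the source language is given. The names `LinearInputNetwork` and `apply_lin` are hypothetical. At initialization the transform returns every frame unchanged, so the augmented model starts out computing exactly what the original model computes.

```python
from typing import Dict, List, Optional

Frame = List[float]


class LinearInputNetwork:
    """Hypothetical per-language LIN: y = W x + b on each feature frame.

    Assumption (not from the paper): one D x D matrix plus bias per
    language, applied to input features before the ST encoder.
    """

    def __init__(self, dim: int) -> None:
        # Identity weights and zero bias: before training, y == x exactly.
        self.weight = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.bias = [0.0] * dim

    def __call__(self, frame: Frame) -> Frame:
        # y_i = sum_j W[i][j] * x_j + b_i
        return [
            sum(w * x for w, x in zip(row, frame)) + b
            for row, b in zip(self.weight, self.bias)
        ]


def apply_lin(
    frames: List[Frame],
    lins: Dict[str, LinearInputNetwork],
    language: Optional[str],
) -> List[Frame]:
    """Transform frames with the LIN of the given language, if one exists.

    Unspecified or unknown languages bypass the LIN, so the original
    language-agnostic model sees unchanged input.
    """
    lin = lins.get(language) if language is not None else None
    if lin is None:
        return frames
    return [lin(f) for f in frames]


if __name__ == "__main__":
    feats = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.25]]
    lins = {"de": LinearInputNetwork(dim=3)}  # hypothetical language code
    print(apply_lin(feats, lins, "de"))  # same values as feats at initialization
    print(apply_lin(feats, lins, None))  # language not given: bypass
```

Because training starts from the identity, whatever the LIN learns for the specified language is a change relative to the original model, while inputs without a language label never pass through it.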
Related papers
- Streaming Bilingual End-to-End ASR model using Attention over Multiple Softmax [6.386371634323785]
We propose a novel bilingual end-to-end (E2E) modeling approach, where a single neural model can recognize both languages.
The proposed model has shared encoder and prediction networks, with language-specific joint networks that are combined via a self-attention mechanism.
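This Python sketch mixes language-specific output distributions with attention weights. The summary does not say how the attention scores are computed, so the sketch assumes one unnormalized score per language is already available. Following the title, it mixes the softmax outputs of the joint networks. `combine_joint_outputs` is a hypothetical name.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_joint_outputs(
    lang_probs: Sequence[Sequence[float]],  # one output distribution per language-specific joint network
    attn_scores: Sequence[float],           # assumed: one unnormalized attention score per language
) -> List[float]:
    """Weighted mixture of per-language softmax outputs.

    The weights sum to 1 and each input distribution sums to 1,
    so the mixture is again a valid distribution.
    """
    weights = softmax(attn_scores)
    vocab = len(lang_probs[0])
    return [
        sum(w * probs[k] for w, probs in zip(weights, lang_probs))
        for k in range(vocab)
    ]
```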
(2024-01-22T01:44:42Z)
- Robust Open-Set Spoken Language Identification and the CU MultiLang Dataset [2.048226951354646]
Open-set spoken language identification systems can detect when an input exhibits none of the original languages.
We implement a novel approach to open-set spoken language identification that uses MFCC and pitch features.
We present a spoken language identification system that achieves 91.76% accuracy on trained languages and has the capability to adapt to unknown languages on the fly.
(2023-08-29T00:44:27Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
(2023-06-13T08:08:08Z)
- Adapting Multilingual Speech Representation Model for a New, Underresourced Language through Multilingual Fine-tuning and Continued Pretraining [2.3513645401551333]
We investigate the possibility for adapting an existing multilingual wav2vec 2.0 model for a new language.
Our results show that continued pretraining is the most effective method to adapt a wav2vec 2.0 model for a new language.
We find that if a model pretrained on a related speech variety or an unrelated language with similar phonological characteristics is available, multilingual fine-tuning using additional data from that language can have positive impact on speech recognition performance.
(2023-01-18T03:57:53Z)
- Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
(2021-08-06T23:49:18Z)
- Revisiting Language Encoding in Learning Multilingual Representations [70.01772581545103]
We propose a new approach called Cross-lingual Language Projection (XLP) to replace language embedding.
XLP projects the word embeddings into a language-specific semantic space, and the projected embeddings are then fed into the Transformer model.
Experiments show that XLP can freely and significantly boost the model performance on extensive multilingual benchmark datasets.
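Here is a Python sketch of a language-specific projection in the spirit of XLP. It assumes one square projection matrix per language, applied to each word embedding. The summary does not state how these matrices are parameterized or trained, and `project_embeddings` is a hypothetical helper.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    # Plain matrix-vector product.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def project_embeddings(
    embeddings: List[Vector],
    language: str,
    projections: Dict[str, Matrix],  # assumed: one square matrix per language
) -> List[Vector]:
    """Map word embeddings into a language-specific space (XLP-style sketch).

    The projected embeddings, instead of embeddings plus a language
    embedding, would then be the Transformer's input.
    """
    p = projections[language]
    return [matvec(p, e) for e in embeddings]
```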
(2021-02-16T18:47:10Z)
- A language score based output selection method for multilingual speech recognition [2.294014185517203]
A language model rescoring method is applied to produce all possible candidates for target languages.
A simple score is proposed to automatically select the output without any identifier model or language specification of the input language.
In addition, we design an English-Vietnamese end-to-end model that handles cross-lingual speakers and also improves the recognition accuracy of English loanwords in Vietnamese.
(2020-05-02T15:07:14Z)
- Rnn-transducer with language bias for end-to-end Mandarin-English code-switching speech recognition [58.105818353866354]
We propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate the difficulty of recognizing code-switching (CS) speech.
We use the language identities to bias the model to predict the CS points.
This encourages the model to learn the language identity information directly from the transcription, so no additional LID model is needed.
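One way to expose language identities in the transcription is sketched below in Python, under stated assumptions. Each token carries a language label, and a hypothetical tag such as `<en>` or `<zh>` is inserted at the start of the utterance and at every code-switching point. The tag format and placement used in the paper may differ. `add_language_tags` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def add_language_tags(tokens: List[Tuple[str, str]]) -> List[str]:
    """Insert a language tag before each code-switching point.

    `tokens` is a list of (token, language) pairs, e.g. ("我", "zh").
    Assumed convention: a tag like "<en>" is emitted at the start of the
    utterance and wherever the language differs from the previous token's.
    """
    out: List[str] = []
    prev_lang: Optional[str] = None
    for token, lang in tokens:
        if lang != prev_lang:
            out.append(f"<{lang}>")
            prev_lang = lang
        out.append(token)
    return out
```

Training targets built this way would make the model predict a tag exactly where the language switches, which is one reading of "biasing the model to predict the CS points."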
(2020-02-19T12:01:33Z)
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models that fit into the word order of the source language might fail to handle target languages.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
(2020-01-30T03:35:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.