Code-switched inspired losses for generic spoken dialog representations
- URL: http://arxiv.org/abs/2108.12465v1
- Date: Fri, 27 Aug 2021 19:15:00 GMT
- Title: Code-switched inspired losses for generic spoken dialog representations
- Authors: Emile Chapuis, Pierre Colombo, Matthieu Labeau, Chloe Clavel
- Abstract summary: We introduce new pretraining losses tailored to learn multilingual spoken dialog representations.
The goal of these losses is to expose the model to code-switched language.
Our experiments show that our new code-switched inspired losses achieve better performance in both monolingual and multilingual settings.
- Score: 5.73995120847626
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spoken dialog systems need to be able to handle both multiple languages and
multilinguality inside a conversation (\textit{e.g.,} in the case of code-switching).
In this work, we introduce new pretraining losses tailored to learn
multilingual spoken dialog representations. The goal of these losses is to
expose the model to code-switched language. To scale up training, we
automatically build a pretraining corpus composed of multilingual conversations
in five different languages (French, Italian, English, German and Spanish) from
\texttt{OpenSubtitles}, a huge multilingual corpus composed of 24.3G tokens. We
test the generic representations on \texttt{MIAM}, a new benchmark composed of
five dialog act corpora in the aforementioned languages, as well as on two
novel multilingual downstream tasks (\textit{i.e.,} multilingual masked utterance
retrieval and multilingual inconsistency identification). Our experiments show
that our new code-switched inspired losses achieve better performance in both
monolingual and multilingual settings.
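The abstract does not detail how the code-switched exposure is realised, but one plausible construction is to mix turn-aligned utterances from the parallel OpenSubtitles dialogs before applying the pretraining losses. The sketch below illustrates only that data-construction idea; the function, the switching probability, and the turn-aligned input format are assumptions made for illustration, not the paper's actual procedure.

```python
import random

def make_code_switched_dialog(dialog_by_lang, p_switch=0.3, seed=0):
    """Build a code-switched training example from aligned dialogs.

    `dialog_by_lang` maps a language code to the same dialog's utterances,
    aligned turn by turn (e.g. parallel OpenSubtitles segments). Each turn
    keeps the anchor language with probability 1 - p_switch, otherwise it
    is replaced by the same turn in a randomly chosen other language.
    """
    rng = random.Random(seed)
    base_lang = "en"                       # language of the anchor dialog
    langs = list(dialog_by_lang)
    mixed = []
    for turn_idx, utterance in enumerate(dialog_by_lang[base_lang]):
        if rng.random() < p_switch:
            other = rng.choice([l for l in langs if l != base_lang])
            mixed.append((other, dialog_by_lang[other][turn_idx]))
        else:
            mixed.append((base_lang, utterance))
    return mixed

# Toy aligned dialog in two of the five pretraining languages.
dialog = {
    "en": ["where are you going?", "to the station.", "see you tonight."],
    "fr": ["où vas-tu ?", "à la gare.", "à ce soir."],
}
print(make_code_switched_dialog(dialog, p_switch=0.5))
```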
Related papers
- A two-stage transliteration approach to improve performance of a multilingual ASR [1.9511556030544333]
This paper presents an approach to build a language-agnostic end-to-end model trained on a grapheme set.
We performed experiments with an end-to-end multilingual speech recognition system for two Indic languages.
arXiv Detail & Related papers (2024-10-09T05:30:33Z)
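The entry above only states that the end-to-end model is trained on a shared grapheme set obtained through transliteration. A minimal sketch of that idea, assuming a character-level mapping onto a common grapheme inventory; the tiny mapping table and function below are hypothetical and are not the paper's transliteration rules:

```python
# Map two scripts onto one shared grapheme set, the core idea behind a
# language-agnostic end-to-end ASR target vocabulary. Illustrative only.
COMMON_GRAPHEMES = {
    "\u0915": "ka", "\u0916": "kha",  # Devanagari letters KA, KHA
    "\u0995": "ka", "\u0996": "kha",  # Bengali letters KA, KHA
}

def to_common_graphemes(text: str) -> list[str]:
    """Map every known character to its shared grapheme; keep the rest as-is."""
    return [COMMON_GRAPHEMES.get(ch, ch) for ch in text]

print(to_common_graphemes("\u0915\u0996"))  # ['ka', 'kha'] regardless of source script
```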
- Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment [50.80949663719335]
Training for cross-lingual alignment of sentence embeddings distorts the optimal monolingual structure of semantic spaces of individual languages.
We train language-specific sentence encoders to avoid negative interference between languages.
We then align all non-English monolingual encoders to the English encoder by training a cross-lingual alignment adapter on top of each.
arXiv Detail & Related papers (2024-07-20T13:56:39Z)
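The modular-encoder entry above trains a cross-lingual alignment adapter on top of each frozen non-English encoder so that its sentence embeddings land in the English encoder's space. A minimal sketch of such an adapter, assuming a residual MLP and an MSE objective on parallel sentences; both choices, and the sizes used, are illustrative rather than the paper's exact setup:

```python
import torch
import torch.nn as nn

class AlignmentAdapter(nn.Module):
    """Small residual MLP mapping one language's sentence embeddings into the
    English encoder's space (sketch only; sizes and loss are assumptions)."""
    def __init__(self, dim: int = 768, hidden: int = 256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.proj(x)   # residual keeps the monolingual structure

# One training step on a parallel batch: pull the adapted non-English embedding
# toward the (frozen) English embedding of its translation.
adapter = AlignmentAdapter()
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)
emb_de = torch.randn(32, 768)      # stand-in for frozen German encoder output
emb_en = torch.randn(32, 768)      # stand-in for frozen English encoder output
optimizer.zero_grad()
loss = nn.functional.mse_loss(adapter(emb_de), emb_en)
loss.backward()
optimizer.step()
```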
- Parrot: Multilingual Visual Instruction Tuning [66.65963606552839]
Existing methods typically align vision encoders with Multimodal Large Language Models (MLLMs) via supervised fine-tuning (SFT).
We propose PARROT, a novel approach that leverages textual guidance for visual token alignment at the language level.
We introduce the Massive Multilingual Multimodal Benchmark (MMMB), a new benchmark comprising 6 languages, 15 categories, and 12,000 questions.
arXiv Detail & Related papers (2024-06-04T17:56:28Z)
- Fine-Tuned Self-Supervised Speech Representations for Language Diarization in Multilingual Code-Switched Speech [4.39549503760707]
We develop a continuous multilingual language diarizer using fine-tuned speech representations extracted from a large self-supervised architecture (WavLM).
We experiment with a code-switched corpus consisting of five South African languages (isiZulu, isiXhosa, Setswana, Sesotho and English).
arXiv Detail & Related papers (2023-12-15T09:40:41Z)
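For the language-diarization entry above, the recoverable idea is a per-frame language decision on top of fine-tuned self-supervised features. A minimal sketch, assuming frozen WavLM-style frame features and a linear classification head; the feature dimension and head are assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

N_LANGUAGES = 5   # isiZulu, isiXhosa, Setswana, Sesotho, English

class FrameLanguageClassifier(nn.Module):
    """Linear head over self-supervised frame features (e.g. WavLM),
    predicting one language label per frame."""
    def __init__(self, feature_dim: int = 768, n_languages: int = N_LANGUAGES):
        super().__init__()
        self.head = nn.Linear(feature_dim, n_languages)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, feature_dim) -> (batch, time, n_languages)
        return self.head(frames)

features = torch.randn(2, 100, 768)        # stand-in for WavLM frame features
logits = FrameLanguageClassifier()(features)
lang_per_frame = logits.argmax(dim=-1)     # code-switch points show up as label changes
```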
- LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between different languages at the frame level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
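The sprachbund entry above groups languages by the similarity of their representations from a multilingual pre-trained model. A minimal sketch of that grouping step, assuming k-means over per-language embedding vectors; the vectors, language list, and number of clusters below are placeholders, not values from the paper:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in language vectors; in the paper they come from a multilingual
# pre-trained model, here they are random for illustration only.
languages = ["en", "de", "fr", "it", "es", "ru", "pl", "zh"]
language_vectors = np.random.default_rng(0).normal(size=(len(languages), 64))

# Group languages whose representations are close; each cluster plays the
# role of a "representation sprachbund" (cluster count is an assumption).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(language_vectors)
for lang, group in zip(languages, kmeans.labels_):
    print(lang, "-> sprachbund", group)
```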
- Challenges and Considerations with Code-Mixed NLP for Multilingual Societies [1.6675267471157407]
This paper discusses the current state of the NLP research, limitations, and foreseeable pitfalls in addressing five real-world applications for social good.
We also propose futuristic datasets, models, and tools that can significantly advance the current research in multilingual NLP applications for the societal good.
arXiv Detail & Related papers (2021-06-15T00:53:55Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
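The VECO entry above plugs a cross-attention module into the Transformer encoder so that masked words can also be predicted from the other language's context. A minimal sketch of an encoder block with such a sub-layer, assuming illustrative sizes and a standard multi-head attention implementation rather than VECO's exact architecture:

```python
import torch
import torch.nn as nn

class CrossLingualBlock(nn.Module):
    """Encoder block with an extra cross-attention sub-layer that lets the
    source-language states attend to the target-language states (sketch only)."""
    def __init__(self, dim: int = 768, heads: int = 12):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(dim) for _ in range(3))

    def forward(self, x_src: torch.Tensor, x_tgt: torch.Tensor) -> torch.Tensor:
        x = self.norm1(x_src + self.self_attn(x_src, x_src, x_src)[0])
        # Cross-attention: predictions can condition on the other language too.
        x = self.norm2(x + self.cross_attn(x, x_tgt, x_tgt)[0])
        return self.norm3(x + self.ffn(x))

src = torch.randn(2, 16, 768)   # e.g. English sentence states
tgt = torch.randn(2, 20, 768)   # e.g. French translation states
out = CrossLingualBlock()(src, tgt)
```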
- Phonological Features for 0-shot Multilingual Speech Synthesis [50.591267188664666]
We show that code-switching is possible for languages unseen during training, even within monolingual models.
We generate intelligible, code-switched speech in a new language at test time, including the approximation of sounds never seen in training.
arXiv Detail & Related papers (2020-08-06T18:25:18Z)
- Cross-lingual Multispeaker Text-to-Speech under Limited-Data Scenario [10.779568857641928]
This paper presents an extension on Tacotron2 to achieve bilingual multispeaker speech synthesis.
We achieve cross-lingual synthesis, including code-switching cases, between English and Mandarin for monolingual speakers.
arXiv Detail & Related papers (2020-05-21T03:03:34Z)
- Learning to Scale Multilingual Representations for Vision-Language Tasks [51.27839182889422]
The effectiveness of SMALR is demonstrated with ten diverse languages, over twice the number supported in vision-language tasks to date.
We evaluate on multilingual image-sentence retrieval and outperform prior work by 3-4% while using less than 1/5th of the training parameters of other word embedding methods.
arXiv Detail & Related papers (2020-04-09T01:03:44Z)