MoLE: Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition
- URL: http://arxiv.org/abs/2302.13750v1
- Date: Mon, 27 Feb 2023 13:26:17 GMT
- Title: MoLE: Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition
- Authors: Yoohwan Kwon and Soo-Whan Chung
- Abstract summary: We present a multi-lingual speech recognition network named Mixture-of-Language-Experts (MoLE).
MoLE analyzes linguistic expression from input speech in arbitrary languages, activating a language-specific expert with a lightweight language tokenizer.
Based on the reliability, the activated expert and the language-agnostic expert are aggregated to form a language-conditioned embedding.
- Score: 12.23416994447554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-lingual speech recognition aims to distinguish linguistic
expressions in different languages while simultaneously integrating their
acoustic processing. Most current multi-lingual speech recognition research,
however, follows a language-aware paradigm aimed mainly at improving
recognition performance rather than at discriminating language
characteristics. In this paper, we present a multi-lingual speech recognition
network named Mixture-of-Language-Experts (MoLE), which digests speech in a
variety of languages. Specifically, MoLE analyzes the linguistic expression of
input speech in an arbitrary language and activates a language-specific expert
with a lightweight language tokenizer. The tokenizer not only activates
experts, but also estimates the reliability of the activation. Based on this
reliability, the outputs of the activated expert and of a language-agnostic
expert are aggregated into a language-conditioned embedding for efficient
speech recognition. Our proposed model is evaluated in a five-language
scenario, and the experimental results show that our structure is advantageous
for multi-lingual recognition, especially for speech in low-resource
languages.
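To make the routing concrete, the following is a minimal PyTorch sketch of the mechanism the abstract describes: a lightweight tokenizer head selects a language expert, and the confidence of that selection weights the expert against a shared language-agnostic expert. The module shapes, the hard argmax routing, and the convex combination are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MoLESketch(nn.Module):
    """Illustrative Mixture-of-Language-Experts layer (not the paper's code)."""

    def __init__(self, dim=256, n_langs=5):
        super().__init__()
        self.lang_experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_langs))
        self.agnostic_expert = nn.Linear(dim, dim)
        self.tokenizer = nn.Linear(dim, n_langs)   # lightweight "language tokenizer"

    def forward(self, x):                          # x: (batch, time, dim)
        probs = self.tokenizer(x.mean(dim=1)).softmax(dim=-1)
        lang_id = probs.argmax(dim=-1)             # activate one expert per utterance
        reliability = probs.max(dim=-1).values     # confidence of the activation
        expert_out = torch.stack(
            [self.lang_experts[i](x[b]) for b, i in enumerate(lang_id.tolist())])
        shared_out = self.agnostic_expert(x)       # language-agnostic path
        w = reliability.view(-1, 1, 1)             # reliability-weighted aggregation
        return w * expert_out + (1.0 - w) * shared_out

x = torch.randn(2, 100, 256)                       # two utterances, 100 frames each
print(MoLESketch()(x).shape)                       # torch.Size([2, 100, 256])
```

In this reading, a low-reliability utterance falls back toward the language-agnostic path, which is consistent with the paper's claimed benefit on low-resource languages.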
Related papers
- Fine-Tuned Self-Supervised Speech Representations for Language Diarization in Multilingual Code-Switched Speech [4.39549503760707]
We develop a continuous multilingual language diarizer using fine-tuned speech representations extracted from a large self-supervised architecture (WavLM).
We experiment with a code-switched corpus consisting of five South African languages (isiZulu, isiXhosa, Setswana, Sesotho, and English).
arXiv Detail & Related papers (2023-12-15T09:40:41Z)
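As a rough illustration of the diarization setup above, the sketch below attaches a frame-level language classifier to a frozen speech encoder. The FrozenEncoder is a stand-in for WavLM (which one would normally load from, e.g., the transformers library), and all sizes are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class FrozenEncoder(nn.Module):
    """Stand-in for a frozen self-supervised encoder such as WavLM."""
    def __init__(self, dim=768):
        super().__init__()
        self.proj = nn.Conv1d(1, dim, kernel_size=400, stride=320)  # ~20 ms hop

    @torch.no_grad()
    def forward(self, wav):                     # wav: (batch, samples)
        return self.proj(wav.unsqueeze(1)).transpose(1, 2)  # (batch, frames, dim)

class LanguageDiarizer(nn.Module):
    """Frame-level classifier over self-supervised features: one label per
    frame, so language switches inside an utterance show up as label changes."""
    def __init__(self, dim=768, n_langs=5):
        super().__init__()
        self.encoder = FrozenEncoder(dim)
        self.head = nn.Linear(dim, n_langs)     # the fine-tuned part

    def forward(self, wav):
        return self.head(self.encoder(wav))     # (batch, frames, n_langs)

wav = torch.randn(1, 16000)                     # 1 s of 16 kHz audio
print(LanguageDiarizer()(wav).argmax(-1).shape) # per-frame language decisions
```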
- Multilingual Multi-Figurative Language Detection [14.799109368073548]
Figurative language understanding is highly understudied in a multilingual setting.
We introduce multilingual multi-figurative language modelling, and provide a benchmark for sentence-level figurative language detection.
We develop a framework for figurative language detection based on template-based prompt learning.
arXiv Detail & Related papers (2023-05-31T18:52:41Z)
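The entry above mentions template-based prompt learning; the sketch below shows one common shape of that idea for sentence-level figurative-language detection. The template, the verbalizer words, and the toy stand-in for the masked LM are all hypothetical.

```python
# Minimal sketch of template-based prompt learning for figurative-language
# detection. Template and verbalizer are illustrative assumptions.
TEMPLATE = '{sentence} The previous sentence is {mask} language.'
VERBALIZER = {'figurative': 1, 'literal': 0}    # label words -> class ids

def build_prompt(sentence: str, mask_token: str = '[MASK]') -> str:
    return TEMPLATE.format(sentence=sentence, mask=mask_token)

def classify(sentence, mlm_fill):
    """mlm_fill(prompt) -> {token: probability} at the mask position;
    in practice this would come from a multilingual masked LM."""
    scores = mlm_fill(build_prompt(sentence))
    word = max(VERBALIZER, key=lambda w: scores.get(w, 0.0))
    return VERBALIZER[word]

# Toy stand-in for the masked LM, just to make the sketch executable.
fake_mlm = lambda prompt: {'figurative': 0.7, 'literal': 0.3}
print(classify('He has a heart of stone.', fake_mlm))   # -> 1
```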
- Multilingual Speech Emotion Recognition With Multi-Gating Mechanism and Neural Architecture Search [15.51730246937201]
Speech emotion recognition (SER) classifies audio into emotion categories such as Happy, Angry, Fear, Disgust and Neutral.
This paper proposes a language-specific model that extracts emotional information from multiple pre-trained speech models.
Our model raises the state-of-the-art accuracy by 3% for German and 14.3% for French.
arXiv Detail & Related papers (2022-10-31T19:55:33Z)
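A plausible reading of the multi-gating idea above: features from several pre-trained speech models are mixed by a learned, language-specific gate before emotion classification. The gate parameterization and dimensions below are guesses for illustration only.

```python
import torch
import torch.nn as nn

class MultiGateSER(nn.Module):
    """Sketch: pooled features from several upstream speech models, mixed by
    language-specific gates, then a shared emotion classifier."""
    def __init__(self, feat_dims=(768, 1024), n_langs=2, n_emotions=5):
        super().__init__()
        d = 256
        self.projs = nn.ModuleList(nn.Linear(f, d) for f in feat_dims)
        # One gate vector per (language, upstream model) pair.
        self.gates = nn.Parameter(torch.zeros(n_langs, len(feat_dims)))
        self.cls = nn.Linear(d, n_emotions)

    def forward(self, feats, lang_id):          # feats: list of (batch, dim_i)
        stacked = torch.stack([p(f) for p, f in zip(self.projs, feats)], dim=1)
        w = self.gates[lang_id].softmax(-1).unsqueeze(-1)  # (batch, n_models, 1)
        return self.cls((w * stacked).sum(dim=1))          # (batch, n_emotions)

feats = [torch.randn(3, 768), torch.randn(3, 1024)]  # e.g., two upstream pools
lang = torch.tensor([0, 1, 0])                        # toy language ids
print(MultiGateSER()(feats, lang).shape)              # torch.Size([3, 5])
```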
- LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both monolingual and multilingual ASR by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating different languages at the frame level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z)
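One way to picture a language-aware encoder with frame-level language discrimination is a shared block feeding per-language branches that are mixed by frame-level language posteriors. The branch design and mixing rule below are assumptions, not the LAE paper's architecture.

```python
import torch
import torch.nn as nn

class LAESketch(nn.Module):
    """Sketch: shared encoder, language-specific branches, frame-level
    language posteriors deciding which branch dominates each frame."""
    def __init__(self, dim=256, n_langs=2):
        super().__init__()
        self.shared = nn.GRU(dim, dim, batch_first=True)
        self.branches = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_langs))
        self.lid = nn.Linear(dim, n_langs)      # frame-level language identifier

    def forward(self, x):                       # x: (batch, time, dim)
        h, _ = self.shared(x)
        post = self.lid(h).softmax(-1)          # (batch, time, n_langs)
        outs = torch.stack([b(h) for b in self.branches], dim=-1)
        return (outs * post.unsqueeze(2)).sum(-1), post

x = torch.randn(2, 50, 256)
enc, lang_post = LAESketch()(x)
print(enc.shape, lang_post.shape)               # (2, 50, 256) (2, 50, 2)
```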
- Automatic Spoken Language Identification using a Time-Delay Neural Network [0.0]
A language identification system was built to distinguish between Arabic, Spanish, French, and Turkish.
A pre-existing multilingual dataset was used to train a series of acoustic models.
The system was provided with a custom multilingual language model and a specialized pronunciation lexicon.
arXiv Detail & Related papers (2022-05-19T13:47:48Z)
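The paper above builds acoustic models with a multilingual language model and pronunciation lexicon; the sketch below instead shows a generic end-to-end TDNN language-ID classifier (dilated 1-D convolutions plus statistics pooling), just to illustrate what a time-delay network looks like. Layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class TDNNLid(nn.Module):
    """Generic TDNN for spoken language ID: dilated 1-D convolutions over
    acoustic frames, statistics pooling, then a 4-way classifier
    (Arabic, Spanish, French, Turkish)."""
    def __init__(self, n_mels=40, n_langs=4):
        super().__init__()
        self.tdnn = nn.Sequential(
            nn.Conv1d(n_mels, 512, kernel_size=5, dilation=1), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, dilation=2), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, dilation=3), nn.ReLU(),
        )
        self.cls = nn.Linear(2 * 512, n_langs)

    def forward(self, x):                       # x: (batch, n_mels, frames)
        h = self.tdnn(x)                        # (batch, 512, frames')
        stats = torch.cat([h.mean(-1), h.std(-1)], dim=-1)  # statistics pooling
        return self.cls(stats)                  # (batch, n_langs)

print(TDNNLid()(torch.randn(2, 40, 300)).shape)  # torch.Size([2, 4])
```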
- Exploring Teacher-Student Learning Approach for Multi-lingual Speech-to-Intent Classification [73.5497360800395]
We develop an end-to-end system that supports multiple languages.
We exploit knowledge from a pre-trained multi-lingual natural language processing model.
arXiv Detail & Related papers (2021-09-28T04:43:11Z)
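A common way to exploit knowledge from a pre-trained multilingual NLP model in an end-to-end speech-to-intent system is teacher-student distillation. The sketch below shows a generic distillation objective; whether the paper uses exactly this loss is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Generic teacher-student objective: cross-entropy on intent labels plus
    KL divergence to the (frozen) teacher's softened posteriors.
    Temperature and mixing weight are illustrative hyperparameters."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction='batchmean') * T * T
    return alpha * ce + (1 - alpha) * kd

# Toy shapes: teacher = pre-trained multilingual NLP model over transcripts,
# student = end-to-end speech-to-intent model over audio (both faked here).
student = torch.randn(8, 10, requires_grad=True)   # (batch, n_intents)
teacher = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distill_loss(student, teacher, labels).item())
```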
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representations from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and call each group a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
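The clustering step above lends itself to a short sketch: embed each language, cluster the embeddings, and treat each cluster as a representation sprachbund. The random stand-in embeddings and the k-means settings below are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in language embeddings; in the paper these would be representations
# derived from a multilingual pre-trained model.
rng = np.random.default_rng(0)
langs = ['en', 'de', 'nl', 'es', 'it', 'pt', 'zh', 'ja']
lang_vecs = rng.normal(size=(len(langs), 64))

# Cluster languages; each cluster becomes a "representation sprachbund"
# whose members are then pre-trained on jointly.
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(lang_vecs)
for cluster in range(3):
    members = [l for l, g in zip(langs, groups) if g == cluster]
    print(f'sprachbund {cluster}: {members}')
```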
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Are Multilingual Models Effective in Code-Switching? [57.78477547424949]
We study the effectiveness of multilingual language models to understand their capability and adaptability to the mixed-language setting.
Our findings suggest that pre-trained multilingual models do not necessarily guarantee high-quality representations on code-switching.
arXiv Detail & Related papers (2021-03-24T16:20:02Z)
- Meta-Transfer Learning for Code-Switched Speech Recognition [72.84247387728999]
We propose a new learning method, meta-transfer learning, to perform transfer learning on a code-switched speech recognition system in a low-resource setting.
Our model learns to recognize individual languages and transfers that knowledge to better recognize mixed-language speech by conditioning the optimization on the code-switching data.
arXiv Detail & Related papers (2020-04-29T14:27:19Z)
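Conditioning the optimization on code-switching data can be pictured as a first-order meta-learning loop: adapt on a monolingual batch, then update the real weights with the adapted model's loss on code-switched data. The sketch below is a generic MAML-style approximation, not the paper's exact meta-transfer algorithm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(80, 50)                       # toy acoustic model
loss_fn = nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
inner_lr = 0.05

def batch():                                    # fake (features, targets)
    return torch.randn(4, 80), torch.randn(4, 50)

for step in range(3):
    x_mono, y_mono = batch()                    # monolingual source batch
    x_cs, y_cs = batch()                        # code-switched target batch
    inner = loss_fn(model(x_mono), y_mono)
    grads = torch.autograd.grad(inner, list(model.parameters()))
    # Evaluate adapted ("fast") weights on code-switched data; first-order
    # approximation, so the inner gradients are treated as constants.
    fast_w = model.weight - inner_lr * grads[0]
    fast_b = model.bias - inner_lr * grads[1]
    outer = loss_fn(F.linear(x_cs, fast_w, fast_b), y_cs)
    opt.zero_grad()
    outer.backward()                            # meta-gradient reaches model
    opt.step()
    print(f'step {step}: outer loss {outer.item():.3f}')
```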
This list is automatically generated from the titles and abstracts of the papers on this site.