Fine-Tuned Self-Supervised Speech Representations for Language
Diarization in Multilingual Code-Switched Speech
- URL: http://arxiv.org/abs/2312.09645v1
- Date: Fri, 15 Dec 2023 09:40:41 GMT
- Title: Fine-Tuned Self-Supervised Speech Representations for Language
Diarization in Multilingual Code-Switched Speech
- Authors: Geoffrey Frost, Emily Morris, Joshua Jansen van Vüren, Thomas
Niesler
- Abstract summary: We develop a continuous multilingual language diarizer using fine-tuned speech representations extracted from a large self-supervised architecture (WavLM).
We experiment with a code-switched corpus consisting of five South African languages (isiZulu, isiXhosa, Setswana, Sesotho and English).
- Score: 4.39549503760707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Annotating a multilingual code-switched corpus is a painstaking process
requiring specialist linguistic expertise. This is partly due to the large
number of language combinations that may appear within and across utterances,
which might require several annotators with different linguistic expertise to
consider an utterance sequentially. This is time-consuming and costly. It would
be useful if the spoken languages in an utterance and the boundaries thereof
were known before annotation commences, to allow segments to be assigned to the
relevant language experts in parallel. To address this, we investigate the
development of a continuous multilingual language diarizer using fine-tuned
speech representations extracted from a large pre-trained self-supervised
architecture (WavLM). We experiment with a code-switched corpus consisting of
five South African languages (isiZulu, isiXhosa, Setswana, Sesotho and English)
and show substantial diarization error rate improvements for language families,
language groups, and individual languages over baseline systems.
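For illustration, the sketch below (Python, using the Hugging Face transformers implementation of WavLM) shows the general shape of a frame-level language diarizer as the abstract describes: a pre-trained self-supervised encoder with a per-frame classification head, fine-tuned on labelled code-switched speech. The checkpoint name, the linear head and the five-language label set are assumptions for illustration, not the authors' exact configuration.

# Illustrative sketch only: checkpoint, head and label set are assumed,
# not taken from the paper.
import torch
import torch.nn as nn
from transformers import WavLMModel

LANGS = ["isiZulu", "isiXhosa", "Setswana", "Sesotho", "English"]

class FrameLevelLanguageDiarizer(nn.Module):
    def __init__(self, checkpoint: str = "microsoft/wavlm-large"):
        super().__init__()
        # Pre-trained self-supervised encoder, fine-tuned end-to-end.
        self.encoder = WavLMModel.from_pretrained(checkpoint)
        # Per-frame language classifier on top of the encoder outputs.
        self.head = nn.Linear(self.encoder.config.hidden_size, len(LANGS))

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) of 16 kHz audio.
        frames = self.encoder(waveform).last_hidden_state  # (batch, T, hidden)
        return self.head(frames)                           # (batch, T, |LANGS|)

model = FrameLevelLanguageDiarizer()
logits = model(torch.randn(1, 16000))  # one second of dummy audio
print(logits.shape)                    # roughly (1, 49, 5)

Fine-tuning would minimise a frame-level cross-entropy against per-frame language labels; taking the argmax over time and merging consecutive frames with the same label yields language segments and their boundaries.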
Related papers
- Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach to enhancing the multilingual capabilities of large language models (LLMs).
It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs.
It achieves superior results with much fewer computational resources compared to existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z) - Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment [50.80949663719335]
Training for cross-lingual alignment of sentence embeddings distorts the optimal monolingual structure of semantic spaces of individual languages.
We train language-specific sentence encoders to avoid negative interference between languages.
We then align all non-English monolingual encoders to the English encoder by training a cross-lingual alignment adapter on top of each.
arXiv Detail & Related papers (2024-07-20T13:56:39Z) - Improving In-context Learning of Multilingual Generative Language Models with Cross-lingual Alignment [42.624862172666624]
We propose a simple yet effective cross-lingual alignment framework exploiting pairs of translation sentences.
It aligns the internal sentence representations across different languages via multilingual contrastive learning.
Experimental results show that even with less than 0.1‰ (0.01%) of pre-training tokens, our alignment framework significantly boosts the cross-lingual abilities of generative language models.
arXiv Detail & Related papers (2023-11-14T11:24:08Z) - Language Model Tokenizers Introduce Unfairness Between Languages [98.92630681729518]
We show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked.
Character-level and byte-level models also exhibit encoding lengths that differ by more than a factor of four for some language pairs (a short sketch after the related-papers list illustrates how such length disparities can be measured).
We make the case that we should train future language models using multilingually fair subword tokenizers.
arXiv Detail & Related papers (2023-05-17T14:17:57Z) - MoLE : Mixture of Language Experts for Multi-Lingual Automatic Speech
Recognition [12.23416994447554]
We present a multi-lingual speech recognition network named Mixture-of-Language-Expert (MoLE).
MoLE analyzes linguistic expression from input speech in arbitrary languages, activating a language-specific expert with a lightweight language tokenizer.
Based on the reliability of the language prediction, the activated expert and a language-agnostic expert are aggregated to produce a language-conditioned embedding.
arXiv Detail & Related papers (2023-02-27T13:26:17Z) - Massively Multilingual Lexical Specialization of Multilingual
Transformers [18.766379322798837]
We show that massively multilingual lexical specialization brings substantial gains in two standard cross-lingual lexical tasks.
We observe gains for languages unseen in specialization, indicating that multilingual lexical specialization enables generalization to languages with no lexical constraints.
arXiv Detail & Related papers (2022-08-01T17:47:03Z) - LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both monolingual and multilingual speech recognition by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between different languages at the frame level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z) - Phylogeny-Inspired Adaptation of Multilingual Models to New Languages [43.62238334380897]
We show how we can use language phylogenetic information to improve cross-lingual transfer by leveraging closely related languages.
We perform adapter-based training on languages from diverse language families (Germanic, Uralic, Tupian, Uto-Aztecan) and evaluate on both syntactic and semantic tasks.
arXiv Detail & Related papers (2022-05-19T15:49:19Z) - Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z) - Bridging Linguistic Typology and Multilingual Machine Translation with
Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
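For the tokenizer-fairness entry above, the following small sketch (Python, Hugging Face transformers) shows how tokenised-length disparity between languages can be measured; the checkpoint and the roughly parallel example sentences are assumptions for illustration and are not taken from that paper.

# Hypothetical illustration of subword-length disparity across languages.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")  # assumed checkpoint

# Roughly parallel sentences (same meaning, different languages).
parallel = {
    "English": "The children are playing outside in the garden.",
    "German": "Die Kinder spielen draußen im Garten.",
    "Finnish": "Lapset leikkivät ulkona puutarhassa.",
}

lengths = {lang: len(tokenizer.tokenize(text)) for lang, text in parallel.items()}
for lang, n in lengths.items():
    print(f"{lang}: {n} subword tokens")

# The max/min ratio quantifies the kind of disparity the paper argues
# future tokenizers should avoid.
print("length ratio:", max(lengths.values()) / min(lengths.values()))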
This list is automatically generated from the titles and abstracts of the papers listed on this site.