Multilingual transfer of acoustic word embeddings improves when training
on languages related to the target zero-resource language
- URL: http://arxiv.org/abs/2106.12834v1
- Date: Thu, 24 Jun 2021 08:37:05 GMT
- Title: Multilingual transfer of acoustic word embeddings improves when training
on languages related to the target zero-resource language
- Authors: Christiaan Jacobs and Herman Kamper
- Abstract summary: We show that training on even just a single related language gives the largest gain.
We also find that adding data from unrelated languages generally doesn't hurt performance.
- Score: 32.170748231414365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Acoustic word embedding models map variable duration speech segments to fixed
dimensional vectors, enabling efficient speech search and discovery. Previous
work explored how embeddings can be obtained in zero-resource settings where no
labelled data is available in the target language. The current best approach
uses transfer learning: a single supervised multilingual model is trained using
labelled data from multiple well-resourced languages and then applied to a
target zero-resource language (without fine-tuning). However, it is still
unclear how the specific choice of training languages affects downstream
performance. Concretely, here we ask whether it is beneficial to use training
languages related to the target. Using data from eleven languages spoken in
Southern Africa, we experiment with adding data from different language
families while controlling for the amount of data per language. In word
discrimination and query-by-example search evaluations, we show that training
on languages from the same family gives large improvements. Through
finer-grained analysis, we show that training on even just a single related
language gives the largest gain. We also find that adding data from unrelated
languages generally doesn't hurt performance.
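
To make the setup concrete, the sketch below shows what an acoustic word embedding model of this kind might look like and how embeddings are compared in word discrimination or query-by-example search. It is a minimal illustration, not the paper's implementation: the GRU architecture, feature dimensions, and function names are assumptions.

```python
# Minimal sketch of an acoustic word embedding (AWE) encoder, assuming
# MFCC-like input features; NOT the authors' exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AcousticWordEncoder(nn.Module):
    """Maps a variable-duration speech segment to a fixed-dimensional vector."""
    def __init__(self, feat_dim=13, hidden_dim=256, embed_dim=128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden_dim, embed_dim)

    def forward(self, feats, lengths):
        # feats: (batch, max_frames, feat_dim); lengths: true frame counts
        packed = nn.utils.rnn.pack_padded_sequence(
            feats, lengths.cpu(), batch_first=True, enforce_sorted=False)
        _, h = self.rnn(packed)                    # h: (num_layers * 2, batch, hidden)
        h = torch.cat([h[-2], h[-1]], dim=-1)      # final forward/backward states
        return F.normalize(self.proj(h), dim=-1)   # unit-length embedding

def word_discrimination_score(emb_a, emb_b):
    """Cosine similarity: high if two segments are instances of the same word."""
    return F.cosine_similarity(emb_a, emb_b, dim=-1)
```

In query-by-example search, the query segment's embedding would be scored in the same way against an index of embeddings for all search segments, with the highest-similarity segments returned as putative matches.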
Related papers
- Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon [78.12363425794214]
We focus on zero-shot sentiment analysis tasks across 34 languages, including 6 high/medium-resource languages, 25 low-resource languages, and 3 code-switching datasets.
We demonstrate that pretraining using multilingual lexicons, without using any sentence-level sentiment data, achieves superior zero-shot performance compared to models fine-tuned on English sentiment datasets.
arXiv Detail & Related papers (2024-02-03T10:41:05Z)
- GradSim: Gradient-Based Language Grouping for Effective Multilingual Training [13.730907708289331]
We propose GradSim, a language grouping method based on gradient similarity.
Our experiments on three diverse multilingual benchmark datasets show that GradSim leads to the largest performance gains.
Besides linguistic features, the topics of the datasets play an important role for language grouping.
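
The core idea can be illustrated with a short sketch: compute each language's gradient on a shared multilingual model and group languages whose gradients point in similar directions. The helper functions, the single-batch approximation, and leaving the clustering step to the caller are assumptions, not the paper's exact procedure.

```python
# Illustrative sketch of gradient-based language grouping (inspired by the
# GradSim idea above, not a reproduction of the paper's method).
import torch
import torch.nn.functional as F

def language_gradient(model, loss_fn, batch):
    """Flattened gradient of the loss on one language's batch w.r.t. shared params."""
    model.zero_grad()
    loss = loss_fn(model(batch["inputs"]), batch["labels"])
    loss.backward()
    grads = [p.grad.reshape(-1) for p in model.parameters() if p.grad is not None]
    return torch.cat(grads).detach()

def gradient_similarity_matrix(model, loss_fn, batches_per_language):
    """Pairwise cosine similarity of per-language gradients.

    batches_per_language: dict mapping language code -> one representative batch.
    Languages with high mutual similarity are candidates for joint training.
    """
    langs = list(batches_per_language)
    grads = {l: language_gradient(model, loss_fn, batches_per_language[l]) for l in langs}
    sim = torch.zeros(len(langs), len(langs))
    for i, a in enumerate(langs):
        for j, b in enumerate(langs):
            sim[i, j] = F.cosine_similarity(grads[a], grads[b], dim=0)
    return langs, sim
```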
arXiv Detail & Related papers (2023-10-23T18:13:37Z)
- Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge [57.38948190611797]
This paper proposes a novel lip reading framework, especially for low-resource languages.
Because low-resource languages lack sufficient video-text paired data for training, developing lip reading models for them is regarded as challenging.
arXiv Detail & Related papers (2023-08-18T05:19:03Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Learning Cross-lingual Visual Speech Representations [108.68531445641769]
Cross-lingual self-supervised visual representation learning has been a growing research topic in the last few years.
We use the recently proposed Raw Audio-Visual Speech (RAVEn) framework to pre-train an audio-visual model with unlabelled data.
Our experiments show that multilingual models trained on more data outperform monolingual ones, but that monolingual models tend to reach better performance when the amount of data is kept fixed.
arXiv Detail & Related papers (2023-03-14T17:05:08Z)
- Adapting Multilingual Speech Representation Model for a New, Underresourced Language through Multilingual Fine-tuning and Continued Pretraining [2.3513645401551333]
We investigate the possibility for adapting an existing multilingual wav2vec 2.0 model for a new language.
Our results show that continued pretraining is the most effective method to adapt a wav2vec 2.0 model for a new language.
We find that if a model pretrained on a related speech variety, or on an unrelated language with similar phonological characteristics, is available, multilingual fine-tuning using additional data from that language can have a positive impact on speech recognition performance.
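
A rough sketch of the fine-tuning side of this recipe is given below, using Hugging Face Transformers to attach a CTC head for the new language to a multilingual wav2vec 2.0 checkpoint (continued pretraining would instead resume the self-supervised objective and is not shown). The checkpoint name, vocabulary size, and optimiser settings are placeholder assumptions.

```python
# Hedged sketch: fine-tuning a pretrained multilingual wav2vec 2.0 checkpoint
# for CTC speech recognition on a new language. Checkpoint choice, vocabulary
# size, and training details are assumptions, not the paper's exact setup.
import torch
from transformers import Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",   # multilingual checkpoint (assumed choice)
    vocab_size=40,                     # size of the new language's character set
    ctc_loss_reduction="mean",
    ignore_mismatched_sizes=True,      # a fresh CTC head is initialised for the new language
)
model.freeze_feature_encoder()         # keep the convolutional feature encoder fixed
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

def training_step(input_values, labels):
    """One CTC fine-tuning step on a batch of raw waveforms and token labels."""
    out = model(input_values=input_values, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```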
arXiv Detail & Related papers (2023-01-18T03:57:53Z)
- Language-Family Adapters for Low-Resource Multilingual Neural Machine Translation [129.99918589405675]
Large multilingual models trained with self-supervision achieve state-of-the-art results in a wide range of natural language processing tasks.
Multilingual fine-tuning improves performance on low-resource languages but requires modifying the entire model and can be prohibitively expensive.
We propose training language-family adapters on top of mBART-50 to facilitate cross-lingual transfer.
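
The general mechanism can be sketched as a small bottleneck module trained per language family while the pretrained backbone stays frozen; the module below is a generic residual adapter in PyTorch, not the paper's exact configuration or dimensions.

```python
# Generic bottleneck adapter (illustrative; not the paper's exact design).
import torch
import torch.nn as nn

class LanguageFamilyAdapter(nn.Module):
    """Small residual bottleneck trained per language family; the frozen
    backbone (e.g. mBART-50) provides the hidden states it transforms."""
    def __init__(self, hidden_dim=1024, bottleneck_dim=64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, hidden_states):
        # Residual connection keeps the pretrained representation intact
        # when the adapter contribution is small at initialisation.
        return hidden_states + self.up(torch.relu(self.down(self.norm(hidden_states))))
```

Because only the adapter parameters of the relevant family are updated, this avoids modifying the entire model during fine-tuning.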
arXiv Detail & Related papers (2022-09-30T05:02:42Z)
- Multilingual Jointly Trained Acoustic and Written Word Embeddings [22.63696520064212]
We extend the idea of jointly training acoustic and written word embeddings to multiple low-resource languages.
We jointly train an acoustic word embedding (AWE) model and an acoustically grounded word embedding (AGWE) model, using phonetically transcribed data from multiple languages.
The pre-trained models can then be used for unseen zero-resource languages, or fine-tuned on data from low-resource languages.
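
A hedged sketch of the joint-training idea: a written-word encoder embeds a word's character sequence into the same space as the acoustic embeddings, and a contrastive objective pulls matched (speech, spelling) pairs together. The encoder architecture and the loss below are simplified stand-ins for the models in the paper; the acoustic embeddings are assumed to come from an encoder like the one sketched after the main abstract above.

```python
# Illustrative multi-view training of acoustic (AWE) and written (AGWE)
# word embeddings; architecture and loss are simplified assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WrittenWordEncoder(nn.Module):
    """Embeds a word's character sequence into the shared embedding space."""
    def __init__(self, num_chars=100, char_dim=64, embed_dim=128):
        super().__init__()
        self.char_emb = nn.Embedding(num_chars, char_dim, padding_idx=0)
        self.rnn = nn.GRU(char_dim, embed_dim, batch_first=True)

    def forward(self, char_ids):
        _, h = self.rnn(self.char_emb(char_ids))
        return F.normalize(h[-1], dim=-1)

def multiview_contrastive_loss(acoustic_emb, written_emb, temperature=0.1):
    """Pull matched (speech, spelling) pairs together, push mismatched apart."""
    logits = acoustic_emb @ written_emb.t() / temperature
    targets = torch.arange(len(acoustic_emb), device=logits.device)
    return F.cross_entropy(logits, targets)
```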
arXiv Detail & Related papers (2020-06-24T19:16:02Z)
- Improved acoustic word embeddings for zero-resource languages using multilingual transfer [37.78342106714364]
We train a single supervised embedding model on labelled data from multiple well-resourced languages and apply it to unseen zero-resource languages.
We consider three multilingual recurrent neural network (RNN) models: a classifier trained on the joint vocabularies of all training languages; a Siamese RNN trained to discriminate between same and different words from multiple languages; and a correspondence autoencoder (CAE) RNN trained to reconstruct word pairs.
All of these models outperform state-of-the-art unsupervised models trained on the zero-resource languages themselves, giving relative improvements of more than 30% in average precision.
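
As an illustration of the third variant, the sketch below shows a correspondence autoencoder RNN in which the encoder embeds one spoken instance of a word and the decoder reconstructs the features of a different instance of the same word; dimensions and training details are assumptions. The Siamese variant would instead train the same kind of encoder directly with a contrastive or triplet loss over same-word and different-word pairs.

```python
# Hedged sketch of a correspondence autoencoder (CAE) RNN for acoustic word
# embeddings; feature dimensions and decoder conditioning are assumptions.
import torch
import torch.nn as nn

class CAERNN(nn.Module):
    def __init__(self, feat_dim=13, hidden_dim=256, embed_dim=128):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.to_embed = nn.Linear(hidden_dim, embed_dim)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_feats = nn.Linear(hidden_dim, feat_dim)

    def forward(self, src_feats, tgt_len):
        _, h = self.encoder(src_feats)                   # encode one word instance
        z = self.to_embed(h[-1])                         # fixed-dimensional embedding
        dec_in = z.unsqueeze(1).repeat(1, tgt_len, 1)    # condition every step on z
        out, _ = self.decoder(dec_in)
        return self.to_feats(out), z                     # reconstruction + embedding

# Training pairs (src, tgt) are different spoken instances of the same word;
# the loss is, e.g., mean squared error between the output and tgt's features.
```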
arXiv Detail & Related papers (2020-06-02T12:28:34Z)
- That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)
- Multilingual acoustic word embedding models for processing zero-resource languages [37.78342106714364]
We train a single supervised embedding model on labelled data from multiple well-resourced languages.
We then apply it to unseen zero-resource languages.
arXiv Detail & Related papers (2020-02-06T05:53:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.