Multilingual acoustic word embedding models for processing zero-resource
languages
- URL: http://arxiv.org/abs/2002.02109v2
- Date: Fri, 21 Feb 2020 14:19:56 GMT
- Title: Multilingual acoustic word embedding models for processing zero-resource
languages
- Authors: Herman Kamper, Yevgen Matusevych, Sharon Goldwater
- Abstract summary: We train a single supervised embedding model on labelled data from multiple well-resourced languages.
We then apply it to unseen zero-resource languages.
- Score: 37.78342106714364
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Acoustic word embeddings are fixed-dimensional representations of
variable-length speech segments. In settings where unlabelled speech is the
only available resource, such embeddings can be used in "zero-resource" speech
search, indexing and discovery systems. Here we propose to train a single
supervised embedding model on labelled data from multiple well-resourced
languages and then apply it to unseen zero-resource languages. For this
transfer learning approach, we consider two multilingual recurrent neural
network models: a discriminative classifier trained on the joint vocabularies
of all training languages, and a correspondence autoencoder trained to
reconstruct word pairs. We test these using a word discrimination task on six
target zero-resource languages. When trained on seven well-resourced languages,
both models perform similarly and outperform unsupervised models trained on the
zero-resource languages. With just a single training language, the second model
works better, but performance depends more on the particular training–testing
language pair.
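To make the proposed models concrete, below is a minimal, hypothetical PyTorch sketch of the correspondence autoencoder variant: an encoder RNN maps a variable-length sequence of acoustic features to a fixed-dimensional embedding, and a decoder RNN is trained to reconstruct the features of the other word in a same-word pair. All layer sizes, feature dimensions and names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a correspondence autoencoder RNN (CAE-RNN) for
# acoustic word embeddings; layer sizes and feature dimensions are
# assumptions for illustration, not the authors' exact configuration.
import torch
import torch.nn as nn


class CAERNN(nn.Module):
    def __init__(self, feat_dim=13, hidden_dim=400, embed_dim=130):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden_dim, num_layers=3, batch_first=True)
        self.to_embedding = nn.Linear(hidden_dim, embed_dim)  # fixed-dimensional AWE
        self.decoder = nn.GRU(embed_dim, hidden_dim, num_layers=3, batch_first=True)
        self.to_feats = nn.Linear(hidden_dim, feat_dim)

    def embed(self, feats):
        """Map padded segments (batch, frames, feat_dim) to embeddings (batch, embed_dim)."""
        _, h = self.encoder(feats)       # h: (num_layers, batch, hidden_dim)
        return self.to_embedding(h[-1])  # final hidden state of the top layer

    def forward(self, feats, target_len):
        """Encode one word and reconstruct the features of its paired word."""
        z = self.embed(feats)                             # (batch, embed_dim)
        dec_in = z.unsqueeze(1).repeat(1, target_len, 1)  # condition every step on z
        out, _ = self.decoder(dec_in)
        return self.to_feats(out)                         # (batch, target_len, feat_dim)


# One training step: reconstruct the *other* instance of the same word type
# (pairs are pooled across all well-resourced training languages).
model = CAERNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
x_a = torch.randn(32, 80, 13)  # word A: 32 segments, 80 frames, 13-dim features
x_b = torch.randn(32, 80, 13)  # word B: another instance of the same word type
opt.zero_grad()
recon = model(x_a, target_len=x_b.shape[1])
loss = nn.functional.mse_loss(recon, x_b)
loss.backward()
opt.step()
```

The discriminative classifier variant would keep the same encoder but replace the decoder with a softmax layer over the joint vocabulary of all training languages, trained with cross-entropy; in either case, only the encoder (up to the embedding layer) is applied to the unseen zero-resource languages.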
Related papers
- Zero-shot Sentiment Analysis in Low-Resource Languages Using a
Multilingual Sentiment Lexicon [78.12363425794214]
We focus on zero-shot sentiment analysis tasks across 34 languages, including 6 high/medium-resource languages, 25 low-resource languages, and 3 code-switching datasets.
We demonstrate that pretraining using multilingual lexicons, without using any sentence-level sentiment data, achieves superior zero-shot performance compared to models fine-tuned on English sentiment datasets.
arXiv Detail & Related papers (2024-02-03T10:41:05Z) - Multilingual self-supervised speech representations improve the speech
recognition of low-resource African languages with codeswitching [65.74653592668743]
Finetuning self-supervised multilingual representations reduces absolute word error rates by up to 20%.
In circumstances with limited training data, finetuning self-supervised representations is a better-performing and viable solution.
arXiv Detail & Related papers (2023-11-25T17:05:21Z) - Learning Cross-lingual Visual Speech Representations [108.68531445641769]
Cross-lingual self-supervised visual representation learning has been a growing research topic in the last few years.
We use the recently-proposed Raw Audio-Visual Speech Encoders (RAVEn) framework to pre-train an audio-visual model with unlabelled data.
Our experiments show that multilingual models trained on more data outperform monolingual ones, but, when the amount of data is kept fixed, monolingual models tend to reach better performance.
arXiv Detail & Related papers (2023-03-14T17:05:08Z) - Bitext Mining Using Distilled Sentence Representations for Low-Resource
Languages [12.00637655338665]
We study very low-resource languages and handle 50 African languages, many of which are not covered by any other model.
For these languages, we train sentence encoders, mine bitexts, and validate the bitexts by training NMT systems.
arXiv Detail & Related papers (2022-05-25T10:53:24Z) - Exploring Teacher-Student Learning Approach for Multi-lingual
Speech-to-Intent Classification [73.5497360800395]
We develop an end-to-end system that supports multiple languages.
We exploit knowledge from a pre-trained multi-lingual natural language processing model.
arXiv Detail & Related papers (2021-09-28T04:43:11Z) - Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z) - Multilingual transfer of acoustic word embeddings improves when training
on languages related to the target zero-resource language [32.170748231414365]
We show that training on even just a single related language gives the largest gain.
We also find that adding data from unrelated languages generally doesn't hurt performance.
arXiv Detail & Related papers (2021-06-24T08:37:05Z) - Acoustic word embeddings for zero-resource languages using
self-supervised contrastive learning and multilingual adaptation [30.669442499082443]
We consider how a contrastive learning loss can be used in both purely unsupervised and multilingual transfer settings.
We show that terms from an unsupervised term discovery system can be used for contrastive self-supervision.
We find that self-supervised contrastive adaptation outperforms adapted multilingual correspondence autoencoder and Siamese AWE models.
arXiv Detail & Related papers (2021-03-19T11:08:35Z) - UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z) - Multilingual Jointly Trained Acoustic and Written Word Embeddings [22.63696520064212]
We extend this idea to multiple low-resource languages.
We jointly train an AWE model and an AGWE model, using phonetically transcribed data from multiple languages.
The pre-trained models can then be used for unseen zero-resource languages, or fine-tuned on data from low-resource languages.
arXiv Detail & Related papers (2020-06-24T19:16:02Z) - Improved acoustic word embeddings for zero-resource languages using
multilingual transfer [37.78342106714364]
We train a single supervised embedding model on labelled data from multiple well-resourced languages and apply it to unseen zero-resource languages.
We consider three multilingual recurrent neural network (RNN) models: a classifier trained on the joint vocabularies of all training languages; a Siamese RNN trained to discriminate between same and different words from multiple languages; and a correspondence autoencoder (CAE) RNN trained to reconstruct word pairs.
All of these models outperform state-of-the-art unsupervised models trained on the zero-resource languages themselves, giving relative improvements of more than 30% in average precision (a sketch of this same-different evaluation follows this list).
arXiv Detail & Related papers (2020-06-02T12:28:34Z)
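Both the abstract above and several of the related papers evaluate acoustic word embeddings with a word discrimination (same-different) task scored by average precision. As a rough companion to those numbers, here is a small, hypothetical sketch of that evaluation: all test segments are embedded, every pair is scored with cosine similarity, and pairs of the same word type are treated as positives. The function and variable names, and the use of scipy/scikit-learn, are assumptions for illustration.

```python
# Hypothetical sketch of the same-different word discrimination evaluation:
# average precision over all pairwise cosine similarities, with same-word
# pairs as positives. Names and shapes are assumptions for illustration.
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.metrics import average_precision_score


def same_different_ap(embeddings, labels):
    """embeddings: (N, D) acoustic word embeddings of test segments.
    labels: length-N array of word types. Returns average precision."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)
    # Cosine similarity for every unique pair (higher = more similar).
    sims = 1.0 - pdist(embeddings, metric="cosine")
    # A pair is positive if both segments are tokens of the same word type;
    # triu_indices follows the same (i < j) ordering that pdist uses.
    i, j = np.triu_indices(len(labels), k=1)
    positives = (labels[i] == labels[j]).astype(int)
    return average_precision_score(positives, sims)


# Toy usage: embeddings from a trained multilingual model on a zero-resource
# test language; here random vectors stand in for real embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 130))
words = rng.integers(0, 40, size=200)  # pretend word-type labels
print(f"same-different AP: {same_different_ap(emb, words):.3f}")
```

A common refinement is to score same-word/different-speaker pairs separately, which this sketch omits.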
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.