Acoustic word embeddings for zero-resource languages using
self-supervised contrastive learning and multilingual adaptation
- URL: http://arxiv.org/abs/2103.10731v1
- Date: Fri, 19 Mar 2021 11:08:35 GMT
- Title: Acoustic word embeddings for zero-resource languages using
self-supervised contrastive learning and multilingual adaptation
- Authors: Christiaan Jacobs, Yevgen Matusevych, Herman Kamper
- Abstract summary: We consider how a contrastive learning loss can be used in both purely unsupervised and multilingual transfer settings.
We show that terms from an unsupervised term discovery system can be used for contrastive self-supervision.
We find that self-supervised contrastive adaptation outperforms adapted multilingual correspondence autoencoder and Siamese AWE models.
- Score: 30.669442499082443
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Acoustic word embeddings (AWEs) are fixed-dimensional representations of
variable-length speech segments. For zero-resource languages where labelled
data is not available, one AWE approach is to use unsupervised
autoencoder-based recurrent models. Another recent approach is to use
multilingual transfer: a supervised AWE model is trained on several
well-resourced languages and then applied to an unseen zero-resource language.
We consider how a recent contrastive learning loss can be used in both the
purely unsupervised and multilingual transfer settings. Firstly, we show that
terms from an unsupervised term discovery system can be used for contrastive
self-supervision, resulting in improvements over previous unsupervised
monolingual AWE models. Secondly, we consider how multilingual AWE models can
be adapted to a specific zero-resource language using discovered terms. We find
that self-supervised contrastive adaptation outperforms adapted multilingual
correspondence autoencoder and Siamese AWE models, giving the best overall
results in a word discrimination task on six zero-resource languages.
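The abstract does not state the exact form of the contrastive loss, so the following is only a minimal sketch of one common choice: a softmax over temperature-scaled cosine similarities, with one positive and several negatives per anchor. The function name `contrastive_loss`, the `temperature` value, and the toy two-dimensional embeddings are all assumptions made for illustration and are not taken from the paper. In the setting described above, the anchor and positive would be embeddings of two segments that the unsupervised term discovery system paired as the same term, and the negatives would be embeddings of segments from other discovered terms.

```python
import numpy as np


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """Softmax-style contrastive loss for a single anchor embedding.

    anchor, positive: arrays of shape (dim,)
    negatives: array of shape (num_negatives, dim)
    temperature: hypothetical scaling value, not taken from the paper
    """
    def unit(x):
        # Scale to unit length so dot products are cosine similarities.
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a, p, n = unit(anchor), unit(positive), unit(negatives)
    pos_sim = np.dot(a, p) / temperature
    neg_sims = (n @ a) / temperature

    # Negative log-probability of the positive among all candidates,
    # computed with a numerically stable log-sum-exp.
    logits = np.concatenate(([pos_sim], neg_sims))
    max_logit = logits.max()
    log_denominator = max_logit + np.log(np.exp(logits - max_logit).sum())
    return float(log_denominator - pos_sim)


# Toy embeddings (made up): the positive lies close to the anchor.
anchor = np.array([1.0, 0.0])
loss_good = contrastive_loss(
    anchor,
    positive=np.array([0.9, 0.1]),
    negatives=np.array([[0.0, 1.0], [-1.0, 0.0]]),
)

# Same vectors, but now a near-duplicate of the anchor is a negative.
loss_bad = contrastive_loss(
    anchor,
    positive=np.array([0.0, 1.0]),
    negatives=np.array([[0.9, 0.1], [-1.0, 0.0]]),
)
```

In this toy case `loss_good` is smaller than `loss_bad`, which is the behaviour a contrastive objective is meant to have: the loss falls as the paired segments move closer together than the unpaired ones.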
Related papers
- Multilingual acoustic word embeddings for zero-resource languages [1.5229257192293204]
The work uses acoustic word embeddings (AWEs), fixed-dimensional representations of variable-duration speech segments.
The study introduces a new neural network that outperforms existing AWE models on zero-resource languages.
AWEs are applied to a keyword-spotting system for hate speech detection in Swahili radio broadcasts.
arXiv Detail & Related papers (2024-01-19T08:02:37Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Efficient Spoken Language Recognition via Multilabel Classification [53.662747523872305]
We show that our models obtain competitive results while being orders of magnitude smaller and faster than current state-of-the-art methods.
Our multilabel strategy is more robust to unseen non-target languages compared to multiclass classification.
arXiv Detail & Related papers (2023-06-02T23:04:19Z)
- LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers [71.76680102779765]
Automatic speech recognition (ASR) and speech translation (ST) can both use neural transducers as the model structure.
We propose LAMASSU, a streaming language-agnostic multilingual speech recognition and translation model using neural transducers.
arXiv Detail & Related papers (2022-11-05T04:03:55Z)
- A Survey of Multilingual Models for Automatic Speech Recognition [6.657361001202456]
Cross-lingual transfer is an attractive solution to the problem of multilingual Automatic Speech Recognition.
Recent advances in Self Supervised Learning are opening up avenues for unlabeled speech data to be used in multilingual ASR models.
We present best practices for building multilingual models from research across diverse languages and techniques.
arXiv Detail & Related papers (2022-02-25T09:31:40Z)
- UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z)
- Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
arXiv Detail & Related papers (2020-11-23T16:00:42Z)
- Multilingual Jointly Trained Acoustic and Written Word Embeddings [22.63696520064212]
We extend this idea to multiple low-resource languages.
We jointly train an AWE model and an AGWE model, using phonetically transcribed data from multiple languages.
The pre-trained models can then be used for unseen zero-resource languages, or fine-tuned on data from low-resource languages.
arXiv Detail & Related papers (2020-06-24T19:16:02Z)
- Improved acoustic word embeddings for zero-resource languages using multilingual transfer [37.78342106714364]
We train a single supervised embedding model on labelled data from multiple well-resourced languages and apply it to unseen zero-resource languages.
We consider three multilingual recurrent neural network (RNN) models: a classifier trained on the joint vocabularies of all training languages; a Siamese RNN trained to discriminate between same and different words from multiple languages; and a correspondence autoencoder (CAE) RNN trained to reconstruct word pairs.
All of these models outperform state-of-the-art unsupervised models trained on the zero-resource languages themselves, giving relative improvements of more than 30% in average precision.
arXiv Detail & Related papers (2020-06-02T12:28:34Z)
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)
- Multilingual acoustic word embedding models for processing zero-resource languages [37.78342106714364]
We train a single supervised embedding model on labelled data from multiple well-resourced languages.
We then apply it to unseen zero-resource languages.
arXiv Detail & Related papers (2020-02-06T05:53:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.