Acoustic Word Embeddings for Untranscribed Target Languages with
Continued Pretraining and Learned Pooling
- URL: http://arxiv.org/abs/2306.02153v1
- Date: Sat, 3 Jun 2023 16:44:21 GMT
- Title: Acoustic Word Embeddings for Untranscribed Target Languages with
Continued Pretraining and Learned Pooling
- Authors: Ramon Sanabria, Ondrej Klejch, Hao Tang, Sharon Goldwater
- Abstract summary: Acoustic word embeddings are created by training a pooling function using pairs of word-like units.
Mean-pooled representations from a self-supervised English model were suggested as a promising alternative, but their performance on target languages was not fully competitive.
We show that both methods outperform a recent approach on word discrimination.
- Score: 28.758396218435635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Acoustic word embeddings are typically created by training a pooling function
using pairs of word-like units. For unsupervised systems, these are mined using
k-nearest neighbor (KNN) search, which is slow. Recently, mean-pooled
representations from a pre-trained self-supervised English model were suggested
as a promising alternative, but their performance on target languages was not
fully competitive. Here, we explore improvements to both approaches: we use
continued pre-training to adapt the self-supervised model to the target
language, and we use a multilingual phone recognizer (MPR) to mine phone n-gram
pairs for training the pooling function. Evaluating on four languages, we show
that both methods outperform a recent approach on word discrimination.
Moreover, the MPR method is orders of magnitude faster than KNN, and is highly
data efficient. We also show a small improvement from performing learned
pooling on top of the continued pre-trained representations.
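As a rough illustration of the mean-pooling baseline the abstract builds on, the sketch below averages frame-level features from a pre-trained self-supervised model into an acoustic word embedding and compares two word segments with cosine similarity. This is a minimal sketch, not the authors' code: wav2vec 2.0 via HuggingFace stands in for the English self-supervised model, and the paper's continued pre-training, MPR pair mining, and learned pooling are not reproduced here.

```python
# Minimal sketch: mean-pooled acoustic word embeddings from a pre-trained
# self-supervised model, scored with cosine similarity for word discrimination.
# Assumptions: "facebook/wav2vec2-base" as the model; last layer as the feature;
# a full pipeline would also normalize audio with the matching feature extractor.
import torch
from transformers import Wav2Vec2Model

model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
model.eval()


def embed_word(segment: torch.Tensor) -> torch.Tensor:
    """Mean-pool frame-level representations of one 16 kHz word segment."""
    with torch.no_grad():
        frames = model(segment.unsqueeze(0)).last_hidden_state  # (1, T, D)
    return frames.mean(dim=1).squeeze(0)                        # (D,)


def word_discrimination_score(seg_a: torch.Tensor, seg_b: torch.Tensor) -> float:
    """Cosine similarity of two segments; higher = more likely the same word."""
    return torch.nn.functional.cosine_similarity(
        embed_word(seg_a), embed_word(seg_b), dim=0
    ).item()


if __name__ == "__main__":
    # Toy usage with random audio standing in for real word segments;
    # a real evaluation would rank all segment pairs and report average precision.
    a, b = torch.randn(16000), torch.randn(12000)
    print(f"cosine similarity: {word_discrimination_score(a, b):.3f}")
```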
Related papers
- Efficient Spoken Language Recognition via Multilabel Classification [53.662747523872305]
We show that our models obtain competitive results while being orders of magnitude smaller and faster than current state-of-the-art methods.
Our multilabel strategy is more robust to unseen non-target languages compared to multiclass classification.
arXiv Detail & Related papers (2023-06-02T23:04:19Z)
- GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator [114.8954615026781]
We propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator.
GanLM is trained with two pre-training objectives: replaced token detection and replaced token denoising.
Experiments on language generation benchmarks show that GanLM, with its strong language understanding capability, outperforms various strong pre-trained language models.
arXiv Detail & Related papers (2022-12-20T12:51:11Z)
- Few-shot Subgoal Planning with Language Models [58.11102061150875]
We show that language priors encoded in pre-trained language models allow us to infer fine-grained subgoal sequences.
In contrast to recent methods which make strong assumptions about subgoal supervision, our experiments show that language models can infer detailed subgoal sequences without any fine-tuning.
arXiv Detail & Related papers (2022-05-28T01:03:30Z)
- Bridging the Gap between Language Models and Cross-Lingual Sequence Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling tasks.
Despite this success, we make the empirical observation that there is a training objective gap between the pre-training and fine-tuning stages.
In this paper, we first design a pre-training task tailored for xSL named Cross-lingual Language Informative Span Masking (CLISM) to eliminate the objective gap.
Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage the consistency between representations of input parallel sequences.
arXiv Detail & Related papers (2022-04-11T15:55:20Z)
- From Good to Best: Two-Stage Training for Cross-lingual Machine Reading Comprehension [51.953428342923885]
We develop a two-stage approach to enhance the model performance.
The first stage targets recall: we design a hard-learning (HL) algorithm to maximize the likelihood that the top-k predictions contain the accurate answer.
The second stage focuses on precision: an answer-aware contrastive learning mechanism is developed to learn the fine difference between the accurate answer and other candidates.
arXiv Detail & Related papers (2021-12-09T07:31:15Z)
- Switch Point biased Self-Training: Re-purposing Pretrained Models for Code-Switching [44.034300203700234]
Code-switching is a ubiquitous phenomenon due to the ease of communication it offers in multilingual communities.
We propose a self-training method to repurpose existing pretrained models using a switch-point bias.
Our approach performs well on both tasks by reducing the performance gap at switch points.
arXiv Detail & Related papers (2021-11-01T19:42:08Z)
- Adversarial Training with Contrastive Learning in NLP [0.0]
We propose adversarial training with contrastive learning (ATCL) to adversarially train a language processing task.
The core idea is to make linear perturbations in the embedding space of the input via fast gradient methods (FGM) and train the model to keep the original and perturbed representations close via contrastive learning.
The results show not only an improvement in quantitative scores (perplexity and BLEU) compared to the baselines, but also good qualitative results at the semantic level for both tasks.
arXiv Detail & Related papers (2021-09-19T07:23:45Z)
- Multilingual Jointly Trained Acoustic and Written Word Embeddings [22.63696520064212]
We extend this idea to multiple low-resource languages.
We jointly train an acoustic word embedding (AWE) model and an acoustically grounded written word embedding (AGWE) model, using phonetically transcribed data from multiple languages.
The pre-trained models can then be used for unseen zero-resource languages, or fine-tuned on data from low-resource languages.
arXiv Detail & Related papers (2020-06-24T19:16:02Z)
- Building Low-Resource NER Models Using Non-Speaker Annotation [58.78968578460793]
Cross-lingual methods have had notable success in addressing these concerns.
We propose a complementary approach to building low-resource Named Entity Recognition (NER) models using "non-speaker" (NS) annotations.
We show that the use of NS annotators produces results that are consistently on par with or better than cross-lingual methods built on modern contextual representations.
arXiv Detail & Related papers (2020-06-17T03:24:38Z)