A Correspondence Variational Autoencoder for Unsupervised Acoustic Word
Embeddings
- URL: http://arxiv.org/abs/2012.02221v1
- Date: Thu, 3 Dec 2020 19:24:42 GMT
- Title: A Correspondence Variational Autoencoder for Unsupervised Acoustic Word
Embeddings
- Authors: Puyuan Peng, Herman Kamper, Karen Livescu
- Abstract summary: We propose a new unsupervised model for mapping a variable-duration speech segment to a fixed-dimensional representation.
The resulting acoustic word embeddings can form the basis of search, discovery, and indexing systems for low- and zero-resource languages.
- Score: 50.524054820564395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new unsupervised model for mapping a variable-duration speech
segment to a fixed-dimensional representation. The resulting acoustic word
embeddings can form the basis of search, discovery, and indexing systems for
low- and zero-resource languages. Our model, which we refer to as a maximal
sampling correspondence variational autoencoder (MCVAE), is a recurrent neural
network (RNN) trained with a novel self-supervised correspondence loss that
encourages consistency between embeddings of different instances of the same
word. Our training scheme improves on previous correspondence training
approaches through the use and comparison of multiple samples from the
approximate posterior distribution. In the zero-resource setting, the MCVAE can
be trained in an unsupervised way, without any ground-truth word pairs, by
using the word-like segments discovered via an unsupervised term discovery
system. In both this setting and a semi-supervised low-resource setting (with a
limited set of ground-truth word pairs), the MCVAE outperforms previous
state-of-the-art models, such as Siamese-, CAE- and VAE-based RNNs.
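As a rough illustration of the approach described in the abstract, the sketch below shows how a recurrent encoder can map a variable-duration segment of acoustic features to the parameters of a Gaussian approximate posterior, and how a correspondence term can be computed by drawing several posterior samples for two instances of the same word and comparing them. This is a minimal sketch under assumed names and hyperparameters (`AcousticRNNEncoder`, `sample_posterior`, `correspondence_term`, the feature and latent dimensions), not the authors' implementation; the paper's actual objective is a reconstruction-based variational bound, and the code only conveys the multiple-posterior-sample idea.

```python
# Minimal sketch (not the authors' code): RNN encoder to a Gaussian posterior,
# plus a consistency penalty computed over multiple posterior samples drawn
# for two instances of (putatively) the same word.
import torch
import torch.nn as nn

class AcousticRNNEncoder(nn.Module):
    """Variable-length acoustic segment -> Gaussian posterior parameters."""
    def __init__(self, feat_dim=39, hidden_dim=256, latent_dim=128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)

    def forward(self, segment):
        # segment: (batch, frames, feat_dim); variable lengths are assumed to be
        # handled by padding/packing upstream.
        _, h = self.rnn(segment)           # final hidden state: (1, batch, hidden_dim)
        h = h.squeeze(0)
        return self.to_mu(h), self.to_logvar(h)

def sample_posterior(mu, logvar, num_samples=8):
    # Draw several reparameterised samples from q(z | x).
    std = (0.5 * logvar).exp()
    eps = torch.randn(num_samples, *mu.shape, device=mu.device)
    return mu.unsqueeze(0) + eps * std.unsqueeze(0)   # (num_samples, batch, latent_dim)

def correspondence_term(z_a, z_b):
    # z_a, z_b: (num_samples, batch, latent_dim) for two segments assumed to be
    # instances of the same word. Compare every pair of samples and keep the
    # closest pair per item, so the best match among the drawn samples drives
    # the consistency penalty.
    d = torch.cdist(z_a.transpose(0, 1), z_b.transpose(0, 1))   # (batch, S, S)
    return d.flatten(1).min(dim=1).values.mean()

# Usage sketch (elbo_terms and lambda_corr are placeholders):
#   mu_a, lv_a = encoder(segment_a); mu_b, lv_b = encoder(segment_b)
#   loss = elbo_terms + lambda_corr * correspondence_term(
#       sample_posterior(mu_a, lv_a), sample_posterior(mu_b, lv_b))
```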
Related papers
- Enhancing Modern Supervised Word Sense Disambiguation Models by Semantic
Lexical Resources [11.257738983764499]
Supervised models for Word Sense Disambiguation (WSD) currently yield state-of-the-art results on the most popular benchmarks.
We enhance "modern" supervised WSD models by exploiting two popular semantic lexical resources (SLRs): WordNet and WordNet Domains.
We study the effect of different types of semantic features, investigating their interaction with local contexts encoded by means of mixtures of Word Embeddings or Recurrent Neural Networks.
arXiv Detail & Related papers (2024-02-20T13:47:51Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
With a reasonable prompt and their generative capability, LLMs can even correct tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- Return of the RNN: Residual Recurrent Networks for Invertible Sentence Embeddings [0.0]
This study presents a novel model for invertible sentence embeddings using a residual recurrent network trained on an unsupervised encoding task.
Rather than the probabilistic outputs common to neural machine translation models, our approach employs a regression-based output layer to reconstruct the input sequence's word vectors.
The model achieves high accuracy and fast training with the Adam optimizer, a notable result given that RNNs typically require memory units, such as LSTMs, or second-order optimization methods.
arXiv Detail & Related papers (2023-03-23T15:59:06Z)
- Language as a Latent Sequence: deep latent variable models for semi-supervised paraphrase generation [47.33223015862104]
We present a novel unsupervised model named variational sequence auto-encoding reconstruction (VSAR), which performs latent sequence inference given an observed text.
To leverage information from text pairs, we additionally introduce a novel supervised model we call dual directional learning (DDL), which is designed to integrate with our proposed VSAR model.
Our empirical evaluations suggest that the combined model yields competitive performance against the state-of-the-art supervised baselines on complete data.
arXiv Detail & Related papers (2023-01-05T19:35:30Z)
- Unsupervised Syntactically Controlled Paraphrase Generation with Abstract Meaning Representations [59.10748929158525]
Abstract Meaning Representations (AMR) can greatly improve the performance of unsupervised syntactically controlled paraphrase generation.
Our proposed model, the AMR-enhanced Paraphrase Generator (AMRPG), encodes the AMR graph and the constituency parse of the input sentence into two disentangled semantic and syntactic embeddings.
Experiments show that AMRPG generates more accurate syntactically controlled paraphrases, both quantitatively and qualitatively, compared to the existing unsupervised approaches.
arXiv Detail & Related papers (2022-11-02T04:58:38Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We establish new state-of-the-art results in both the few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour for the learned representations, as well as the consequences of fixing it by introducing a notion of self-consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation [37.054709598792165]
The model is a convolutional neural network that operates directly on the raw waveform.
It is optimized to identify spectral changes in the signal using the Noise-Contrastive Estimation principle.
At test time, a peak detection algorithm is applied over the model outputs to produce the final boundaries.
arXiv Detail & Related papers (2020-07-27T12:10:21Z)
- A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning [32.59760685342343]
Probabilistic Latent Variable Models provide an alternative to self-supervised learning approaches for linguistic representation learning from speech.
In this work, we propose ConvDMM, a Gaussian state-space model with non-linear emission and transition functions modelled by deep neural networks.
When trained on a large-scale speech dataset (LibriSpeech), ConvDMM produces features that significantly outperform those of multiple self-supervised feature extraction methods.
arXiv Detail & Related papers (2020-06-03T21:50:20Z)
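The ConvDMM entry above describes a Gaussian state-space model whose transition and emission functions are parameterised by deep neural networks. The sketch below shows generic deep-Markov-model building blocks in that spirit; it is not the authors' ConvDMM, and all names and dimensions (`GaussianTransition`, `GaussianEmission`, `latent_dim`, `feat_dim`) are illustrative placeholders.

```python
# Generic deep Markov model building blocks (illustrative only, not ConvDMM).
# A latent state z_t evolves via a learned Gaussian transition, and observed
# speech features x_t are generated by a learned Gaussian emission.
import torch
import torch.nn as nn

class GaussianTransition(nn.Module):
    # p(z_t | z_{t-1}) = N(mu(z_{t-1}), diag(sigma^2(z_{t-1})))
    def __init__(self, latent_dim=64, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, hidden_dim), nn.ReLU())
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)

    def forward(self, z_prev):
        h = self.net(z_prev)
        return self.mu(h), self.logvar(h)

class GaussianEmission(nn.Module):
    # p(x_t | z_t) = N(mu(z_t), diag(sigma^2(z_t)))
    def __init__(self, latent_dim=64, feat_dim=80, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, hidden_dim), nn.ReLU())
        self.mu = nn.Linear(hidden_dim, feat_dim)
        self.logvar = nn.Linear(hidden_dim, feat_dim)

    def forward(self, z_t):
        h = self.net(z_t)
        return self.mu(h), self.logvar(h)

def rollout(transition, emission, z0, steps):
    # Ancestral sampling through the state-space model for `steps` frames.
    z, frames = z0, []
    for _ in range(steps):
        mu_z, logvar_z = transition(z)
        z = mu_z + torch.randn_like(mu_z) * (0.5 * logvar_z).exp()
        mu_x, _ = emission(z)
        frames.append(mu_x)
    return torch.stack(frames, dim=1)   # (batch, steps, feat_dim)
```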