A Convolutional Deep Markov Model for Unsupervised Speech Representation
Learning
- URL: http://arxiv.org/abs/2006.02547v2
- Date: Tue, 8 Sep 2020 14:09:58 GMT
- Title: A Convolutional Deep Markov Model for Unsupervised Speech Representation
Learning
- Authors: Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian
Lancucki, Ricard Marxer, James Glass
- Abstract summary: Probabilistic Latent Variable Models provide an alternative to self-supervised learning approaches for linguistic representation learning from speech.
In this work, we propose ConvDMM, a Gaussian state-space model with non-linear emission and transition functions modelled by deep neural networks.
When trained on a large scale speech dataset (LibriSpeech), ConvDMM produces features that significantly outperform multiple self-supervised feature extracting methods.
- Score: 32.59760685342343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Probabilistic Latent Variable Models (LVMs) provide an alternative to
self-supervised learning approaches for linguistic representation learning from
speech. LVMs admit an intuitive probabilistic interpretation where the latent
structure shapes the information extracted from the signal. Even though LVMs
have recently seen a renewed interest due to the introduction of Variational
Autoencoders (VAEs), their use for speech representation learning remains
largely unexplored. In this work, we propose Convolutional Deep Markov Model
(ConvDMM), a Gaussian state-space model with non-linear emission and transition
functions modelled by deep neural networks. This unsupervised model is trained
using black box variational inference. A deep convolutional neural network is
used as an inference network for structured variational approximation. When
trained on a large scale speech dataset (LibriSpeech), ConvDMM produces
features that significantly outperform multiple self-supervised feature
extracting methods on linear phone classification and recognition on the Wall
Street Journal dataset. Furthermore, we found that ConvDMM complements
self-supervised methods like Wav2Vec and PASE, improving on the results
achieved with any of the methods alone. Lastly, we find that ConvDMM features
enable learning better phone recognizers than any other features in an extreme
low-resource regime with few labeled training examples.
Related papers
- Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters [65.15700861265432]
We present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models.
Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters.
To preserve the zero-shot recognition capability of vision-language models, we introduce a Distribution Discriminative Auto-Selector.
arXiv Detail & Related papers (2024-03-18T08:00:23Z) - LlaMaVAE: Guiding Large Language Model Generation via Continuous Latent
Sentence Spaces [1.529963465178546]
We present LlaMaVAE, which combines expressive encoder and decoder models (sentenceT5 and LlaMA) with a VAE architecture to provide better text generation control to large language models (LLMs)
Experimental results reveal that LlaMaVAE can outperform the previous state-of-the-art VAE language model, Optimus, across various tasks.
arXiv Detail & Related papers (2023-12-20T17:25:23Z) - Diffusion-Model-Assisted Supervised Learning of Generative Models for
Density Estimation [10.793646707711442]
We present a framework for training generative models for density estimation.
We use the score-based diffusion model to generate labeled data.
Once the labeled data are generated, we can train a simple fully connected neural network to learn the generative model in the supervised manner.
arXiv Detail & Related papers (2023-10-22T23:56:19Z) - Vector Quantized Wasserstein Auto-Encoder [57.29764749855623]
We study learning deep discrete representations from the generative viewpoint.
We endow discrete distributions over sequences of codewords and learn a deterministic decoder that transports the distribution over the sequences of codewords to the data distribution.
We develop further theories to connect it with the clustering viewpoint of WS distance, allowing us to have a better and more controllable clustering solution.
arXiv Detail & Related papers (2023-02-12T13:51:36Z) - Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z) - Distributionally Robust Recurrent Decoders with Random Network
Distillation [93.10261573696788]
We propose a method based on OOD detection with Random Network Distillation to allow an autoregressive language model to disregard OOD context during inference.
We apply our method to a GRU architecture, demonstrating improvements on multiple language modeling (LM) datasets.
arXiv Detail & Related papers (2021-10-25T19:26:29Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical representation of tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Fully differentiable model discovery [0.0]
We propose an approach by combining neural network based surrogates with Sparse Bayesian Learning.
Our work expands PINNs to various types of neural network architectures, and connects neural network-based surrogates to the rich field of Bayesian parameter inference.
arXiv Detail & Related papers (2021-06-09T08:11:23Z) - A Correspondence Variational Autoencoder for Unsupervised Acoustic Word
Embeddings [50.524054820564395]
We propose a new unsupervised model for mapping a variable-duration speech segment to a fixed-dimensional representation.
The resulting acoustic word embeddings can form the basis of search, discovery, and indexing systems for low- and zero-resource languages.
arXiv Detail & Related papers (2020-12-03T19:24:42Z) - Causality-aware counterfactual confounding adjustment for feature
representations learned by deep models [14.554818659491644]
Causal modeling has been recognized as a potential solution to many challenging problems in machine learning (ML)
We describe how a recently proposed counterfactual approach can still be used to deconfound the feature representations learned by deep neural network (DNN) models.
arXiv Detail & Related papers (2020-04-20T17:37:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.