Vector-Quantized Autoregressive Predictive Coding
- URL: http://arxiv.org/abs/2005.08392v1
- Date: Sun, 17 May 2020 23:06:09 GMT
- Title: Vector-Quantized Autoregressive Predictive Coding
- Authors: Yu-An Chung, Hao Tang, James Glass
- Abstract summary: We propose Vector-Quantized Autoregressive Predictive Coding (VQ-APC), a novel model that produces quantized representations.
By studying a sequence of increasingly limited models, we reveal the constituents of the learned representations.
We find that there exists a point where phonetic and speaker information are amplified to maximize a self-supervised objective.
- Score: 31.4011465698136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autoregressive Predictive Coding (APC), as a self-supervised objective, has
enjoyed success in learning representations from large amounts of unlabeled
data, and the learned representations are rich for many downstream tasks.
However, the connection between low self-supervised loss and strong performance
in downstream tasks remains unclear. In this work, we propose Vector-Quantized
Autoregressive Predictive Coding (VQ-APC), a novel model that produces
quantized representations, allowing us to explicitly control the amount of
information encoded in the representations. By studying a sequence of
increasingly limited models, we reveal the constituents of the learned
representations. In particular, we confirm the presence of information with
probing tasks, while showing the absence of information with mutual
information, uncovering the model's preference for preserving speech information
as its capacity becomes constrained. We find that there exists a point where
phonetic and speaker information are amplified to maximize a self-supervised
objective. As a byproduct, the learned codes for a particular model capacity
correspond well to English phones.
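To make the objective concrete, here is a minimal PyTorch sketch of a VQ-APC-style model. It is not the authors' implementation: the GRU encoder, Gumbel-softmax quantizer, and all hyperparameters (feature dimension, codebook size, prediction shift) are illustrative assumptions. The model predicts the acoustic frame `shift` steps ahead, but only through a discrete bottleneck, which is what allows explicit control over the information in the representation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VQAPC(nn.Module):
    """Sketch of VQ-APC: autoregressive future-frame prediction through a
    vector-quantized (discrete) bottleneck."""
    def __init__(self, feat_dim=80, hidden=512, codebook_size=128, shift=3):
        super().__init__()
        self.shift = shift                                   # predict x_{t+shift} from x_{1..t}
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.to_logits = nn.Linear(hidden, codebook_size)    # code scores per frame
        self.codebook = nn.Embedding(codebook_size, hidden)  # learnable code vectors
        self.proj = nn.Linear(hidden, feat_dim)              # map codes back to a frame

    def forward(self, x, tau=1.0):
        h, _ = self.rnn(x)                                   # (batch, T, hidden)
        logits = self.to_logits(h)
        # Differentiable discrete selection via straight-through Gumbel-softmax.
        onehot = F.gumbel_softmax(logits, tau=tau, hard=True)
        quantized = onehot @ self.codebook.weight            # (batch, T, hidden)
        pred = self.proj(quantized)
        # L1 loss between the prediction at step t and the frame `shift` steps ahead.
        loss = F.l1_loss(pred[:, :-self.shift], x[:, self.shift:])
        return loss, onehot.argmax(-1)                       # loss and the discrete codes

frames = torch.randn(4, 200, 80)   # e.g., a batch of log-Mel spectrogram frames
loss, codes = VQAPC()(frames)
```

In this sketch, shrinking `codebook_size` plays the role of the "increasingly limited models" in the abstract: with fewer codes, the model must choose which speech information to keep.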
Related papers
- MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion [14.907473847787541]
We propose Masked Conditional Diffusion (MacDiff) as a unified framework for human skeleton modeling.
For the first time, we leverage diffusion models as effective skeleton representation learners.
MacDiff achieves state-of-the-art performance on representation learning benchmarks while remaining competent at generative tasks.
arXiv Detail & Related papers (2024-09-16T17:06:10Z)
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension [131.14381425260706]
We introduce Self-Training on Image Comprehension (STIC), which emphasizes a self-training approach specifically for image comprehension.
First, the model self-constructs a preference dataset for image descriptions using unlabeled images.
To further self-improve reasoning on the extracted visual information, we let the model reuse a small portion of existing instruction-tuning data.
arXiv Detail & Related papers (2024-05-30T05:53:49Z)
- Ignorance is Bliss: Robust Control via Information Gating [60.17644038829572]
Informational parsimony provides a useful inductive bias for learning representations that achieve better generalization by being robust to noise and spurious correlations.
We propose information gating as a way to learn parsimonious representations that identify the minimal information required for a task.
arXiv Detail & Related papers (2023-03-10T18:31:50Z)
- Discrete Key-Value Bottleneck [95.61236311369821]
Deep neural networks perform well on classification tasks where data streams are i.i.d. and labeled data is abundant.
One powerful approach that has addressed this challenge involves pre-training of large encoders on volumes of readily available data, followed by task-specific tuning.
Given a new task, however, updating the weights of these encoders is challenging: a large number of weights must be fine-tuned, and as a result, the encoders forget information about previous tasks.
We propose a model architecture to address this issue, building upon a discrete bottleneck containing pairs of separate and learnable key-value codes (a minimal sketch appears after this list).
arXiv Detail & Related papers (2022-07-22T17:52:30Z)
- High Fidelity Visualization of What Your Self-Supervised Representation Knows About [22.982471878833362]
In this work, we showcase the use of a conditional diffusion-based generative model (RCDM) to visualize representations learned with self-supervised models.
We demonstrate how this model's generation quality is on par with state-of-the-art generative models while being faithful to the representation used as conditioning.
arXiv Detail & Related papers (2021-12-16T19:23:33Z)
- Reasoning-Modulated Representations [85.08205744191078]
We study a common setting where the task at hand is not purely opaque.
Our approach paves the way for a new class of data-efficient representation learning.
arXiv Detail & Related papers (2021-07-19T13:57:13Z)
- Conditional Contrastive Learning: Removing Undesirable Information in Self-Supervised Representations [108.29288034509305]
We develop conditional contrastive learning to remove undesirable information in self-supervised representations.
We demonstrate empirically that our methods can successfully learn self-supervised representations for downstream tasks.
arXiv Detail & Related papers (2021-06-05T10:51:26Z)
- Representation Learning for Sequence Data with Deep Autoencoding Predictive Components [96.42805872177067]
We propose a self-supervised representation learning method for sequence data, based on the intuition that useful representations of sequence data should exhibit a simple structure in the latent space.
We encourage this latent structure by maximizing an estimate of predictive information of latent feature sequences, which is the mutual information between past and future windows at each time step.
We demonstrate that our method recovers the latent space of noisy dynamical systems, extracts predictive features for forecasting tasks, and improves automatic speech recognition when used to pretrain the encoder on large amounts of unlabeled data.
arXiv Detail & Related papers (2020-10-07T03:34:01Z)
- High-Fidelity Audio Generation and Representation Learning with Guided Adversarial Autoencoder [2.6770746621108654]
We propose a new autoencoder-based model named the "Guided Adversarial Autoencoder" (GAAE).
Our proposed model can generate audio of superior quality that is indistinguishable from real audio samples.
arXiv Detail & Related papers (2020-06-01T12:19:32Z)
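Returning to the Discrete Key-Value Bottleneck entry above, the sketch below illustrates the general mechanism. It is an assumption-laden simplification, not the authors' design: each encoder feature is matched to its nearest learnable key, the paired learnable value is what flows downstream, and the straight-through gradient trick here stands in for whatever training scheme the paper actually uses for keys and values.

```python
import torch
import torch.nn as nn

class DiscreteKeyValueBottleneck(nn.Module):
    """Sketch of a discrete key-value bottleneck: nearest-key lookup
    routes each input to a learnable value code."""
    def __init__(self, dim=256, num_codes=512):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_codes, dim))    # matched against inputs
        self.values = nn.Parameter(torch.randn(num_codes, dim))  # passed downstream

    def forward(self, z):
        # z: (batch, dim) encoder features; pick the nearest key per example.
        dists = torch.cdist(z, self.keys)      # (batch, num_codes)
        idx = dists.argmin(dim=-1)             # discrete code assignment
        v = self.values[idx]                   # fetch the paired value
        # Straight-through estimator: the forward pass uses v, gradients reach z.
        return z + (v - z).detach(), idx

out, codes = DiscreteKeyValueBottleneck()(torch.randn(8, 256))
```

Because only the selected value codes receive gradients for a given task, tuning them leaves the rest of the codebook largely untouched, which matches the forgetting-mitigation intuition described in the entry.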
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.