Autoregressive Co-Training for Learning Discrete Speech Representations
- URL: http://arxiv.org/abs/2203.15840v1
- Date: Tue, 29 Mar 2022 18:17:18 GMT
- Title: Autoregressive Co-Training for Learning Discrete Speech Representations
- Authors: Sung-Lin Yeh, Hao Tang
- Abstract summary: We consider a generative model with discrete latent variables that learns a discrete representation for speech.
We find that the proposed approach learns a discrete representation that is highly correlated with phonetic units.
- Score: 19.400428010647573
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While several self-supervised approaches for learning discrete speech
representation have been proposed, it is unclear how these seemingly similar
approaches relate to each other. In this paper, we consider a generative model
with discrete latent variables that learns a discrete representation for
speech. The objective of learning the generative model is formulated as
information-theoretic co-training. Besides its wide generality, the objective
can be optimized with several approaches, subsuming HuBERT-like training and
vector quantization for learning discrete representations. Empirically, we find
that the proposed approach learns a discrete representation that is more highly
correlated with phonetic units than the representations learned with
HuBERT-like training and vector quantization.
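Below is a minimal, hypothetical PyTorch sketch of the kind of pipeline the abstract describes: continuous speech features pass through a vector-quantization bottleneck to obtain discrete codes, and an autoregressive predictor is trained to predict the next code. This is not the paper's actual objective or architecture; the module names, layer sizes, straight-through VQ trick, and next-code prediction loss are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VQBottleneck(nn.Module):
    """Assigns each frame feature to its nearest codebook entry (the discrete code)."""
    def __init__(self, num_codes=256, dim=128):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, h):                              # h: (batch, time, dim)
        # Distance from every frame to every codebook entry.
        dists = torch.cdist(h, self.codebook.weight.unsqueeze(0).expand(h.size(0), -1, -1))
        codes = dists.argmin(dim=-1)                   # discrete representation: (batch, time)
        q = self.codebook(codes)                       # quantized continuous vectors
        # VQ-VAE-style losses: pull the codebook toward encoder outputs and vice versa.
        vq_loss = F.mse_loss(q, h.detach()) + 0.25 * F.mse_loss(h, q.detach())
        # Straight-through estimator so gradients reach the encoder through the bottleneck.
        q = h + (q - h).detach()
        return q, codes, vq_loss

class DiscreteAutoregressiveSketch(nn.Module):
    """Encoder -> VQ codes -> autoregressive prediction of the next code."""
    def __init__(self, feat_dim=80, dim=128, num_codes=256):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, dim, batch_first=True)
        self.vq = VQBottleneck(num_codes, dim)
        self.predictor = nn.GRU(dim, dim, batch_first=True)
        self.classifier = nn.Linear(dim, num_codes)

    def forward(self, frames):                         # frames: (batch, time, feat_dim)
        h, _ = self.encoder(frames)
        q, codes, vq_loss = self.vq(h)
        ctx, _ = self.predictor(q[:, :-1])             # context from codes before time t
        logits = self.classifier(ctx)                  # predict the code at time t
        ar_loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                  codes[:, 1:].reshape(-1))
        return ar_loss + vq_loss, codes

# Usage with random features standing in for log-mel frames.
model = DiscreteAutoregressiveSketch()
loss, codes = model(torch.randn(4, 100, 80))
loss.backward()
```

The paper's information-theoretic co-training objective presumably differs in how the discrete codes and the predictor are optimized jointly; this sketch only illustrates the shared ingredients (a discrete bottleneck and an autoregressive model over the resulting codes).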
Related papers
- Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z) - Revealing Multimodal Contrastive Representation Learning through Latent Partial Causal Models [85.67870425656368]
We introduce a unified causal model specifically designed for multimodal data.
We show that multimodal contrastive representation learning excels at identifying latent coupled variables.
Experiments demonstrate the robustness of our findings, even when the assumptions are violated.
arXiv Detail & Related papers (2024-02-09T07:18:06Z) - A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels.
We present a generative latent variable model for self-supervised learning.
We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
arXiv Detail & Related papers (2024-02-02T13:31:17Z) - Posthoc Interpretation via Quantization [9.510336895838703]
We introduce a new approach, called Posthoc Interpretation via Quantization (PIQ), for interpreting decisions made by trained classifiers.
Our method utilizes vector quantization to transform the representations of a classifier into a discrete, class-specific latent space.
Our model formulation also enables learning concepts by incorporating the supervision of pretrained annotation models.
arXiv Detail & Related papers (2023-03-22T15:37:43Z) - Learning Semantic Textual Similarity via Topic-informed Discrete Latent Variables [17.57873577962635]
We develop a topic-informed discrete latent variable model for semantic textual similarity.
Our model learns a shared latent space for sentence-pair representation via vector quantization.
We show that our model is able to surpass several strong neural baselines in semantic textual similarity tasks.
arXiv Detail & Related papers (2022-11-07T15:09:58Z) - Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z) - Adaptive Discrete Communication Bottlenecks with Dynamic Vector Quantization [76.68866368409216]
We propose learning to dynamically select discretization tightness conditioned on inputs.
We show that dynamically varying tightness in communication bottlenecks can improve model performance on visual reasoning and reinforcement learning tasks.
arXiv Detail & Related papers (2022-02-02T23:54:26Z) - Cross-Modal Discrete Representation Learning [73.68393416984618]
We present a self-supervised learning framework that learns a representation that captures finer levels of granularity across different modalities.
Our framework relies on a discretized embedding space created via vector quantization that is shared across different modalities.
arXiv Detail & Related papers (2021-06-10T00:23:33Z) - Instance-Based Learning of Span Representations: A Case Study through Named Entity Recognition [48.06319154279427]
We present a method of instance-based learning that learns similarities between spans.
Our method makes it possible to build models that have high interpretability without sacrificing performance.
arXiv Detail & Related papers (2020-04-29T23:32:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.