Similarity Analysis of Self-Supervised Speech Representations
- URL: http://arxiv.org/abs/2010.11481v2
- Date: Tue, 2 Feb 2021 14:42:51 GMT
- Title: Similarity Analysis of Self-Supervised Speech Representations
- Authors: Yu-An Chung and Yonatan Belinkov and James Glass
- Abstract summary: We quantify the similarities between different self-supervised representations using existing similarity measures.
We also design probing tasks to study the correlation between the models' pre-training loss and the amount of specific speech information contained in their learned representations.
- Score: 44.33287205296597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised speech representation learning has recently been a prosperous
research topic. Many algorithms have been proposed for learning useful
representations from large-scale unlabeled data, and their applications to a
wide range of speech tasks have also been investigated. However, there has been
little research focusing on understanding the properties of existing
approaches. In this work, we aim to provide a comparative study of some of the
most representative self-supervised algorithms. Specifically, we quantify the
similarities between different self-supervised representations using existing
similarity measures. We also design probing tasks to study the correlation
between the models' pre-training loss and the amount of specific speech
information contained in their learned representations. In addition to showing
how various self-supervised models behave differently given the same input, our
study also finds that the training objective has a higher impact on
representation similarity than architectural choices such as building blocks
(RNN/Transformer/CNN) and directionality (uni/bidirectional). Our results also
suggest that there exists a strong correlation between pre-training loss and
downstream performance for some self-supervised algorithms.
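To make the similarity analysis concrete, here is a minimal sketch of linear CKA (centered kernel alignment), one family of measures commonly used to compare representations computed on the same inputs; the toy inputs and variable names below are illustrative, not the paper's actual models or setup.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment between two representation
    matrices X (n_frames x d1) and Y (n_frames x d2) computed on the
    same utterances. Returns a similarity score in [0, 1]."""
    X = X - X.mean(axis=0, keepdims=True)  # center each feature dimension
    Y = Y - Y.mean(axis=0, keepdims=True)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    return num / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

# Toy usage: frame-level features from two hypothetical models.
rng = np.random.default_rng(0)
reps_a = rng.standard_normal((500, 256))
q, _ = np.linalg.qr(rng.standard_normal((256, 256)))
reps_b = reps_a @ q                 # an orthogonal rotation of the same features
print(linear_cka(reps_a, reps_b))   # ~1.0: CKA is invariant to rotations
```

Measures of this kind are convenient here because they let one compare, say, an RNN-based and a Transformer-based model without aligning their feature dimensions.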
Related papers
- A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels.
We present a generative latent variable model for self-supervised learning.
We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
arXiv Detail & Related papers (2024-02-02T13:31:17Z)
- Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective [68.20531518525273]
We take a closer look at existing self-supervised speech methods from an information-theoretic perspective.
We use linear probes to estimate the mutual information between the target information and learned representations.
We explore the potential of evaluating representations in a self-supervised fashion, where we estimate the mutual information between different parts of the data without using any labels.
arXiv Detail & Related papers (2024-01-16T21:13:22Z)
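As a rough illustration of the linear-probe estimate described in the entry above, one can train a linear classifier on frozen features and read off a lower bound on mutual information as label entropy minus the probe's cross-entropy; the shapes, labels, and estimator details below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def probe_mi_lower_bound(Z, y):
    """Lower-bound I(y; Z) in nats as H(y) minus the cross-entropy of a
    linear probe trained on frozen features Z (use held-out data in
    practice to keep the bound honest)."""
    probe = LogisticRegression(max_iter=1000).fit(Z, y)
    ce = log_loss(y, probe.predict_proba(Z), labels=probe.classes_)
    freq = np.bincount(y) / len(y)
    h_y = -np.sum(freq * np.log(freq + 1e-12))  # empirical label entropy
    return h_y - ce

# Toy usage: frozen features Z correlated with, say, phone labels y.
rng = np.random.default_rng(0)
y = rng.integers(0, 10, size=2000)
Z = np.eye(10)[y] + 0.5 * rng.standard_normal((2000, 10))
print(probe_mi_lower_bound(Z, y))  # > 0 when Z encodes the labels
```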
- Comparative layer-wise analysis of self-supervised speech models [29.258085176788097]
We measure acoustic, phonetic, and word-level properties encoded in individual layers, using a lightweight analysis tool based on canonical correlation analysis (CCA).
We find that these properties evolve across layers differently depending on the model, and the variations relate to the choice of pre-training objective.
We discover that CCA trends provide reliable guidance to choose layers of interest for downstream tasks and that single-layer performance often matches or improves upon using all layers, suggesting implications for more efficient use of pre-trained models.
arXiv Detail & Related papers (2022-11-08T00:59:05Z)
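A hedged sketch of the layer-wise CCA analysis in the entry above: score each layer's frame-level features by their mean canonical correlation with a reference property, then pick the best-scoring layer. The paper's actual tool uses more careful variants and real reference features; everything below is a stand-in.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def mean_cca_similarity(X, Y, n_components=5):
    """Mean canonical correlation between layer features X and reference
    features Y computed on the same frames."""
    cca = CCA(n_components=n_components, max_iter=1000).fit(X, Y)
    Xc, Yc = cca.transform(X, Y)
    return float(np.mean([np.corrcoef(Xc[:, i], Yc[:, i])[0, 1]
                          for i in range(n_components)]))

# Toy usage: score each layer of a hypothetical 12-layer model against
# stand-in acoustic targets (e.g., mel filterbank frames).
rng = np.random.default_rng(0)
targets = rng.standard_normal((1000, 40))
layers = [rng.standard_normal((1000, 128)) for _ in range(12)]
scores = [mean_cca_similarity(h, targets) for h in layers]
print(int(np.argmax(scores)))  # layer most aligned with the targets
```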
- Weak Augmentation Guided Relational Self-Supervised Learning [80.0680103295137]
We introduce a novel relational self-supervised learning (ReSSL) framework that learns representations by modeling the relationship between different instances.
Our proposed method employs a sharpened distribution of pairwise similarities among different instances as the relation metric.
Experimental results show that our proposed ReSSL substantially outperforms the state-of-the-art methods across different network architectures.
arXiv Detail & Related papers (2022-03-16T16:14:19Z)
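The sharpened distribution of pairwise similarities in the ReSSL entry above can be sketched as follows: a low-temperature softmax over the weakly augmented view's similarities serves as the target for the strongly augmented view's distribution. The temperatures, memory-bank size, and shapes here are illustrative guesses, not ReSSL's published settings.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def relation_loss(z_weak, z_strong, bank, t_teacher=0.04, t_student=0.1):
    """Cross-entropy between the student's similarity distribution and a
    sharpened (low-temperature) teacher distribution over a memory bank."""
    norm = lambda v: v / np.linalg.norm(v, axis=-1, keepdims=True)
    z_weak, z_strong, bank = norm(z_weak), norm(z_strong), norm(bank)
    p_teacher = softmax(z_weak @ bank.T / t_teacher)   # sharper target
    p_student = softmax(z_strong @ bank.T / t_student)
    return float(-np.mean((p_teacher * np.log(p_student + 1e-12)).sum(axis=-1)))

# Toy usage: two augmented "views" of the same batch of embeddings.
rng = np.random.default_rng(0)
z = rng.standard_normal((32, 128))
bank = rng.standard_normal((4096, 128))
print(relation_loss(z + 0.01 * rng.standard_normal(z.shape),
                    z + 0.30 * rng.standard_normal(z.shape), bank))
```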
- Do Self-Supervised and Supervised Methods Learn Similar Visual Representations? [3.1594831736896025]
We compare a contrastive self-supervised algorithm (SimCLR) to supervision for simple image data in a common architecture.
We find that the methods learn similar intermediate representations through dissimilar means, and that the representations diverge rapidly in the final few layers.
Our work particularly highlights the importance of the learned intermediate representations, and raises important questions for auxiliary task design.
arXiv Detail & Related papers (2021-10-01T16:51:29Z)
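For context on the compared method: SimCLR's contrastive objective (NT-Xent) treats two augmented views of the same image as positives and all other batch samples as negatives. Below is a generic textbook rendering with an illustrative temperature, not the study's code.

```python
import numpy as np

def nt_xent(z1, z2, temperature=0.5):
    """SimCLR's NT-Xent loss: (z1[i], z2[i]) are two views of the same
    image (positives); everything else in the batch is a negative."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    n = len(z1)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # never contrast a sample with itself
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # i <-> i+n
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(logsumexp - sim[np.arange(2 * n), pos]))

# Toy usage: noisy embeddings standing in for two augmentations.
rng = np.random.default_rng(0)
h = rng.standard_normal((32, 128))
print(nt_xent(h + 0.1 * rng.standard_normal(h.shape),
              h + 0.1 * rng.standard_normal(h.shape)))
```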
- Few-shot Visual Reasoning with Meta-analogical Contrastive Learning [141.2562447971]
We propose to solve a few-shot (or low-shot) visual reasoning problem by resorting to analogical reasoning.
We extract structural relationships between elements in both domains and use analogical learning to encourage them to be as similar as possible.
We validate our method on the RAVEN dataset, on which it outperforms state-of-the-art methods, with larger gains when the training data is scarce.
arXiv Detail & Related papers (2020-07-23T14:00:34Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
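One way to read the gradient-supervision idea above is as an auxiliary penalty that aligns the model's input gradient with the direction from an example to its counterfactual partner. The sketch below uses a linear scorer so that gradient is available in closed form; real models would use autograd, and every name here is hypothetical.

```python
import numpy as np

def cosine(a, b, eps=1e-12):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps)

def gradient_supervision_loss(w, x, x_cf):
    """Auxiliary penalty: for a linear scorer f(x) = w @ x, the input
    gradient is w itself; encourage it to point along the minimal edit
    x_cf - x that flips the label."""
    return 1.0 - cosine(w, x_cf - x)

# Toy usage: a counterfactual pair differing in a single feature.
rng = np.random.default_rng(0)
x = rng.standard_normal(16)
x_cf = x.copy()
x_cf[3] += 1.0                    # the minimal change that alters the label
w_good = np.eye(16)[3]            # scorer sensitive to the changed feature
w_bad = rng.standard_normal(16)   # scorer attending elsewhere
print(gradient_supervision_loss(w_good, x, x_cf))  # ~0.0: aligned
print(gradient_supervision_loss(w_bad, x, x_cf))   # larger: misaligned
```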