Topological Data Analysis for Speech Processing
- URL: http://arxiv.org/abs/2211.17223v3
- Date: Tue, 6 Jun 2023 11:25:34 GMT
- Title: Topological Data Analysis for Speech Processing
- Authors: Eduard Tulchinskii, Kristian Kuznetsov, Laida Kushnareva, Daniil
Cherniavskii, Serguei Barannikov, Irina Piontkovskaya, Sergey Nikolenko,
Evgeny Burnaev
- Abstract summary: We show that a simple linear classifier built on top of such features outperforms a fine-tuned classification head.
We also show that topological features are able to reveal functional roles of speech Transformer heads.
- Score: 10.00176964652466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We apply topological data analysis (TDA) to speech classification problems
and to the introspection of a pretrained speech model, HuBERT. To this end, we
introduce a number of topological and algebraic features derived from
Transformer attention maps and embeddings. We show that a simple linear
classifier built on top of such features outperforms a fine-tuned
classification head. In particular, we achieve an improvement of about $9\%$
accuracy and $5\%$ EER on four common datasets; on CREMA-D, the proposed
feature set reaches a new state-of-the-art accuracy of $80.155\%$.
We also show that topological features are able to reveal functional roles of
speech Transformer heads; e.g., we find heads capable of distinguishing
between pairs of sample sources (natural/synthetic) or voices without any
downstream fine-tuning. Our results demonstrate that TDA is a promising new
approach for speech analysis, especially for tasks that require structural
prediction. Appendices, an introduction to TDA, and other additional materials
are available here - https://topohubert.github.io/speech-topology-webpages/
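As a rough illustration of the kind of feature the abstract refers to (the thresholding, the symmetrization, and the function names below are simplifications for illustration, not the authors' exact pipeline), one zero-dimensional topological summary of an attention map is the number of connected components of the attention graph as an edge-weight threshold varies:

```python
# Sketch: count connected components of an attention map thresholded at
# several levels -- a simple 0-dimensional "barcode-like" summary.
# Illustrative only; not the paper's exact feature set.

def components(n, edges):
    """Number of connected components of an undirected graph via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    count = n
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            count -= 1
    return count

def h0_profile(attn, thresholds):
    """For each threshold t, keep edges whose symmetrized attention weight
    is >= t and count connected components among the n tokens."""
    n = len(attn)
    profile = []
    for t in thresholds:
        edges = [(i, j) for i in range(n) for j in range(i + 1, n)
                 if max(attn[i][j], attn[j][i]) >= t]
        profile.append(components(n, edges))
    return profile

# Toy 4-token attention matrix (rows need not sum to 1 here).
attn = [
    [0.0, 0.9, 0.1, 0.0],
    [0.8, 0.0, 0.2, 0.0],
    [0.0, 0.1, 0.0, 0.7],
    [0.0, 0.0, 0.6, 0.0],
]
print(h0_profile(attn, [0.5, 0.15, 0.05]))  # components merge as the threshold drops
```

Sweeping the threshold from high to low traces how token clusters merge; in persistent-homology terms, this profile is a crude stand-in for the 0-dimensional barcode of the attention graph's weight filtration.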
Related papers
- Topological data analysis of human vowels: Persistent homologies across representation spaces [0.0]
Topological Data Analysis (TDA) has been successfully used for various tasks in signal/image processing.
This paper attempts to assess the quality of the discriminant information of the topological signatures extracted from three different representation spaces.
We show that a topologically-augmented random forest improves the Out-of-Bag (OOB) error over a forest based solely on Mel-Frequency Cepstral Coefficients (MFCC) for the last two problems.
arXiv Detail & Related papers (2023-10-10T10:37:54Z)
- Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z)
- Can BERT eat RuCoLA? Topological Data Analysis to Explain [3.9775243265158076]
This paper investigates how Transformer language models (LMs) fine-tuned for acceptability classification capture linguistic features.
We construct directed attention graphs from attention matrices, derive topological features from them, and feed them to linear classifiers.
We introduce two novel features, chordality and the matching number, and show that TDA-based classifiers outperform fine-tuning baselines.
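As a toy sketch of one of these quantities (illustrative only; the paper computes it on attention graphs, and this brute-force routine is a simplification added here), the matching number of a small undirected graph can be found by recursion over its edges:

```python
# Sketch: the matching number (size of a maximum matching) of a small
# undirected graph, by exponential-time recursion -- fine for the small
# per-head attention graphs in this setting, not for large graphs.

def matching_number(edges):
    """Maximum number of pairwise non-adjacent edges."""
    if not edges:
        return 0
    (a, b), rest = edges[0], edges[1:]
    # Case 1: skip this edge entirely.
    best = matching_number(rest)
    # Case 2: take this edge, then drop every edge touching a or b.
    remaining = [(u, v) for (u, v) in rest if a not in (u, v) and b not in (u, v)]
    best = max(best, 1 + matching_number(remaining))
    return best

# Path 0-1-2-3: maximum matching picks edges (0,1) and (2,3).
print(matching_number([(0, 1), (1, 2), (2, 3)]))
```

For production use on larger graphs, a polynomial-time algorithm (e.g., blossom matching, as implemented in standard graph libraries) would replace this recursion.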
arXiv Detail & Related papers (2023-04-04T10:11:06Z)
- ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech [96.0009517132463]
We introduce a word-level prosody encoder, which quantizes the low-frequency band of the speech and compresses prosody attributes into a latent prosody vector (LPV).
We then introduce an LPV predictor, which predicts the LPV given the word sequence, and fine-tune it on a high-quality TTS dataset.
Experimental results show that ProsoSpeech can generate speech with richer prosody compared with baseline methods.
arXiv Detail & Related papers (2022-02-16T01:42:32Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
- Artificial Text Detection via Examining the Topology of Attention Maps [58.46367297712477]
We propose three novel types of interpretable topological features for this task based on Topological Data Analysis (TDA).
We empirically show that features derived from the BERT model outperform count- and neural-based baselines by up to 10% on three common datasets.
The probing analysis of the features reveals their sensitivity to surface and syntactic properties.
arXiv Detail & Related papers (2021-09-10T12:13:45Z)
- What do End-to-End Speech Models Learn about Speaker, Language and Channel Information? A Layer-wise and Neuron-level Analysis [16.850888973106706]
We conduct a post-hoc functional interpretability analysis of pretrained speech models using the probing framework.
We analyze utterance-level representations of speech models trained for various tasks such as speaker recognition and dialect identification.
Our results reveal several novel findings, including: i) channel and gender information are distributed across the network, ii) the information is redundantly available in neurons with respect to a task, and iii) complex properties such as dialectal information are encoded only in the task-oriented pretrained network.
arXiv Detail & Related papers (2021-07-01T13:32:55Z)
- Persistence Homology of TEDtalk: Do Sentence Embeddings Have a Topological Shape? [3.1675545188012078]
We investigate the possibility of applying TDA to improve the classification accuracy of public speaking rating.
We calculated persistence image vectors for the sentence embeddings of TEDtalk data and fed these vectors as additional inputs to our machine learning models.
From our results, we could not conclude that the topological shapes of the sentence embeddings can help us train a better model for public speaking rating.
arXiv Detail & Related papers (2021-03-25T20:52:17Z)
- TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech [63.03318307254081]
TERA stands for Transformer Encoder Representations from Alteration.
We use alteration along three axes to pre-train Transformers on a large amount of unlabeled speech.
TERA can be used for speech representation extraction or fine-tuning with downstream models.
arXiv Detail & Related papers (2020-07-12T16:19:00Z)
- Building powerful and equivariant graph neural networks with structural message-passing [74.93169425144755]
We propose a powerful and equivariant message-passing framework based on two ideas.
First, we propagate a one-hot encoding of the nodes, in addition to the features, in order to learn a local context matrix around each node.
Second, we propose methods for the parametrization of the message and update functions that ensure permutation equivariance.
arXiv Detail & Related papers (2020-06-26T17:15:16Z)
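A minimal sketch of the first idea above, propagating a one-hot node encoding by message passing (the function name and the unnormalized sum update are illustrative simplifications; the paper's learned, permutation-equivariant message and update functions are omitted):

```python
# Sketch: one round of message passing that propagates a one-hot node
# encoding, so each node accumulates a "local context" over its neighbors.
# Illustrative only -- no learned parameters, no normalization.

def propagate_one_hot(n, edges, rounds=1):
    """Each node starts with its own one-hot vector; in each round, every
    node adds its neighbors' current vectors to its own."""
    state = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    for _ in range(rounds):
        new = [row[:] for row in state]  # keep own state (self-loop)
        for i in range(n):
            for j in adj[i]:
                for k in range(n):
                    new[i][k] += state[j][k]
        state = new
    return state

# Triangle graph: after one round, each node "sees" both neighbors.
print(propagate_one_hot(3, [(0, 1), (1, 2), (0, 2)]))
```

After a few rounds, each node's vector records which other nodes are reachable and with what multiplicity, giving the "local context matrix" intuition from the abstract.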
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.