Topological Data Analysis for Speech Processing
- URL: http://arxiv.org/abs/2211.17223v3
- Date: Tue, 6 Jun 2023 11:25:34 GMT
- Title: Topological Data Analysis for Speech Processing
- Authors: Eduard Tulchinskii, Kristian Kuznetsov, Laida Kushnareva, Daniil
Cherniavskii, Serguei Barannikov, Irina Piontkovskaya, Sergey Nikolenko,
Evgeny Burnaev
- Abstract summary: We show that a simple linear classifier built on top of such features outperforms a fine-tuned classification head.
We also show that topological features are able to reveal functional roles of speech Transformer heads.
- Score: 10.00176964652466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We apply topological data analysis (TDA) to speech classification problems
and to the introspection of a pretrained speech model, HuBERT. To this end, we
introduce a number of topological and algebraic features derived from
Transformer attention maps and embeddings. We show that a simple linear
classifier built on top of such features outperforms a fine-tuned
classification head. In particular, we achieve an improvement of about $9\%$
accuracy and $5\%$ EER on four common datasets; on CREMA-D, the proposed
feature set reaches a new state-of-the-art accuracy of $80.155\%$.
We also show that topological features are able to reveal functional roles of
speech Transformer heads; e.g., we find heads capable of distinguishing
between pairs of sample sources (natural/synthetic) or voices without any
downstream fine-tuning. Our results demonstrate that TDA is a promising new
approach for speech analysis, especially for tasks that require structural
prediction. Appendices, an introduction to TDA, and other additional materials
are available here - https://topohubert.github.io/speech-topology-webpages/
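As a rough illustration of the kind of feature the abstract refers to (the thresholding, the symmetrization, and the function names below are simplifications for illustration, not the authors' exact pipeline), one zero-dimensional topological summary of an attention map is the number of connected components of the attention graph as an edge-weight threshold varies:

```python
# Sketch: count connected components of an attention map thresholded at
# several levels -- a simple 0-dimensional "barcode-like" summary.
# Illustrative only; not the paper's exact feature set.

def components(n, edges):
    """Number of connected components of an undirected graph via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    count = n
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            count -= 1
    return count

def h0_profile(attn, thresholds):
    """For each threshold t, keep edges whose symmetrized attention weight
    is >= t and count connected components among the n tokens."""
    n = len(attn)
    profile = []
    for t in thresholds:
        edges = [(i, j) for i in range(n) for j in range(i + 1, n)
                 if max(attn[i][j], attn[j][i]) >= t]
        profile.append(components(n, edges))
    return profile

# Toy 4-token attention matrix (rows need not sum to 1 here).
attn = [
    [0.0, 0.9, 0.1, 0.0],
    [0.8, 0.0, 0.2, 0.0],
    [0.0, 0.1, 0.0, 0.7],
    [0.0, 0.0, 0.6, 0.0],
]
print(h0_profile(attn, [0.5, 0.15, 0.05]))  # components merge as the threshold drops
```

Sweeping the threshold from high to low traces how token clusters merge; in persistent-homology terms, this profile is a crude stand-in for the 0-dimensional barcode of the attention graph's weight filtration.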
Related papers
- Topological data analysis of human vowels: Persistent homologies across representation spaces [0.0]
Topological Data Analysis (TDA) has been successfully used for various tasks in signal/image processing.
This paper attempts to assess the quality of the discriminant information of the topological signatures extracted from three different representation spaces.
We show that a topologically-augmented random forest improves the Out-of-Bag (OOB) error over a forest based solely on Mel-Frequency Cepstral Coefficients (MFCC) for the last two problems.
arXiv Detail & Related papers (2023-10-10T10:37:54Z)
- Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z)
- Can BERT eat RuCoLA? Topological Data Analysis to Explain [3.9775243265158076]
This paper investigates how Transformer language models (LMs) fine-tuned for acceptability classification capture linguistic features.
We construct directed attention graphs from attention matrices, derive topological features from them, and feed them to linear classifiers.
We introduce two novel features, chordality and the matching number, and show that TDA-based classifiers outperform fine-tuning baselines.
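As a toy sketch of one of these quantities (illustrative only; the paper computes it on attention graphs, and this brute-force routine is a simplification added here), the matching number of a small undirected graph can be found by recursion over its edges:

```python
# Sketch: the matching number (size of a maximum matching) of a small
# undirected graph, by exponential-time recursion -- fine for the small
# per-head attention graphs in this setting, not for large graphs.

def matching_number(edges):
    """Maximum number of pairwise non-adjacent edges."""
    if not edges:
        return 0
    (a, b), rest = edges[0], edges[1:]
    # Case 1: skip this edge entirely.
    best = matching_number(rest)
    # Case 2: take this edge, then drop every edge touching a or b.
    remaining = [(u, v) for (u, v) in rest if a not in (u, v) and b not in (u, v)]
    best = max(best, 1 + matching_number(remaining))
    return best

# Path 0-1-2-3: maximum matching picks edges (0,1) and (2,3).
print(matching_number([(0, 1), (1, 2), (2, 3)]))
```

For production use on larger graphs, a polynomial-time algorithm (e.g., blossom matching, as implemented in standard graph libraries) would replace this recursion.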
arXiv Detail & Related papers (2023-04-04T10:11:06Z)
- ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech [96.0009517132463]
We introduce a word-level prosody encoder, which quantizes the low-frequency band of the speech and compresses prosody attributes into a latent prosody vector (LPV).
We then introduce an LPV predictor, which predicts the LPV given the word sequence, and fine-tune it on a high-quality TTS dataset.
Experimental results show that ProsoSpeech can generate speech with richer prosody compared with baseline methods.
arXiv Detail & Related papers (2022-02-16T01:42:32Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
- Artificial Text Detection via Examining the Topology of Attention Maps [58.46367297712477]
We propose three novel types of interpretable topological features for this task based on Topological Data Analysis (TDA).
We empirically show that features derived from the BERT model outperform count- and neural-based baselines by up to 10% on three common datasets.
The probing analysis of the features reveals their sensitivity to surface and syntactic properties.
arXiv Detail & Related papers (2021-09-10T12:13:45Z)
- What do End-to-End Speech Models Learn about Speaker, Language and Channel Information? A Layer-wise and Neuron-level Analysis [16.850888973106706]
We conduct a post-hoc functional interpretability analysis of pretrained speech models using the probing framework.
We analyze utterance-level representations of speech models trained for various tasks such as speaker recognition and dialect identification.
Our results reveal several novel findings, including: i) channel and gender information are distributed across the network, ii) the information is redundantly available in neurons with respect to a task, and iii) complex properties such as dialectal information are encoded only in the task-oriented pretrained network.
arXiv Detail & Related papers (2021-07-01T13:32:55Z)
- Persistence Homology of TEDtalk: Do Sentence Embeddings Have a Topological Shape? [3.1675545188012078]
We investigate the possibility of applying TDA to improve the classification accuracy of public speaking rating.
We calculated persistence image vectors for the sentence embeddings of TEDtalk data and fed these vectors as additional inputs to our machine learning models.
From our results, we could not conclude that the topological shapes of the sentence embeddings can help us train a better model for public speaking rating.
arXiv Detail & Related papers (2021-03-25T20:52:17Z)
- TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech [63.03318307254081]
TERA stands for Transformer Encoder Representations from Alteration.
We use alteration along three axes to pre-train Transformers on a large amount of unlabeled speech.
TERA can be used for speech representation extraction or fine-tuning with downstream models.
arXiv Detail & Related papers (2020-07-12T16:19:00Z)
- Building powerful and equivariant graph neural networks with structural message-passing [74.93169425144755]
We propose a powerful and equivariant message-passing framework based on two ideas.
First, we propagate a one-hot encoding of the nodes, in addition to the features, in order to learn a local context matrix around each node.
Second, we propose methods for the parametrization of the message and update functions that ensure permutation equivariance.
arXiv Detail & Related papers (2020-06-26T17:15:16Z)
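A minimal sketch of the first idea above, propagating a one-hot node encoding by message passing (the function name and the unnormalized sum update are illustrative simplifications; the paper's learned, permutation-equivariant message and update functions are omitted):

```python
# Sketch: one round of message passing that propagates a one-hot node
# encoding, so each node accumulates a "local context" over its neighbors.
# Illustrative only -- no learned parameters, no normalization.

def propagate_one_hot(n, edges, rounds=1):
    """Each node starts with its own one-hot vector; in each round, every
    node adds its neighbors' current vectors to its own."""
    state = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    for _ in range(rounds):
        new = [row[:] for row in state]  # keep own state (self-loop)
        for i in range(n):
            for j in adj[i]:
                for k in range(n):
                    new[i][k] += state[j][k]
        state = new
    return state

# Triangle graph: after one round, each node "sees" both neighbors.
print(propagate_one_hot(3, [(0, 1), (1, 2), (0, 2)]))
```

After a few rounds, each node's vector records which other nodes are reachable and with what multiplicity, giving the "local context matrix" intuition from the abstract.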
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.