Topology combined machine learning for consonant recognition
- URL: http://arxiv.org/abs/2311.15210v1
- Date: Sun, 26 Nov 2023 06:53:56 GMT
- Title: Topology combined machine learning for consonant recognition
- Authors: Pingyao Feng, Siheng Yi, Qingrui Qu, Zhiwang Yu, Yifei Zhu
- Abstract summary: TopCap is capable of capturing features rarely detected in datasets with low intrinsic dimensionality.
In classifying voiced and voiceless consonants, TopCap achieves an accuracy exceeding 96%.
TopCap is geared towards designing topological convolutional layers for deep learning of speech and audio signals.
- Score: 8.188982461393278
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In artificial-intelligence-aided signal processing, existing deep learning
models often exhibit a black-box structure, and their validity and
comprehensibility remain elusive. The integration of topological methods,
despite its relatively nascent application, serves a dual purpose of making
models more interpretable as well as extracting structural information from
time-dependent data for smarter learning. Here, we provide a transparent and
broadly applicable methodology, TopCap, to capture the most salient topological
features inherent in time series for machine learning. Rooted in
high-dimensional ambient spaces, TopCap is capable of capturing features rarely
detected in datasets with low intrinsic dimensionality. Applying time-delay
embedding and persistent homology, we obtain descriptors which encapsulate
information such as the vibration of a time series, in terms of its variability
of frequency, amplitude, and average line, demonstrated with simulated data.
This information is then vectorised and fed into multiple machine learning
algorithms such as k-nearest neighbours and support vector machine. Notably, in
classifying voiced and voiceless consonants, TopCap achieves an accuracy
exceeding 96% and is geared towards designing topological convolutional layers
for deep learning of speech and audio signals.
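The pipeline described in the abstract (time-delay embedding, persistent homology, vectorisation, then k-nearest neighbours or a support vector machine) can be sketched in a few lines. The sketch below is illustrative only: the libraries (numpy, ripser, scikit-learn), the choice of keeping the single most persistent H1 feature, and the simulated voiced/voiceless signals are assumptions for demonstration, not the authors' TopCap implementation.

```python
# Hedged sketch of a TopCap-style pipeline: time-delay embedding, persistent
# homology, and a simple vectorisation fed to k-NN and SVM classifiers.
import numpy as np
from ripser import ripser                      # Vietoris-Rips persistent homology
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score


def delay_embed(x, dim=3, tau=5):
    """Map a 1-D signal to a point cloud in R^dim via time-delay embedding."""
    n = len(x) - (dim - 1) * tau
    return np.stack([x[i * tau : i * tau + n] for i in range(dim)], axis=1)


def topcap_features(x, dim=3, tau=5):
    """Summarise the most salient H1 (loop) feature as (birth, death, persistence)."""
    cloud = delay_embed(np.asarray(x, dtype=float), dim, tau)
    dgm_h1 = ripser(cloud, maxdim=1)["dgms"][1]           # H1 persistence diagram
    if len(dgm_h1) == 0:                                   # no loops detected
        return np.zeros(3)
    lifetimes = dgm_h1[:, 1] - dgm_h1[:, 0]
    b, d = dgm_h1[np.argmax(lifetimes)]                    # most persistent point
    return np.array([b, d, d - b])


# Toy usage: two classes of simulated signals differing in periodicity, standing
# in for voiced (strongly periodic) vs. voiceless (noise-like) consonant segments.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 400)
voiced = [np.sin(2 * np.pi * 20 * t) + 0.1 * rng.normal(size=t.size) for _ in range(40)]
voiceless = [0.1 * rng.normal(size=t.size) for _ in range(40)]

X = np.array([topcap_features(x) for x in voiced + voiceless])
y = np.array([1] * len(voiced) + [0] * len(voiceless))

for clf in (KNeighborsClassifier(n_neighbors=5), SVC(kernel="rbf")):
    print(type(clf).__name__, cross_val_score(clf, X, y, cv=5).mean())
```

In practice the embedding dimension and delay would be tuned to the sampling rate of the consonant segments, and richer vectorisations of the persistence diagram (e.g. persistence landscapes or images) could replace the single most persistent point.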
Related papers
- Probing the Information Encoded in Neural-based Acoustic Models of Automatic Speech Recognition Systems [7.207019635697126]
This article aims to determine what information is encoded in an automatic speech recognition acoustic model (AM) and where it is located.
Experiments are performed on speaker verification, acoustic environment classification, gender classification, tempo-distortion detection systems and speech sentiment/emotion identification.
Analysis showed that neural-based AMs hold heterogeneous information that seems surprisingly uncorrelated with phoneme recognition.
arXiv Detail & Related papers (2024-02-29T18:43:53Z)
- Homological Convolutional Neural Networks [4.615338063719135]
We propose a novel deep learning architecture that exploits the structural organization of the data through topologically constrained network representations.
We test our model on 18 benchmark datasets against 5 classic machine learning and 3 deep learning models.
arXiv Detail & Related papers (2023-08-26T08:48:51Z)
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
- Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding [8.45908939323268]
We propose a self-supervised framework for learning generalizable representations for non-stationary time series.
Our motivation stems from the medical field, where the ability to model the dynamic nature of time series data is especially valuable.
arXiv Detail & Related papers (2021-06-01T19:53:24Z)
- Time Series Classification via Topological Data Analysis [0.0]
We perform binary and ternary classification tasks on two public datasets.
We accomplish our goal by using persistent homology to engineer stable topological features.
arXiv Detail & Related papers (2021-02-03T09:09:05Z)
- Deep Cellular Recurrent Network for Efficient Analysis of Time-Series Data with Spatial Information [52.635997570873194]
This work proposes a novel deep cellular recurrent neural network (DCRNN) architecture to process complex multi-dimensional time series data with spatial information.
The proposed architecture achieves state-of-the-art performance while using substantially fewer trainable parameters than comparable methods in the literature.
arXiv Detail & Related papers (2021-01-12T20:08:18Z)
- A Novel Anomaly Detection Algorithm for Hybrid Production Systems based on Deep Learning and Timed Automata [73.38551379469533]
DAD:DeepAnomalyDetection is a new approach for automatic model learning and anomaly detection in hybrid production systems.
It combines deep learning and timed automata to create a behavioral model from observations.
The algorithm has been applied to a few data sets, including two from real systems, and has shown promising results.
arXiv Detail & Related papers (2020-10-29T08:27:43Z)
- Network Classifiers Based on Social Learning [71.86764107527812]
We propose a new way of combining independently trained classifiers over space and time.
The proposed architecture is able to improve prediction performance over time with unlabeled data.
We show that this strategy results in consistent learning with high probability, and it yields a robust structure against poorly trained classifiers.
arXiv Detail & Related papers (2020-10-23T11:18:20Z)
- Representation Learning for Sequence Data with Deep Autoencoding Predictive Components [96.42805872177067]
We propose a self-supervised representation learning method for sequence data, based on the intuition that useful representations of sequence data should exhibit a simple structure in the latent space.
We encourage this latent structure by maximizing an estimate of predictive information of latent feature sequences, which is the mutual information between past and future windows at each time step.
We demonstrate that our method recovers the latent space of noisy dynamical systems, extracts predictive features for forecasting tasks, and improves automatic speech recognition when used to pretrain the encoder on large amounts of unlabeled data.
arXiv Detail & Related papers (2020-10-07T03:34:01Z)
- Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence Modeling [61.351967629600594]
This paper proposes an any-to-many location-relative, sequence-to-sequence (seq2seq), non-parallel voice conversion approach.
In this approach, we combine a bottle-neck feature extractor (BNE) with a seq2seq synthesis module.
Objective and subjective evaluations show that the proposed any-to-many approach has superior voice conversion performance in terms of both naturalness and speaker similarity.
arXiv Detail & Related papers (2020-09-06T13:01:06Z)
- Compact representation of temporal processes in echosounder time series via matrix decomposition [0.7614628596146599]
We develop a methodology that builds a compact representation of long-term echosounder time series using intrinsic features in the data.
This work forms the basis for constructing robust time series analytics for large-scale, acoustics-based biological observation in the ocean.
arXiv Detail & Related papers (2020-07-06T17:33:42Z)