Insights on Neural Representations for End-to-End Speech Recognition
- URL: http://arxiv.org/abs/2205.09456v1
- Date: Thu, 19 May 2022 10:19:32 GMT
- Title: Insights on Neural Representations for End-to-End Speech Recognition
- Authors: Anna Ollerenshaw, Md Asif Jalal, Thomas Hain
- Abstract summary: End-to-end automatic speech recognition (ASR) models aim to learn a generalised speech representation.
Correlation analysis techniques for investigating network similarities have not previously been applied to end-to-end ASR models.
This paper analyses and explores the internal dynamics between layers during training with CNN, LSTM and Transformer based approaches.
- Score: 28.833851817220616
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: End-to-end automatic speech recognition (ASR) models aim to learn a
generalised speech representation. However, there are limited tools available
to understand the internal functions and the effect of hierarchical
dependencies within the model architecture. It is crucial to understand the
correlations between the layer-wise representations, to derive insights on the
relationship between neural representations and performance.
Correlation analysis techniques for investigating network similarities have not previously
been applied to end-to-end ASR models. This paper analyses and explores the internal
dynamics between layers during training with CNN, LSTM and Transformer based approaches,
using canonical correlation analysis (CCA) and centered kernel alignment (CKA) in the
experiments. It was found that neural representations within CNN layers exhibit
hierarchical correlation dependencies as layer depth increases, although this is largely
limited to cases where the neural representations correlate more closely. This behaviour
is not observed in the LSTM architecture; instead, a bottom-up pattern emerges across the
training process, while Transformer encoder layers exhibit irregular correlation
coefficients as network depth increases. Altogether, these results
provide new insights into the role that neural architectures have upon speech
recognition performance. More specifically, these techniques can be used as
indicators to build better performing speech recognition models.
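As a rough illustration of the layer-wise comparisons used here, the sketch below computes linear CKA and a mean CCA similarity between two activation matrices. It is a minimal sketch under assumed shapes and synthetic activations (the hypothetical `layer3` and `layer4` arrays), not the authors' implementation.

```python
# Minimal sketch (not the authors' code): comparing two layers' activations
# with linear CKA and mean canonical correlation (CCA), as named in the abstract.
import numpy as np

def center(X):
    # Subtract the per-feature mean so both similarity measures are centred.
    return X - X.mean(axis=0, keepdims=True)

def linear_cka(X, Y):
    # Linear centered kernel alignment between activation matrices
    # X: (n_frames, d_x) and Y: (n_frames, d_y).
    X, Y = center(X), center(Y)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

def mean_cca(X, Y):
    # Mean canonical correlation: singular values of Qx^T Qy, where Qx, Qy are
    # orthonormal bases for the centred activation matrices.
    X, Y = center(X), center(Y)
    qx, _ = np.linalg.qr(X)
    qy, _ = np.linalg.qr(Y)
    return np.linalg.svd(qx.T @ qy, compute_uv=False).mean()

# Illustrative data: two hypothetical encoder layers over 1000 frames.
rng = np.random.default_rng(0)
layer3 = rng.normal(size=(1000, 256))
layer4 = 0.7 * layer3 @ rng.normal(size=(256, 256)) + 0.3 * rng.normal(size=(1000, 256))
print("linear CKA:", linear_cka(layer3, layer4))
print("mean CCA  :", mean_cca(layer3, layer4))
```

In practice, scores like these would be computed between every pair of layers, and across training checkpoints, to trace the hierarchical and bottom-up correlation patterns described in the abstract.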
Related papers
- Steinmetz Neural Networks for Complex-Valued Data [23.80312814400945]
We introduce a new approach to processing complex-valued data using DNNs consisting of parallel real-valued networks with coupled outputs.
Our proposed class of architectures, referred to as Steinmetz Neural Networks, leverage multi-view learning to construct more interpretable representations within the latent space.
Our numerical experiments depict the improved performance and robustness to additive noise afforded by these networks on benchmark datasets and synthetic examples.
arXiv Detail & Related papers (2024-09-16T08:26:06Z) - Enhancing Cognitive Workload Classification Using Integrated LSTM Layers and CNNs for fNIRS Data Analysis [13.74551296919155]
This paper explores the impact of Long Short-Term Memory layers on the effectiveness of Convolutional Neural Networks (CNNs) within deep learning models.
By integrating LSTM layers, the model can capture temporal dependencies in the fNIRS data, allowing for a more comprehensive understanding of cognitive states.
arXiv Detail & Related papers (2024-07-22T11:28:34Z) - Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks [59.38765771221084]
We present a physiologically inspired speech recognition architecture compatible and scalable with deep learning frameworks.
We show end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network.
Our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance.
arXiv Detail & Related papers (2024-04-22T09:40:07Z) - Probing the Information Encoded in Neural-based Acoustic Models of Automatic Speech Recognition Systems [7.207019635697126]
This article aims to determine what information is encoded, and where it is located, within an automatic speech recognition acoustic model (AM).
Experiments are performed on speaker verification, acoustic environment classification, gender classification, tempo-distortion detection systems and speech sentiment/emotion identification.
Analysis showed that neural-based AMs hold heterogeneous information that seems surprisingly uncorrelated with phoneme recognition.
arXiv Detail & Related papers (2024-02-29T18:43:53Z) - Reparameterization through Spatial Gradient Scaling [69.27487006953852]
Reparameterization aims to improve the generalization of deep neural networks by transforming convolutional layers into equivalent multi-branched structures during training.
We present a novel spatial gradient scaling method to redistribute learning focus among weights in convolutional networks.
arXiv Detail & Related papers (2023-03-05T17:57:33Z) - Experimental Observations of the Topology of Convolutional Neural Network Activations [2.4235626091331737]
Topological data analysis provides compact, noise-robust representations of complex structures.
Deep neural networks (DNNs) learn millions of parameters associated with a series of transformations defined by the model architecture.
In this paper, we apply cutting-edge techniques from TDA with the goal of gaining insight into the interpretability of convolutional neural networks used for image classification.
arXiv Detail & Related papers (2022-12-01T02:05:44Z) - Probing Statistical Representations For End-To-End ASR [28.833851817220616]
This paper investigates cross-domain language model dependencies within transformer architectures using SVCCA.
It was found that specific neural representations within the transformer layers exhibit correlated behaviour which impacts recognition performance.
arXiv Detail & Related papers (2022-11-03T17:08:14Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z) - PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z) - A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
arXiv Detail & Related papers (2020-05-14T09:02:33Z)