Untangling in Invariant Speech Recognition
- URL: http://arxiv.org/abs/2003.01787v1
- Date: Tue, 3 Mar 2020 20:48:43 GMT
- Title: Untangling in Invariant Speech Recognition
- Authors: Cory Stephenson, Jenelle Feather, Suchismita Padhy, Oguz Elibol,
Hanlin Tang, Josh McDermott, SueYeon Chung
- Abstract summary: We study how information is untangled within neural networks trained to recognize speech.
We observe that speaker-specific nuisance variations are discarded by the network's hierarchy, whereas task-relevant properties are untangled in later layers.
We find that the deep representations carry out significant temporal untangling by efficiently extracting task-relevant features at each time step of the computation.
- Score: 17.996356271398295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Encouraged by the success of deep neural networks on a variety of visual
tasks, much theoretical and experimental work has been aimed at understanding
and interpreting how vision networks operate. Meanwhile, deep neural networks
have also achieved impressive performance in audio processing applications,
both as sub-components of larger systems and as complete end-to-end systems by
themselves. Despite their empirical successes, comparatively little is
understood about how these audio models accomplish these tasks. In this work,
we employ a recently developed statistical mechanical theory that connects
geometric properties of network representations and the separability of classes
to probe how information is untangled within neural networks trained to
recognize speech. We observe that speaker-specific nuisance variations are
discarded by the network's hierarchy, whereas task-relevant properties such as
words and phonemes are untangled in later layers. Higher level concepts such as
parts-of-speech and context dependence also emerge in the later layers of the
network. Finally, we find that the deep representations carry out significant
temporal untangling by efficiently extracting task-relevant features at each
time step of the computation. Taken together, these findings shed light on how
deep auditory models process time-dependent input signals to achieve invariant
speech recognition, and show how different concepts emerge through the layers
of the network.
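As a rough illustration of the kind of analysis described above, the sketch below trains linear probes on per-layer activations and compares how separable task-relevant labels (words) and nuisance labels (speakers) are at each depth. This is a simplified stand-in, not the paper's method: the paper quantifies separability with a mean-field manifold capacity theory rather than a plain linear classifier, and the layer names, label counts, and synthetic activations here are placeholders for features exported from a real speech recognizer.

```python
# Illustrative sketch (not the paper's code): probe how linearly separable
# word vs. speaker classes are in each layer of a trained speech model.
# Assumes per-layer activations can be exported as (n_samples, n_features)
# arrays; synthetic random data stands in for real activations here.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def probe_separability(activations: np.ndarray, labels: np.ndarray) -> float:
    """Mean cross-validated accuracy of a linear probe on one layer."""
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, activations, labels, cv=5).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_samples, n_features = 500, 64
    word_labels = rng.integers(0, 10, size=n_samples)     # task-relevant labels
    speaker_labels = rng.integers(0, 20, size=n_samples)  # nuisance labels

    # Stand-ins for activations extracted from successive layers.
    layers = {f"layer_{i}": rng.normal(size=(n_samples, n_features))
              for i in range(4)}

    for name, acts in layers.items():
        word_acc = probe_separability(acts, word_labels)
        spk_acc = probe_separability(acts, speaker_labels)
        print(f"{name}: word probe {word_acc:.2f}, speaker probe {spk_acc:.2f}")
```

On real activations, rising word-probe accuracy alongside flat or falling speaker-probe accuracy across layers would reflect the untangling of task-relevant structure and the discarding of speaker nuisance described in the abstract; the manifold-capacity analysis additionally relates this to the geometry (radius and dimension) of the class manifolds.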
Related papers
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z) - Understanding attention-based encoder-decoder networks: a case study with chess scoresheet recognition [0.36832029288386137]
We study encoder-decoder recurrent neural networks with attention mechanisms for the task of reading handwritten chess scoresheets.
We characterize the task in terms of three subtasks, namely input-output alignment, sequential pattern recognition, and handwriting recognition.
We argue that such knowledge can help one better balance these factors when training a network.
arXiv Detail & Related papers (2024-04-23T16:23:18Z) - Deep Neural Networks for Automatic Speaker Recognition Do Not Learn
Supra-Segmental Temporal Features [2.724035499453558]
We present and apply a novel test to quantify to what extent the performance of state-of-the-art neural networks for speaker recognition can be explained by modeling supra-segmental temporal features (SST).
We find that a variety of CNN- and RNN-based neural network architectures for speaker recognition do not model SST to any sufficient degree, even when forced.
arXiv Detail & Related papers (2023-11-01T12:45:31Z) - Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural
Networks [49.808194368781095]
We show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks.
This work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.
arXiv Detail & Related papers (2023-05-11T17:19:30Z) - The Multiple Subnetwork Hypothesis: Enabling Multidomain Learning by
Isolating Task-Specific Subnetworks in Feedforward Neural Networks [0.0]
We identify a methodology and network representational structure which allows a pruned network to employ previously unused weights to learn subsequent tasks.
We show that networks trained using our approaches are able to learn multiple tasks, which may be related or unrelated, in parallel or in sequence without sacrificing performance on any task or exhibiting catastrophic forgetting.
arXiv Detail & Related papers (2022-07-18T15:07:13Z) - Deep Neural Convolutive Matrix Factorization for Articulatory
Representation Decomposition [48.56414496900755]
This work uses a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data into interpretable gestures and gestural scores.
Phoneme recognition experiments were additionally performed to show that gestural scores indeed code phonological information successfully.
arXiv Detail & Related papers (2022-04-01T14:25:19Z) - Preliminary study on using vector quantization latent spaces for TTS/VC
systems with consistent performance [55.10864476206503]
We investigate the use of quantized vectors to model the latent linguistic embedding.
By enforcing different policies over the latent spaces during training, we are able to obtain a latent linguistic embedding.
Our experiments show that the voice cloning system built with vector quantization has only a small degradation in terms of perceptive evaluations.
arXiv Detail & Related papers (2021-06-25T07:51:35Z) - Joint Learning of Neural Transfer and Architecture Adaptation for Image
Recognition [77.95361323613147]
Current state-of-the-art visual recognition systems rely on pretraining a neural network on a large-scale dataset and finetuning the network weights on a smaller dataset.
In this work, we prove that dynamically adapting network architectures tailored to each domain task, along with weight finetuning, yields benefits in both efficiency and effectiveness.
Our method can be easily generalized to an unsupervised paradigm by replacing supernet training with self-supervised learning in the source domain tasks and performing linear evaluation in the downstream tasks.
arXiv Detail & Related papers (2021-03-31T08:15:17Z) - Understanding the Role of Individual Units in a Deep Neural Network [85.23117441162772]
We present an analytic framework to systematically identify hidden units within image classification and image generation networks.
First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts.
Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes.
arXiv Detail & Related papers (2020-09-10T17:59:10Z) - Generative Adversarial Phonology: Modeling unsupervised phonetic and
phonological learning with neural networks [0.0]
Training deep neural networks on well-understood dependencies in speech data can provide new insights into how they learn internal representations.
This paper argues that the acquisition of speech can be modeled as a dependency between the random space and the generated speech data in the Generative Adversarial Network architecture.
We propose a methodology to uncover the network's internal representations that correspond to phonetic and phonological properties.
arXiv Detail & Related papers (2020-06-06T20:31:23Z) - Sparse Mixture of Local Experts for Efficient Speech Enhancement [19.645016575334786]
We investigate a deep learning approach for speech denoising through an efficient ensemble of specialist neural networks.
By splitting up the speech denoising task into non-overlapping subproblems, we are able to improve denoising performance while also reducing computational complexity.
Our findings demonstrate that a fine-tuned ensemble network is able to exceed the speech denoising capabilities of a generalist network; a minimal illustrative sketch of this specialist-routing idea appears below.
arXiv Detail & Related papers (2020-05-16T23:23:22Z)