Gradient-Adjusted Neuron Activation Profiles for Comprehensive
Introspection of Convolutional Speech Recognition Models
- URL: http://arxiv.org/abs/2002.08125v1
- Date: Wed, 19 Feb 2020 11:59:36 GMT
- Title: Gradient-Adjusted Neuron Activation Profiles for Comprehensive
Introspection of Convolutional Speech Recognition Models
- Authors: Andreas Krug, Sebastian Stober
- Abstract summary: We introduce Gradient-adjusted Neuron Activation Profiles (GradNAPs) as a means to interpret features and representations in Deep Neural Networks.
GradNAPs are characteristic responses of ANNs to particular groups of inputs, which incorporate the relevance of neurons for prediction.
We show how to utilize GradNAPs to gain insight into how data is processed in ANNs.
- Score: 1.6752182911522515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Learning based Automatic Speech Recognition (ASR) models are very
successful, but hard to interpret. To gain better understanding of how
Artificial Neural Networks (ANNs) accomplish their tasks, introspection methods
have been proposed. Adapting such techniques from computer vision to speech
recognition is not straightforward, because speech data is more complex and
less interpretable than image data. In this work, we introduce
Gradient-adjusted Neuron Activation Profiles (GradNAPs) as a means to interpret
features and representations in Deep Neural Networks. GradNAPs are
characteristic responses of ANNs to particular groups of inputs, which
incorporate the relevance of neurons for prediction. We show how to utilize
GradNAPs to gain insight into how data is processed in ANNs. This includes
different ways of visualizing features and clustering of GradNAPs to compare
embeddings of different groups of inputs in any layer of a given network. We
demonstrate our proposed techniques using a fully-convolutional ASR model.
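To make the method concrete, the following is a minimal sketch of how a gradient-adjusted profile could be computed for one group of inputs, assuming a PyTorch model and a forward hook on a chosen convolutional layer. The function name, the activation shape, and the per-channel averaging are illustrative assumptions, not the authors' reference implementation.

```python
import torch

def compute_gradnap(model, layer, inputs, targets, loss_fn):
    """Sketch of a Gradient-adjusted Neuron Activation Profile (GradNAP)
    for one group of inputs: layer activations are weighted by the
    gradient of the loss w.r.t. those activations, then averaged over
    the group. Shapes and normalization are assumptions."""
    captured = {}

    def hook(_module, _inputs, output):
        output.retain_grad()        # keep the gradient on this activation
        captured["act"] = output

    handle = layer.register_forward_hook(hook)
    model.zero_grad()
    loss_fn(model(inputs), targets).backward()
    handle.remove()

    act = captured["act"]           # assumed shape: (batch, channels, time)
    relevance = act * act.grad      # gradient-adjusted activations
    # Profile: mean gradient-adjusted response per neuron (channel).
    return relevance.mean(dim=(0, 2)).detach()
```

Profiles obtained this way for different input groups (e.g., phonemes or speakers) can then be visualized or compared, for instance by hierarchically clustering the per-group profile vectors, which is the kind of layer-wise analysis the abstract describes.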
Related papers
- Seeing in Words: Learning to Classify through Language Bottlenecks [59.97827889540685]
Humans can explain their predictions using succinct and intuitive descriptions.
We show that a vision model whose feature representations are text can effectively classify ImageNet images.
arXiv Detail & Related papers (2023-06-29T00:24:42Z) - Graph Neural Networks Provably Benefit from Structural Information: A
Feature Learning Perspective [53.999128831324576]
Graph neural networks (GNNs) have pioneered advancements in graph representation learning.
This study investigates the role of graph convolution within the context of feature learning theory.
arXiv Detail & Related papers (2023-06-24T10:21:11Z) - SR-GNN: Spatial Relation-aware Graph Neural Network for Fine-Grained
Image Categorization [24.286426387100423]
We propose a method that captures subtle changes by aggregating context-aware features from the most relevant image regions.
Our approach is inspired by recent advancements in self-attention and graph neural networks (GNNs).
It outperforms the state-of-the-art approaches by a significant margin in recognition accuracy.
arXiv Detail & Related papers (2022-09-05T19:43:15Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Visualizing Automatic Speech Recognition -- Means for a Better
Understanding? [0.1868368163807795]
We show how attribution methods, which we import from image recognition and suitably adapt to handle audio data, can help to clarify the working of ASR.
Taking DeepSpeech, an end-to-end model for ASR, as a case study, we show how these techniques help to visualize which features of the input are the most influential in determining the output.
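As a rough illustration of what such an adapted attribution method might look like, here is a plain input-gradient saliency map over a spectrogram; this is a generic baseline under assumed shapes, not necessarily one of the methods the paper imports.

```python
import torch

def saliency_map(model, spectrogram, target_idx):
    """Input-gradient saliency for an ASR model on a spectrogram.

    Generic illustration only: the influence of each time-frequency
    bin is approximated by the magnitude of the gradient of one
    output score with respect to the input."""
    x = spectrogram.clone().requires_grad_(True)  # assumed (1, freq, time)
    score = model(x)[0, target_idx]               # score of one output unit
    score.backward()
    return x.grad.abs().squeeze(0)                # per-bin influence
```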
arXiv Detail & Related papers (2022-02-01T13:35:08Z) - What do End-to-End Speech Models Learn about Speaker, Language and
Channel Information? A Layer-wise and Neuron-level Analysis [16.850888973106706]
We conduct a post-hoc functional interpretability analysis of pretrained speech models using the probing framework.
We analyze utterance-level representations of speech models trained for various tasks such as speaker recognition and dialect identification.
Our results reveal several novel findings, including: i) channel and gender information are distributed across the network, ii) the information is redundantly available in neurons with respect to a task, and iii) complex properties such as dialectal information are encoded only in the task-oriented pretrained network.
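A probing analysis of this kind typically reduces to training a simple classifier on frozen representations; below is a minimal sketch under assumed inputs (mean-pooled utterance-level features and property labels), not the paper's exact protocol.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_layer(reps: np.ndarray, labels: np.ndarray) -> float:
    """Probing-classifier sketch: high held-out accuracy when
    predicting a property (e.g., gender, channel, dialect) from one
    layer's frozen representations suggests that layer encodes it."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        reps, labels, test_size=0.2, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)

# Probing each layer in turn yields the layer-wise picture; probing
# individual neurons (or small groups) gives the neuron-level view.
```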
arXiv Detail & Related papers (2021-07-01T13:32:55Z) - Comparative evaluation of CNN architectures for Image Caption Generation [1.2183405753834562]
We have evaluated 17 different Convolutional Neural Networks on two popular Image Caption Generation frameworks.
We observe that the model complexity of a Convolutional Neural Network, as measured by its number of parameters, and its accuracy on the Object Recognition task do not necessarily correlate with its efficacy at feature extraction for the Image Caption Generation task.
arXiv Detail & Related papers (2021-02-23T05:43:54Z) - Node2Seq: Towards Trainable Convolutions in Graph Neural Networks [59.378148590027735]
We propose a graph network layer, known as Node2Seq, to learn node embeddings with explicitly trainable weights for different neighboring nodes.
For a target node, our method sorts its neighboring nodes via an attention mechanism and then employs 1D convolutional neural networks (CNNs) to enable explicit weights for information aggregation.
In addition, we propose to incorporate non-local information for feature learning in an adaptive manner based on the attention scores.
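Read at face value, the abstract suggests a layer along these lines: score neighbors with attention, order them by score, and convolve over the ordered sequence so each rank receives its own learned weight. The sketch below is one simplified PyTorch reading of that description (fixed neighbor count k, one node at a time), not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Node2SeqLayer(nn.Module):
    """Simplified sketch of the Node2Seq idea: rank a node's neighbors
    by attention score, then aggregate the ordered neighbor features
    with a 1D convolution so each rank gets an explicit weight."""

    def __init__(self, dim: int, k: int):
        super().__init__()
        self.att = nn.Linear(2 * dim, 1)                # scores (node, neighbor) pairs
        self.conv = nn.Conv1d(dim, dim, kernel_size=k)  # over k ranked neighbors
        self.k = k

    def forward(self, h_node, h_neigh):
        # h_node: (dim,), h_neigh: (n_neighbors, dim) with n_neighbors >= k
        pairs = torch.cat([h_node.expand_as(h_neigh), h_neigh], dim=-1)
        scores = self.att(pairs).squeeze(-1)             # (n_neighbors,)
        top = scores.argsort(descending=True)[: self.k]  # keep k highest-scoring
        seq = h_neigh[top].t().unsqueeze(0)              # (1, dim, k) ordered sequence
        return F.relu(self.conv(seq)).squeeze()          # aggregated feature, (dim,)
```

Ordering neighbors before convolving is what distinguishes this from permutation-invariant mean or sum aggregation: each rank position gets its own convolution weights.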
arXiv Detail & Related papers (2021-01-06T03:05:37Z) - Neuro-Symbolic Representations for Video Captioning: A Case for
Leveraging Inductive Biases for Vision and Language [148.0843278195794]
We propose a new model architecture for learning multi-modal neuro-symbolic representations for video captioning.
Our approach uses a dictionary learning-based method of learning relations between videos and their paired text descriptions.
arXiv Detail & Related papers (2020-11-18T20:21:19Z) - Learning Deep Interleaved Networks with Asymmetric Co-Attention for
Image Restoration [65.11022516031463]
We present a deep interleaved network (DIN) that learns how information at different states should be combined for high-quality (HQ) image reconstruction.
In this paper, we propose asymmetric co-attention (AsyCA) which is attached at each interleaved node to model the feature dependencies.
Our presented DIN can be trained end-to-end and applied to various image restoration tasks.
arXiv Detail & Related papers (2020-10-29T15:32:00Z) - Deep Adaptive Semantic Logic (DASL): Compiling Declarative Knowledge
into Deep Neural Networks [11.622060073764944]
We introduce Deep Adaptive Semantic Logic (DASL), a novel framework for automating the generation of deep neural networks.
DASL incorporates user-provided formal knowledge to improve learning from data.
We evaluate DASL on a visual relationship detection task and demonstrate that the addition of commonsense knowledge improves performance by 10.7% in a data-scarce setting.
arXiv Detail & Related papers (2020-03-16T17:37:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.