X-DC: Explainable Deep Clustering based on Learnable Spectrogram Templates
- URL: http://arxiv.org/abs/2009.08661v3
- Date: Mon, 19 Apr 2021 06:53:09 GMT
- Title: X-DC: Explainable Deep Clustering based on Learnable Spectrogram Templates
- Authors: Chihiro Watanabe, Hirokazu Kameoka
- Abstract summary: We propose the concept of explainable deep clustering (X-DC), whose network architecture can be interpreted as a process of fitting learnable spectrogram templates to an input spectrogram followed by Wiener filtering.
We experimentally show that the proposed X-DC enables us to visualize and understand the clues for the model to determine the embedding vectors while achieving speech separation performance comparable to that of the original DC models.
- Score: 17.83563578034567
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) have achieved substantial predictive performance
in various speech processing tasks. Particularly, it has been shown that a
monaural speech separation task can be successfully solved with a DNN-based
method called deep clustering (DC), which uses a DNN to describe the process of
assigning a continuous vector to each time-frequency (TF) bin and measuring how
likely each pair of TF bins is to be dominated by the same speaker. In DC, the
DNN is trained so that the embedding vectors for the TF bins dominated by the
same speaker are forced to get close to each other. One concern regarding DC is
that the embedding process described by a DNN has a black-box structure, which
is usually very hard to interpret. A potential weakness of this
non-interpretable black-box structure is that it lacks the flexibility to
address mismatches between training and test conditions (caused by
reverberation, for instance). To overcome this limitation, in this paper, we
propose the concept of explainable deep clustering (X-DC), whose network
architecture can be interpreted as a process of fitting learnable spectrogram
templates to an input spectrogram followed by Wiener filtering. During
training, the elements of the spectrogram templates and their activations are
constrained to be non-negative, which promotes sparsity in their values
and thus improves interpretability. The main advantage of this framework is
that it naturally allows us to incorporate a model adaptation mechanism into
the network thanks to its physically interpretable structure. We experimentally
show that the proposed X-DC enables us to visualize and understand the clues
for the model to determine the embedding vectors while achieving speech
separation performance comparable to that of the original DC models.
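To make the two ideas in the abstract concrete, here is a minimal NumPy sketch written only from the summary above: the deep-clustering affinity objective over TF-bin embeddings, and the X-DC-style step of fitting non-negative spectrogram templates and turning them into Wiener-filter masks. All variable names, shapes, and the expanded loss form are illustrative assumptions, not the authors' reference implementation; in the actual X-DC network the templates and activations would be produced by learnable layers and trained jointly.

```python
# Minimal sketch of the two ideas summarized above (assumptions, not the paper's code).
import numpy as np

def dc_affinity_loss(V, Y):
    """Deep-clustering objective ||V V^T - Y Y^T||_F^2 over TF bins.

    V : (N, D) embedding vectors, one per time-frequency bin
    Y : (N, C) one-hot speaker-dominance indicators
    The expanded form avoids building the N x N affinity matrices explicitly.
    """
    return (np.linalg.norm(V.T @ V, "fro") ** 2
            - 2 * np.linalg.norm(V.T @ Y, "fro") ** 2
            + np.linalg.norm(Y.T @ Y, "fro") ** 2)

def wiener_masks_from_templates(W, H, eps=1e-8):
    """Fit non-negative spectrogram templates and convert them to Wiener masks.

    W : (C, F, K) spectrogram templates for each of C speakers
    H : (C, K, T) activations of those templates over time
    Returns per-speaker fitted spectrograms and soft (Wiener-filter) masks.
    """
    W = np.maximum(W, 0.0)                 # non-negativity constraint on templates
    H = np.maximum(H, 0.0)                 # ... and on their activations
    S = np.einsum("cfk,ckt->cft", W, H)    # per-speaker fitted spectrograms
    masks = S / (S.sum(axis=0, keepdims=True) + eps)
    return S, masks

# Toy usage: 2 speakers, 129 frequency bins, 100 frames, 20 templates per speaker.
rng = np.random.default_rng(0)
W = rng.random((2, 129, 20))
H = rng.random((2, 20, 100))
X = rng.random((129, 100))                 # mixture magnitude spectrogram
_, masks = wiener_masks_from_templates(W, H)
separated = masks * X                      # masked (separated) spectrograms
print(separated.shape)                     # (2, 129, 100)
```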
Related papers
- Learning local discrete features in explainable-by-design convolutional neural networks [0.0]
We introduce an explainable-by-design convolutional neural network (CNN) based on the lateral inhibition mechanism.
The model consists of a predictor, which is a high-accuracy CNN with residual or dense skip connections.
By collecting observations and directly calculating probabilities, we can explain causal relationships between motifs of adjacent levels.
arXiv Detail & Related papers (2024-10-31T18:39:41Z)
- Linking in Style: Understanding learned features in deep learning models [0.0]
Convolutional neural networks (CNNs) learn abstract features to perform object classification.
We propose an automatic method to visualize and systematically analyze learned features in CNNs.
arXiv Detail & Related papers (2024-09-25T12:28:48Z)
- Using Logic Programming and Kernel-Grouping for Improving Interpretability of Convolutional Neural Networks [1.6317061277457001]
We present a neurosymbolic framework, NeSyFOLD-G that generates a symbolic rule-set using the last layer kernels of the CNN.
We show that grouping similar kernels leads to a significant reduction in the size of the rule-set generated by FOLD-SE-M.
We also propose a novel algorithm for labeling each predicate in the rule-set with the semantic concept(s) that its corresponding kernel group represents.
arXiv Detail & Related papers (2023-10-19T18:12:49Z)
- Tackling Interpretability in Audio Classification Networks with Non-negative Matrix Factorization [2.423660247459463]
This paper tackles two major problem settings for interpretability of audio processing networks.
For post-hoc interpretation, we aim to interpret decisions of a network in terms of high-level audio objects that are also listenable for the end-user.
We propose a novel interpreter design that incorporates non-negative matrix factorization (NMF).
arXiv Detail & Related papers (2023-05-11T20:50:51Z)
- Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction [62.691996239590125]
We propose an adaptive convolutional dictionary network (ACDNet) for metal artifact reduction.
Our ACDNet can automatically learn the prior for artifact-free CT images via training data and adaptively adjust the representation kernels for each input CT image.
Our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods.
arXiv Detail & Related papers (2022-05-16T06:49:36Z)
- Self-Ensembling GAN for Cross-Domain Semantic Segmentation [107.27377745720243]
This paper proposes a self-ensembling generative adversarial network (SE-GAN) exploiting cross-domain data for semantic segmentation.
In SE-GAN, a teacher network and a student network constitute a self-ensembling model for generating semantic segmentation maps, which together with a discriminator, forms a GAN.
Despite its simplicity, we find SE-GAN can significantly boost the performance of adversarial training and enhance the stability of the model.
arXiv Detail & Related papers (2021-12-15T09:50:25Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- Towards Efficient Scene Understanding via Squeeze Reasoning [71.1139549949694]
We propose a novel framework called Squeeze Reasoning.
Instead of propagating information on the spatial map, we first learn to squeeze the input feature into a channel-wise global vector.
We show that our approach can be modularized as an end-to-end trained block and can be easily plugged into existing networks.
arXiv Detail & Related papers (2020-11-06T12:17:01Z)
- Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking [63.49779304362376]
Graph neural networks (GNNs) have become a popular approach to integrating structural inductive biases into NLP models.
We introduce a post-hoc method for interpreting the predictions of GNNs which identifies unnecessary edges.
We show that we can drop a large proportion of edges without deteriorating the performance of the model.
arXiv Detail & Related papers (2020-10-01T17:51:19Z)
- ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that, with this regularization, CNNs maintain performance with a dramatic reduction in parameters and computations.
arXiv Detail & Related papers (2020-09-04T20:41:47Z)