Nonlinear ISA with Auxiliary Variables for Learning Speech
Representations
- URL: http://arxiv.org/abs/2007.12948v1
- Date: Sat, 25 Jul 2020 14:53:09 GMT
- Title: Nonlinear ISA with Auxiliary Variables for Learning Speech
Representations
- Authors: Amrith Setlur, Barnabas Poczos, Alan W Black
- Abstract summary: We introduce a theoretical framework for nonlinear Independent Subspace Analysis (ISA) in the presence of auxiliary variables.
We propose an algorithm that learns unsupervised speech representations whose subspaces are independent.
- Score: 51.9516685516144
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper extends recent work on nonlinear Independent Component Analysis
(ICA) by introducing a theoretical framework for nonlinear Independent Subspace
Analysis (ISA) in the presence of auxiliary variables. Observed high-dimensional
acoustic features such as log Mel spectrograms can be considered surface-level
manifestations of nonlinear transformations over individual multivariate sources
of information such as speaker characteristics, phonological content, etc. Under
the assumptions of energy-based models, we use the theory of
nonlinear ISA to propose an algorithm that learns unsupervised speech
representations whose subspaces are independent and potentially highly
correlated with the original non-stationary multivariate sources. We show how
nonlinear ICA with auxiliary variables can be extended to a generic identifiable
model for subspaces, and we provide sufficient conditions for the identifiability
of these high-dimensional subspaces. Our
proposed methodology is generic and can be integrated with standard
unsupervised approaches to learn speech representations with subspaces that can
theoretically capture independent higher-order speech signals. We evaluate the
gains of our algorithm when integrated with the Autoregressive Predictive
Coding (APC) model by showing empirical results on the speaker verification
and phoneme recognition tasks.
Related papers
- Latent Space Perspicacity and Interpretation Enhancement (LS-PIE)
Framework [0.0]
This paper proposes a general framework to enhance latent space representations for improving interpretability of linear latent spaces.
Although the concepts in this paper are language agnostic, the framework is written in Python.
Several innovative enhancements are incorporated, including latent ranking (LR), latent scaling (LS), latent clustering (LC), and latent condensing (LCON).
arXiv Detail & Related papers (2023-07-11T03:56:04Z)
- Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- Learning and controlling the source-filter representation of speech with
a variational autoencoder [23.05989605017053]
In speech processing, the source-filter model considers that speech signals are produced from a few independent and physically meaningful continuous latent factors.
We propose a method to accurately and independently control the source-filter speech factors within the latent subspaces.
Without requiring additional information such as text or human-labeled data, this results in a deep generative model of speech spectrograms.
arXiv Detail & Related papers (2022-04-14T16:13:06Z)
- Adaptive Discrete Communication Bottlenecks with Dynamic Vector
Quantization [76.68866368409216]
We propose learning to dynamically select discretization tightness conditioned on inputs.
We show that dynamically varying tightness in communication bottlenecks can improve model performance on visual reasoning and reinforcement learning tasks.
arXiv Detail & Related papers (2022-02-02T23:54:26Z)
- I Don't Need $\mathbf{u}$: Identifiable Non-Linear ICA Without Side
Information [13.936583337756883]
We introduce a new approach for identifiable non-linear ICA models.
In particular, we focus on generative models which perform clustering in their latent space.
arXiv Detail & Related papers (2021-06-09T17:22:08Z)
- Transforming Feature Space to Interpret Machine Learning Models [91.62936410696409]
This contribution proposes a novel approach that interprets machine-learning models through the lens of feature space transformations.
It can be used to enhance unconditional as well as conditional post-hoc diagnostic tools.
A case study on remote-sensing landcover classification with 46 features is used to demonstrate the potential of the proposed approach.
arXiv Detail & Related papers (2021-04-09T10:48:11Z)
- A Correspondence Variational Autoencoder for Unsupervised Acoustic Word
Embeddings [50.524054820564395]
We propose a new unsupervised model for mapping a variable-duration speech segment to a fixed-dimensional representation.
The resulting acoustic word embeddings can form the basis of search, discovery, and indexing systems for low- and zero-resource languages.
arXiv Detail & Related papers (2020-12-03T19:24:42Z)
- Controlling for sparsity in sparse factor analysis models: adaptive
latent feature sharing for piecewise linear dimensionality reduction [2.896192909215469]
We propose a simple and tractable parametric feature allocation model which can address key limitations of current latent feature decomposition techniques.
We derive a novel adaptive factor analysis (aFA), as well as an adaptive probabilistic principal component analysis (aPPCA), capable of flexible structure discovery and dimensionality reduction.
We show that aPPCA and aFA can infer interpretable high level features both when applied on raw MNIST and when applied for interpreting autoencoder features.
arXiv Detail & Related papers (2020-06-22T16:09:11Z)
- BasisVAE: Translation-invariant feature-level clustering with
Variational Autoencoders [9.51828574518325]
Variational Autoencoders (VAEs) provide a flexible and scalable framework for non-linear dimensionality reduction.
We show how a collapsed variational inference scheme leads to scalable and efficient inference for BasisVAE.
arXiv Detail & Related papers (2020-03-06T23:10:52Z)