Equivalence of Segmental and Neural Transducer Modeling: A Proof of
Concept
- URL: http://arxiv.org/abs/2104.06104v1
- Date: Tue, 13 Apr 2021 11:20:48 GMT
- Title: Equivalence of Segmental and Neural Transducer Modeling: A Proof of
Concept
- Authors: Wei Zhou, Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney
- Abstract summary: We prove that the widely used class of RNN-Transducer models and segmental models (direct HMM) are equivalent.
It is shown that blank probabilities translate into segment length probabilities and vice versa.
- Score: 56.46135010588918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the advent of direct models in automatic speech recognition (ASR), the
formerly prevalent frame-wise acoustic modeling based on hidden Markov models
(HMM) diversified into a number of modeling architectures like encoder-decoder
attention models, transducer models and segmental models (direct HMM). While
transducer models stay with a frame-level model definition, segmental models
are defined directly on the level of label segments. While
(soft-)attention-based models avoid explicit alignment, transducer and
segmental approaches do model alignment internally, either by segment
hypotheses or, more implicitly, by emitting so-called blank symbols. In this
work, we prove that the widely used class of RNN-Transducer models and
segmental models (direct HMM) are equivalent and therefore have equal
modeling power. It is
shown that blank probabilities translate into segment length probabilities and
vice versa. In addition, we provide initial experiments investigating decoding
and beam-pruning, comparing time-synchronous and label-/segment-synchronous
search strategies and their properties using the same underlying model.
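As a minimal sketch of the blank-to-segment-length translation at the heart of the proof (illustrative only, not code from the paper; the function name, the convention of emitting blanks followed by one label per segment, and the example probabilities are all assumptions):

```python
def segment_length_distribution(blank_probs):
    """Translate per-frame blank probabilities into a segment-length
    distribution.

    Under the assumed convention, a segment of length d corresponds to
    the transducer emitting (d - 1) blanks and then one non-blank label:
        P(d) = (1 - blank_probs[d-1]) * prod(blank_probs[:d-1])

    blank_probs: hypothetical P(blank) at each successive frame of one
    segment. Returns (P(d) for d = 1..len(blank_probs), leftover mass
    for segments longer than the horizon).
    """
    lengths = []
    inside = 1.0  # probability that the segment is still open
    for p_blank in blank_probs:
        lengths.append(inside * (1.0 - p_blank))  # label emitted: segment ends here
        inside *= p_blank                          # blank emitted: segment continues
    return lengths, inside


# Hypothetical per-frame blank probabilities for a single segment.
dist, tail = segment_length_distribution([0.9, 0.7, 0.5, 0.2])
print(dist)              # approx. [0.1, 0.27, 0.315, 0.252]
print(sum(dist) + tail)  # 1.0 -- a proper length distribution
```

The converse direction works the same way: given a segment-length distribution P(d), the per-frame blank probability can be recovered as the conditional continuation probability P(d > t) / P(d >= t), which is the "vice versa" part of the stated equivalence.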
Related papers
- STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition [50.064502884594376]
We study the problem of human action recognition using motion capture (MoCap) sequences.
We propose a novel Spatial-Temporal Mesh Transformer (STMT) to directly model the mesh sequences.
The proposed method achieves state-of-the-art performance compared to skeleton-based and point-cloud-based models.
arXiv Detail & Related papers (2023-03-31T16:19:27Z)
- Detecting Signs of Model Change with Continuous Model Selection Based on Descriptive Dimensionality [21.86268650362205]
We address the issue of detecting changes of models that lie behind a data stream.
We propose a novel methodology for detecting signs of model changes by tracking the rise-up of Ddim in a data stream.
arXiv Detail & Related papers (2023-02-23T16:10:06Z)
- Context-specific kernel-based hidden Markov model for time series analysis [9.007829035130886]
We introduce a new hidden Markov model based on kernel density estimation.
It is capable of capturing kernel dependencies using context-specific Bayesian networks.
The benefits in likelihood and classification accuracy from the proposed model are quantified and analyzed.
arXiv Detail & Related papers (2023-01-24T09:10:38Z)
- DiffusER: Discrete Diffusion via Edit-based Reconstruction [88.62707047517914]
DiffusER is an edit-based generative model for text based on denoising diffusion models.
It can rival autoregressive models on several tasks spanning machine translation, summarization, and style transfer.
It can also perform other varieties of generation that standard autoregressive models are not well-suited for.
arXiv Detail & Related papers (2022-10-30T16:55:23Z)
- Low-Rank Constraints for Fast Inference in Structured Models [110.38427965904266]
This work demonstrates a simple approach to reduce the computational and memory complexity of a large class of structured models.
Experiments with neural parameterized structured models for language modeling, polyphonic music modeling, unsupervised grammar induction, and video modeling show that our approach matches the accuracy of standard models at large state spaces.
arXiv Detail & Related papers (2022-01-08T00:47:50Z)
- Scaling Hidden Markov Language Models [118.55908381553056]
This work revisits the challenge of scaling HMMs to language modeling datasets.
We propose methods for scaling HMMs to massive state spaces while maintaining efficient exact inference, a compact parameterization, and effective regularization.
arXiv Detail & Related papers (2020-11-09T18:51:55Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
- Deep Neural Dynamic Bayesian Networks applied to EEG sleep spindles modeling [0.0]
We propose a generative model for single-channel EEG that incorporates the constraints experts actively enforce during visual scoring.
We derive algorithms for exact, tractable inference as a special case of Generalized Expectation Maximization.
We validate the model on three public datasets and provide evidence that more complex models are able to surpass state-of-the-art detectors.
arXiv Detail & Related papers (2020-10-16T21:48:29Z)
- Semi-supervised Neural Chord Estimation Based on a Variational Autoencoder with Latent Chord Labels and Features [18.498244371257304]
This paper describes a statistically-principled semi-supervised method of automatic chord estimation.
It can make effective use of music signals regardless of the availability of chord annotations.
arXiv Detail & Related papers (2020-05-14T15:58:36Z)