Scaling Hidden Markov Language Models
- URL: http://arxiv.org/abs/2011.04640v1
- Date: Mon, 9 Nov 2020 18:51:55 GMT
- Title: Scaling Hidden Markov Language Models
- Authors: Justin T. Chiu and Alexander M. Rush
- Abstract summary: This work revisits the challenge of scaling HMMs to language modeling datasets.
We propose methods for scaling HMMs to massive state spaces while maintaining efficient exact inference, a compact parameterization, and effective regularization.
- Score: 118.55908381553056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The hidden Markov model (HMM) is a fundamental tool for sequence modeling
that cleanly separates the hidden state from the emission structure. However,
this separation makes it difficult to fit HMMs to large datasets in modern NLP,
and they have fallen out of use due to very poor performance compared to fully
observed models. This work revisits the challenge of scaling HMMs to language
modeling datasets, taking ideas from recent approaches to neural modeling. We
propose methods for scaling HMMs to massive state spaces while maintaining
efficient exact inference, a compact parameterization, and effective
regularization. Experiments show that this approach leads to models that are
more accurate than previous HMM and n-gram-based methods, making progress
towards the performance of state-of-the-art neural models.
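The abstract's central constraint is that exact inference must stay tractable as the number of hidden states grows. As a reference point, here is a minimal log-space forward algorithm for an HMM language model; the O(L·S²) cost of the dense transition step is exactly what scaling methods must avoid. Parameter names and shapes are illustrative, not the paper's parameterization.

```python
import numpy as np
from scipy.special import logsumexp

def hmm_log_likelihood(obs, log_pi, log_T, log_E):
    """Exact HMM log-likelihood via the forward algorithm.

    obs    : (L,) int array of token ids
    log_pi : (S,) log initial state distribution
    log_T  : (S, S) log transitions, log_T[i, j] = log p(z'=j | z=i)
    log_E  : (S, V) log emissions, log_E[i, w] = log p(w | z=i)
    Cost is O(L * S^2), which is why large state spaces need structure.
    """
    alpha = log_pi + log_E[:, obs[0]]               # (S,)
    for w in obs[1:]:
        # alpha'[j] = logsum_i (alpha[i] + log_T[i, j]), then emit w
        alpha = logsumexp(alpha[:, None] + log_T, axis=0) + log_E[:, w]
    return logsumexp(alpha)

# Toy usage with random (normalized) parameters.
rng = np.random.default_rng(0)
S, V = 8, 50
log_pi = np.log(rng.dirichlet(np.ones(S)))
log_T = np.log(rng.dirichlet(np.ones(S), size=S))
log_E = np.log(rng.dirichlet(np.ones(V), size=S))
obs = rng.integers(V, size=20)
print(hmm_log_likelihood(obs, log_pi, log_T, log_E))
```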
Related papers
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
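As a rough illustration of SMILE's low-rank-expert idea (a hedged sketch, not the paper's construction), a fine-tuning weight delta can be compressed into a low-rank expert with a truncated SVD; all names and shapes below are hypothetical.

```python
import numpy as np

def extract_low_rank_expert(w_base, w_finetuned, rank):
    """Compress a fine-tuning delta into a rank-`rank` expert (A, B)
    such that w_base + A @ B approximates w_finetuned."""
    delta = w_finetuned - w_base
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    A = U[:, :rank] * s[:rank]      # (d_out, rank), singular values folded in
    B = Vt[:rank, :]                # (rank, d_in)
    return A, B

rng = np.random.default_rng(0)
w_base = rng.normal(size=(64, 64))
# Simulate a fine-tuned model whose delta is truly low rank.
w_ft = w_base + rng.normal(size=(64, 8)) @ rng.normal(size=(8, 64)) * 0.1
A, B = extract_low_rank_expert(w_base, w_ft, rank=8)
print(np.linalg.norm(w_base + A @ B - w_ft))  # ~0 for a rank-8 delta
```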
- State-observation augmented diffusion model for nonlinear assimilation [6.682908186025083]
We propose a novel data-driven assimilation algorithm based on generative models.
Our State-Observation Augmented Diffusion (SOAD) model is designed to handle nonlinear physical and observational models more effectively.
arXiv Detail & Related papers (2024-07-31T03:47:20Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, requiring no additional data or training, while still delivering impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
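A toy sketch of the elect/mask/rescale steps suggested by the method's name: elect a per-element sign from the aggregate of the task vectors, mask each task to the elements whose sign it agrees with, and rescale to match each task's magnitude. The details below are assumptions for illustration and may differ from the paper's exact procedure.

```python
import numpy as np

def emr_merge(task_vectors):
    """Toy elect/mask/rescale merge of per-task weight deltas.

    task_vectors: (T, D) array, one fine-tuning delta per task.
    Returns a unified vector plus per-task masks and rescalers applied
    at inference time, with no extra training.
    """
    # Elect: pick a per-element sign from the aggregate of all tasks.
    elected_sign = np.sign(task_vectors.sum(axis=0))
    agree = np.sign(task_vectors) == elected_sign          # (T, D)
    # Unified magnitude: largest agreeing magnitude per element.
    mags = np.where(agree, np.abs(task_vectors), 0.0)
    unified = elected_sign * mags.max(axis=0)              # (D,)
    # Mask: each task keeps only elements whose sign it agrees with.
    masks = agree
    # Rescale: match each task's average magnitude after masking.
    rescalers = np.array([
        np.abs(tv).mean() / max(np.abs(m * unified).mean(), 1e-12)
        for tv, m in zip(task_vectors, masks)
    ])
    return unified, masks, rescalers

tvs = np.random.default_rng(0).normal(size=(3, 10))
unified, masks, rescalers = emr_merge(tvs)
# Task-specific delta reconstructed at inference time:
delta_0 = rescalers[0] * masks[0] * unified
```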
- Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC performs both parameter estimation and particle proposal adaptation efficiently and entirely on the fly.
arXiv Detail & Related papers (2023-12-19T21:45:38Z)
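For context, this is the bootstrap particle filter that sequential Monte Carlo methods such as VSMC build on; the paper's variational proposals and online parameter updates are not shown, and the linear-Gaussian model below is an arbitrary toy.

```python
import numpy as np

def bootstrap_particle_filter(ys, n_particles=500, phi=0.9, q=1.0, r=1.0, seed=0):
    """Estimate the log-likelihood of the linear-Gaussian state space model
    x_t = phi * x_{t-1} + N(0, q),  y_t = x_t + N(0, r),
    by sampling from the transition (bootstrap proposal) and resampling."""
    rng = np.random.default_rng(seed)
    # Initialize particles from the stationary distribution of x.
    x = rng.normal(0.0, np.sqrt(q / (1 - phi**2)), size=n_particles)
    log_z = 0.0
    for y in ys:
        logw = -0.5 * ((y - x) ** 2 / r + np.log(2 * np.pi * r))
        m = logw.max()
        w = np.exp(logw - m)
        log_z += m + np.log(w.mean())          # incremental evidence estimate
        idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())
        x = phi * x[idx] + rng.normal(0.0, np.sqrt(q), size=n_particles)
    return log_z

ys = np.random.default_rng(1).normal(size=50)
print(bootstrap_particle_filter(ys))
```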
- Low-Rank Constraints for Fast Inference in Structured Models [110.38427965904266]
This work demonstrates a simple approach to reduce the computational and memory complexity of a large class of structured models.
Experiments with neural parameterized structured models for language modeling, polyphonic music modeling, unsupervised grammar induction, and video modeling show that our approach matches the accuracy of standard models at large state spaces.
arXiv Detail & Related papers (2022-01-08T00:47:50Z)
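A minimal sketch of the core trick, assuming the transition matrix factors as T = U @ V with inner rank r: one forward step then costs O(S·r) instead of O(S²). The normalization scheme below is a simple choice for the toy example, not necessarily the paper's.

```python
import numpy as np

def lowrank_forward_loglik(obs, pi, U, V, E):
    """Forward algorithm with a factored transition matrix T = U @ V.

    pi : (S,) initial distribution
    U  : (S, r) and V : (r, S) nonnegative factors, (U @ V) row-stochastic
    E  : (S, V_size) emission probabilities
    Each step is two thin matmuls, O(S*r), instead of one O(S^2) matmul.
    """
    alpha = pi * E[:, obs[0]]
    loglik = np.log(alpha.sum()); alpha /= alpha.sum()
    for w in obs[1:]:
        alpha = ((alpha @ U) @ V) * E[:, w]     # low-rank transition step
        loglik += np.log(alpha.sum()); alpha /= alpha.sum()
    return loglik

rng = np.random.default_rng(0)
S, r, Vsize = 64, 4, 30
U = rng.dirichlet(np.ones(r), size=S)   # (S, r), rows sum to 1
V = rng.dirichlet(np.ones(S), size=r)   # (r, S), rows sum to 1
# T = U @ V is then row-stochastic: each row is a mixture of V's rows.
pi = rng.dirichlet(np.ones(S))
E = rng.dirichlet(np.ones(Vsize), size=S)
obs = rng.integers(Vsize, size=40)
print(lowrank_forward_loglik(obs, pi, U, V, E))
```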
- Normalizing Flow based Hidden Markov Models for Classification of Speech Phones with Explainability [25.543231171094384]
In pursuit of explainability, we develop generative models for sequential data.
We combine modern neural networks (normalizing flows) with traditional generative models (hidden Markov models, or HMMs).
The proposed generative models can compute the likelihood of the data and are hence directly suitable for a maximum-likelihood (ML) classification approach.
arXiv Detail & Related papers (2021-07-01T20:10:55Z)
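The classification rule referenced in the summary is simple to state: fit one generative model per class and pick the class whose model assigns the data the highest likelihood. Below is a hedged sketch with a generic log_likelihood interface (hypothetical, standing in for the flow-based HMM) and a trivial stand-in model.

```python
import numpy as np

def ml_classify(x, class_models):
    """Maximum-likelihood classification: argmax_c log p(x | model_c).

    class_models: objects exposing a log_likelihood(x) method, e.g. one
    normalizing-flow HMM per phone class (hypothetical interface).
    """
    scores = [m.log_likelihood(x) for m in class_models]
    return int(np.argmax(scores))

class ToyGaussianModel:
    """Stand-in generative model: i.i.d. Gaussian frames."""
    def __init__(self, mean, var):
        self.mean, self.var = mean, var
    def log_likelihood(self, x):
        return -0.5 * np.sum((x - self.mean) ** 2 / self.var
                             + np.log(2 * np.pi * self.var))

models = [ToyGaussianModel(0.0, 1.0), ToyGaussianModel(2.0, 1.0)]
x = np.random.default_rng(0).normal(2.0, 1.0, size=30)
print(ml_classify(x, models))  # expected: 1
```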
- Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning [66.44344616836158]
We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text.
We show that 1) under certain non-degeneracy conditions on the HMM, simple classification heads can solve the downstream task, 2) prompt tuning obtains downstream guarantees with weaker non-degeneracy conditions, and 3) our recovery guarantees for the memory-augmented HMM are stronger than for the vanilla HMM.
arXiv Detail & Related papers (2021-06-17T03:31:47Z)
- Training Structured Mechanical Models by Minimizing Discrete Euler-Lagrange Residual [36.52097893036073]
Structured Mechanical Models (SMMs) are a data-efficient black-box parameterization of mechanical systems.
We propose a methodology for fitting SMMs to data by minimizing the discrete Euler-Lagrange residual.
Experiments show that our methodology learns models that are more accurate than those produced by conventional schemes for fitting SMMs.
arXiv Detail & Related papers (2021-05-05T00:44:01Z)
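To make the objective concrete, here is a toy version of discrete Euler-Lagrange residual fitting for a mass-spring Lagrangian with a midpoint discretization: the residual vanishes on true trajectories, so parameters are recovered by driving it to zero. The Lagrangian and the grid search are illustrative assumptions, not the paper's SMM parameterization.

```python
import numpy as np

H = 0.1  # time step

def del_residual(q, m, k):
    """Discrete Euler-Lagrange residual at interior knots for the
    midpoint discrete Lagrangian of L(q, v) = m v^2 / 2 - k q^2 / 2."""
    qm, qc, qp = q[:-2], q[1:-1], q[2:]
    d2 = m * (qc - qm) / H - (H * k / 4) * (qm + qc)   # D2 L_d(q_{t-1}, q_t)
    d1 = -m * (qp - qc) / H - (H * k / 4) * (qc + qp)  # D1 L_d(q_t, q_{t+1})
    return d2 + d1

def simulate(q0, q1, m, k, steps):
    """Variational integrator: solve del_residual = 0 for q_{t+1}."""
    q = [q0, q1]
    for _ in range(steps):
        a = m / H + H * k / 4
        b = m * (2 * q[-1] - q[-2]) / H - (H * k / 4) * (q[-2] + 2 * q[-1])
        q.append(b / a)
    return np.array(q)

# Data from a "true" spring (m=1, k=3); fit k by minimizing the residual.
q_data = simulate(1.0, 0.99, m=1.0, k=3.0, steps=100)
ks = np.linspace(0.5, 6.0, 200)
losses = [np.sum(del_residual(q_data, 1.0, k) ** 2) for k in ks]
print("recovered k ~", ks[int(np.argmin(losses))])   # ~3.0
```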
- Lossless compression with state space models using bits back coding [17.625326990547332]
We generalize the 'bits back with ANS' method to time-series models with a latent Markov structure.
We provide experimental evidence that our method is effective for small scale models, and discuss its applicability to larger scale settings such as video compression.
arXiv Detail & Related papers (2021-03-18T10:34:57Z)
- Robust Classification using Hidden Markov Models and Mixtures of Normalizing Flows [25.543231171094384]
We use a generative model that combines the state transitions of a hidden Markov model (HMM) with neural-network-based probability distributions for the HMM's hidden states.
We verify the improved robustness of NMM-HMM classifiers in an application to speech recognition.
arXiv Detail & Related papers (2021-02-15T00:40:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.