Hidden Markov Models with Momentum
- URL: http://arxiv.org/abs/2206.04057v1
- Date: Wed, 8 Jun 2022 15:49:43 GMT
- Title: Hidden Markov Models with Momentum
- Authors: Andrew Miller and Fabio Di Troia and Mark Stamp
- Abstract summary: We experiment with adding momentum to the Baum-Welch expectation-maximization algorithm for training Hidden Markov Models.
Our experiments indicate that adding momentum to Baum-Welch can reduce the number of iterations required for initial convergence.
However, momentum does not seem to improve the final model performance at a high number of iterations.
- Score: 6.48893856598641
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Momentum is a popular technique for improving convergence rates during
gradient descent. In this research, we experiment with adding momentum to the
Baum-Welch expectation-maximization algorithm for training Hidden Markov
Models. We compare discrete Hidden Markov Models trained with and without
momentum on English text and malware opcode data. The effectiveness of momentum
is determined by measuring the changes in model score and classification
accuracy due to momentum. Our extensive experiments indicate that adding
momentum to Baum-Welch can reduce the number of iterations required for initial
convergence during HMM training, particularly in cases where the model is slow
to converge. However, momentum does not seem to improve the final model
performance at a high number of iterations.
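The momentum idea described in the abstract can be sketched as a post-processing step on each Baum-Welch re-estimate: push the parameters further along the direction of the last EM update, then re-normalize so each row remains a valid probability distribution. This is a minimal illustrative sketch, assuming a heavy-ball-style blend with coefficient `beta`; the matrices and `beta` value below are toy values, not the paper's actual configuration.

```python
import numpy as np

def momentum_step(theta_prev, theta_em, beta=0.5):
    """Blend the plain Baum-Welch re-estimate with a heavy-ball momentum
    term, then re-normalize rows so each stays a valid distribution."""
    theta = theta_em + beta * (theta_em - theta_prev)
    theta = np.clip(theta, 1e-12, None)          # keep probabilities positive
    return theta / theta.sum(axis=1, keepdims=True)

# Toy 2-state transition matrices from two successive Baum-Welch iterations.
A_prev = np.array([[0.6, 0.4], [0.3, 0.7]])
A_em   = np.array([[0.7, 0.3], [0.2, 0.8]])
A_next = momentum_step(A_prev, A_em, beta=0.5)   # overshoots in the EM direction
```

With `beta=0` this reduces to plain Baum-Welch; larger `beta` extrapolates further along the most recent EM step, which is the mechanism the paper credits with faster initial convergence.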
Related papers
- Overshoot: Taking advantage of future gradients in momentum-based stochastic optimization [1.4303041760959478]
Overshoot is a momentum-based descent optimization method designed to enhance performance beyond standard and Nesterov's momentum.
Overshoot consistently outperforms both standard and Nesterov's momentum across a wide range of tasks.
arXiv Detail & Related papers (2025-01-16T14:18:10Z) - Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM), which can be viewed as a gradient boosting algorithm combined with score matching.
We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy.
Our model outperforms existing models in both accuracy and inference time.
arXiv Detail & Related papers (2024-11-02T07:06:53Z) - Ordered Momentum for Asynchronous SGD [12.810976838406193]
We propose a novel method called ordered momentum (OrMo) for ASGD.
In OrMo, momentum is incorporated into ASGD by organizing the gradients in order based on their indexes.
Empirical results demonstrate that OrMo can achieve better convergence performance compared with ASGD.
arXiv Detail & Related papers (2024-07-27T11:35:19Z) - PUMA: margin-based data pruning [51.12154122266251]
We focus on data pruning, where some training samples are removed based on their distance to the model's classification boundary (i.e., margin).
We propose PUMA, a new data pruning strategy that computes the margin using DeepFool.
We show that PUMA can be used on top of the current state-of-the-art methodology in robustness, and it is able to significantly improve the model performance unlike the existing data pruning strategies.
arXiv Detail & Related papers (2024-05-10T08:02:20Z) - QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15 percentage points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z) - Latent State Models of Training Dynamics [51.88132043461152]
We train models with different random seeds and compute a variety of metrics throughout training.
We then fit a hidden Markov model (HMM) over the resulting sequences of metrics.
We use the HMM representation to study phase transitions and identify latent "detour" states that slow down convergence.
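The pipeline in this entry (metrics per training step, then an HMM over the resulting sequence) can be illustrated with a hand-specified discrete HMM scored by the forward algorithm. The two states, their "progress"/"detour" labels, and the three-symbol discretization of loss changes below are illustrative assumptions, not the paper's fitted model.

```python
import numpy as np

# Illustrative 2-state HMM over discretized loss-change symbols:
# state 0 = "progress", state 1 = "detour" (labels are assumptions).
pi = np.array([0.9, 0.1])                     # initial state distribution
A  = np.array([[0.95, 0.05], [0.20, 0.80]])   # transition probabilities
B  = np.array([[0.7, 0.2, 0.1],               # P(symbol | state)
               [0.1, 0.3, 0.6]])

def forward_loglik(obs):
    """Log-likelihood of a symbol sequence via the scaled forward pass."""
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return loglik

# Symbols: 0 = loss dropped, 1 = flat, 2 = loss rose (a toy discretization).
steady = forward_loglik([0, 0, 0, 0, 0])   # run consistent with "progress"
detour = forward_loglik([2, 2, 2, 2, 2])   # run consistent with "detour"
```

A fitted HMM (e.g. via Baum-Welch over many seeds' metric sequences) would replace the hand-set parameters; the forward pass then scores how well a training run matches each latent-state explanation.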
arXiv Detail & Related papers (2023-08-18T13:20:08Z) - MoMo: Momentum Models for Adaptive Learning Rates [14.392926033512069]
We develop new Polyak-type adaptive learning rates that can be used on top of any momentum method.
We first develop MoMo, a Momentum Model based adaptive learning rate for SGD-M.
We show how MoMo can be used in combination with any momentum-based method, and showcase this by developing MoMo-Adam.
arXiv Detail & Related papers (2023-05-12T16:25:57Z) - Losing momentum in continuous-time stochastic optimisation [42.617042045455506]
Momentum-based optimisation algorithms have become particularly widespread.
In this work, we analyse a continuous-time model for gradient descent with momentum.
We also train a convolutional neural network in an image classification problem.
arXiv Detail & Related papers (2022-09-08T10:46:05Z) - Accelerate Distributed Stochastic Descent for Nonconvex Optimization with Momentum [12.324457683544132]
We propose a momentum method for such model averaging approaches.
We analyze the convergence and scaling properties of such momentum methods.
Our experimental results show that block momentum not only accelerates training, but also achieves better results.
arXiv Detail & Related papers (2021-10-01T19:23:18Z) - Anomaly Detection of Time Series with Smoothness-Inducing Sequential Variational Auto-Encoder [59.69303945834122]
We present a Smoothness-Inducing Sequential Variational Auto-Encoder (SISVAE) model for robust estimation and anomaly detection of time series.
Our model parameterizes mean and variance for each time-stamp with flexible neural networks.
We show the effectiveness of our model on both synthetic datasets and public real-world benchmarks.
arXiv Detail & Related papers (2021-02-02T06:15:15Z) - Scaling Hidden Markov Language Models [118.55908381553056]
This work revisits the challenge of scaling HMMs to language modeling datasets.
We propose methods for scaling HMMs to massive state spaces while maintaining efficient exact inference, a compact parameterization, and effective regularization.
arXiv Detail & Related papers (2020-11-09T18:51:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.