Belief Net: A Filter-Based Framework for Learning Hidden Markov Models from Observations
- URL: http://arxiv.org/abs/2511.10571v1
- Date: Fri, 14 Nov 2025 01:58:17 GMT
- Title: Belief Net: A Filter-Based Framework for Learning Hidden Markov Models from Observations
- Authors: Reginald Zhiyan Chen, Heng-Sheng Chang, Prashant G. Mehta
- Abstract summary: This work introduces Belief Net, a novel framework that learns Hidden Markov Models through gradient-based optimization. Unlike black-box Transformer models, Belief Net's learnable weights are explicitly the logits of the initial distribution, transition matrix, and emission matrix. On synthetic HMM data, Belief Net achieves superior convergence speed compared to Baum-Welch, successfully recovering parameters in both undercomplete and overcomplete settings.
- Score: 0.5161531917413708
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hidden Markov Models (HMMs) are fundamental for modeling sequential data, yet learning their parameters from observations remains challenging. Classical methods like the Baum-Welch (EM) algorithm are computationally intensive and prone to local optima, while modern spectral algorithms offer provable guarantees but may produce probability outputs outside valid ranges. This work introduces Belief Net, a novel framework that learns HMM parameters through gradient-based optimization by formulating the HMM's forward filter as a structured neural network. Unlike black-box Transformer models, Belief Net's learnable weights are explicitly the logits of the initial distribution, transition matrix, and emission matrix, ensuring full interpretability. The model processes observation sequences using a decoder-only architecture and is trained end-to-end with standard autoregressive next-observation prediction loss. On synthetic HMM data, Belief Net achieves superior convergence speed compared to Baum-Welch, successfully recovering parameters in both undercomplete and overcomplete settings where spectral methods fail. Comparisons with Transformer-based models are also presented on real-world language data.
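The abstract gives enough detail to sketch the core idea: the HMM forward filter is written out as a small network whose only learnable weights are the logits of the initial distribution, transition matrix, and emission matrix, trained with an autoregressive next-observation prediction loss. The sketch below is a minimal reconstruction under those assumptions, not the authors' released code; the class name `BeliefNet` and all training details are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BeliefNet(nn.Module):
    """Minimal sketch: HMM forward filter with logit-parameterized pi, A, B."""
    def __init__(self, n_states, n_obs):
        super().__init__()
        # Learnable weights are explicitly the logits of the HMM parameters.
        self.pi_logits = nn.Parameter(torch.zeros(n_states))           # initial distribution
        self.A_logits = nn.Parameter(torch.zeros(n_states, n_states))  # transition matrix
        self.B_logits = nn.Parameter(torch.zeros(n_states, n_obs))     # emission matrix

    def forward(self, obs):
        """obs: 1-D integer observation sequence; returns average next-observation NLL."""
        pi = F.softmax(self.pi_logits, dim=-1)
        A = F.softmax(self.A_logits, dim=-1)   # rows sum to one
        B = F.softmax(self.B_logits, dim=-1)
        belief = pi                            # predictive belief over the hidden state
        nll = 0.0
        for o in obs:
            pred = belief @ B                  # predicted distribution of the next observation
            nll = nll - torch.log(pred[o] + 1e-12)
            belief = belief * B[:, o]          # Bayes update on the observed symbol
            belief = belief / belief.sum()
            belief = belief @ A                # propagate one step through the transition matrix
        return nll / len(obs)
```

Training then amounts to minimizing this average negative log-likelihood with any gradient-based optimizer; the recovered HMM parameters are read off by applying softmax to the learned logits.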
Related papers
- Deep unfolding of MCMC kernels: scalable, modular & explainable GANs for high-dimensional posterior sampling [1.930761833716203]
We introduce a novel approach to GAN architecture design by applying deep unfolding to Langevin MCMC algorithms. This paradigm maps fixed-step iterative algorithms onto modular neural networks, yielding architectures that are both flexible and amenable to interpretation. We train these unfolded samplers end-to-end using a supervised regularized Wasserstein GAN framework for posterior sampling.
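Deep unfolding here means a fixed number of Langevin iterations is unrolled into network layers with learnable parameters. The snippet below is only a generic illustration of that idea under my reading of the summary; the per-layer step sizes and the score network standing in for the gradient of the log-density are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class UnfoldedLangevin(nn.Module):
    """K unrolled Langevin steps: x <- x + (eps/2) * score(x) + sqrt(eps) * noise."""
    def __init__(self, score_net, n_steps=10):
        super().__init__()
        self.score_net = score_net                          # learnable surrogate for grad log p(x)
        self.log_eps = nn.Parameter(torch.zeros(n_steps))   # one learnable step size per layer
        self.n_steps = n_steps

    def forward(self, x):
        for k in range(self.n_steps):
            eps = self.log_eps[k].exp()
            x = x + 0.5 * eps * self.score_net(x) + eps.sqrt() * torch.randn_like(x)
        return x

# Toy usage: a small MLP as the score network for 2-D samples.
sampler = UnfoldedLangevin(nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2)))
samples = sampler(torch.randn(128, 2))
```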
arXiv Detail & Related papers (2026-02-24T10:37:10Z) - Cluster-Based Generalized Additive Models Informed by Random Fourier Features [19.409397281817288]
This work introduces a mixture of generalized additive models (GAMs) in which random Fourier feature (RFF) representations are leveraged to uncover locally adaptive structure in the data. Numerical experiments on real-world regression benchmarks, including the California Housing, NASA Air Self-Noise, and Bike Sharing datasets, demonstrate improved predictive performance.
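For reference, the RFF representation mentioned above is typically the standard Rahimi-Recht map, which approximates a shift-invariant (RBF) kernel with an explicit finite-dimensional embedding. This is a generic sketch, not the paper's construction; the kernel choice, feature dimension, and lengthscale are assumptions.

```python
import numpy as np

def rff_features(X, n_features=256, lengthscale=1.0, seed=0):
    """Random Fourier features approximating an RBF kernel: k(x, y) ~ z(x) . z(y)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))  # spectral samples
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)             # random phases
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)           # z(x)
```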
arXiv Detail & Related papers (2025-12-22T13:15:52Z) - Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving model performance. This paper addresses the question of how to optimally combine the model's predictions and the provided labels. Our main contribution is the derivation of the Bayes optimal aggregator function that combines the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z) - Reinforced Model Merging [53.84354455400038]
We present an innovative framework termed Reinforced Model Merging (RMM), which encompasses an environment and agent tailored for merging tasks. By utilizing data subsets during the evaluation process, we addressed the bottleneck in the reward feedback phase, thereby accelerating RMM by up to 100 times.
arXiv Detail & Related papers (2025-03-27T08:52:41Z) - Recursive Learning of Asymptotic Variational Objectives [49.69399307452126]
General state-space models (SSMs) are widely used in statistical machine learning and are among the most classical generative models for sequential time-series data.
Online sequential IWAE (OSIWAE) allows for online learning of both model parameters and a Markovian recognition model for inferring latent states.
This approach is more theoretically well-founded than recently proposed online variational SMC methods.
arXiv Detail & Related papers (2024-11-04T16:12:37Z) - Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC is capable of performing efficiently, entirely on-the-fly, both parameter estimation and particle proposal adaptation.
arXiv Detail & Related papers (2023-12-19T21:45:38Z) - Normalizing Flow based Hidden Markov Models for Classification of Speech Phones with Explainability [25.543231171094384]
In pursuit of explainability, we develop generative models for sequential data.
We combine modern neural networks (normalizing flows) and traditional generative models (hidden Markov models, HMMs).
The proposed generative models can compute the likelihood of a data sample and are hence directly suitable for a maximum-likelihood (ML) classification approach.
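The maximum-likelihood classification rule referred to here is simply an arg-max over per-class (log-)likelihoods produced by class-conditional generative models. A tiny generic sketch (the scoring functions below are placeholders, not the paper's models):

```python
import numpy as np

def ml_classify(log_likelihood_fns, log_priors, x):
    """Pick the class whose generative model assigns x the highest posterior score."""
    scores = [ll(x) + lp for ll, lp in zip(log_likelihood_fns, log_priors)]
    return int(np.argmax(scores))

# Toy usage: two 1-D Gaussian "class models" with equal priors.
lls = [lambda x: -0.5 * (x - 0.0) ** 2, lambda x: -0.5 * (x - 3.0) ** 2]
print(ml_classify(lls, [np.log(0.5), np.log(0.5)], 2.5))  # -> 1
```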
arXiv Detail & Related papers (2021-07-01T20:10:55Z) - Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z) - DenseHMM: Learning Hidden Markov Models by Learning Dense Representations [0.0]
We propose a modification of Hidden Markov Models (HMMs) that makes it possible to learn dense representations of both the hidden states and the observables.
Compared to the standard HMM, transition probabilities are not atomic but composed of these representations via kernelization.
The properties of the DenseHMM like learned co-occurrences and log-likelihoods are studied empirically on synthetic and biomedical datasets.
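The "kernelized" transitions can be pictured as a softmax over inner products of learned state embeddings, so the transition matrix is built from dense representations rather than stored as free parameters. This is a hedged sketch based only on the one-line summary; the embedding dimension and the exact softmax form are assumptions.

```python
import numpy as np

def dense_transitions(U, V):
    """DenseHMM-style transitions: A[i, j] proportional to exp(u_i . v_j)."""
    scores = U @ V.T                                      # (n_states, n_states) inner products
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    return A / A.sum(axis=1, keepdims=True)               # each row is a valid distribution

# Example: 4 hidden states with 2-dimensional embeddings.
rng = np.random.default_rng(0)
U = rng.normal(size=(4, 2))   # outgoing-state representations
V = rng.normal(size=(4, 2))   # incoming-state representations
A = dense_transitions(U, V)   # row-stochastic transition matrix
```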
arXiv Detail & Related papers (2020-12-17T17:48:27Z) - Scaling Hidden Markov Language Models [118.55908381553056]
This work revisits the challenge of scaling HMMs to language modeling datasets.
We propose methods for scaling HMMs to massive state spaces while maintaining efficient exact inference, a compact parameterization, and effective regularization.
arXiv Detail & Related papers (2020-11-09T18:51:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.