Related papers: On the Relation of State Space Models and Hidden Markov Models

On the Relation of State Space Models and Hidden Markov Models

URL: http://arxiv.org/abs/2601.13357v1
Date: Mon, 19 Jan 2026 19:51:05 GMT
Title: On the Relation of State Space Models and Hidden Markov Models
Authors: Aydin Ghojogh, M. Hadi Sepanj, Benyamin Ghojogh,
Abstract summary: State Space Models (SSMs) and Hidden Markov Models (HMMs) are foundational frameworks for modeling sequential data with latent variables.<n>Recent deterministic state space models have re-emerged in natural language processing through architectures such as S4 and Mamba.
Score: 0.07646713951724009
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: State Space Models (SSMs) and Hidden Markov Models (HMMs) are foundational frameworks for modeling sequential data with latent variables and are widely used in signal processing, control theory, and machine learning. Despite their shared temporal structure, they differ fundamentally in the nature of their latent states, probabilistic assumptions, inference procedures, and training paradigms. Recently, deterministic state space models have re-emerged in natural language processing through architectures such as S4 and Mamba, raising new questions about the relationship between classical probabilistic SSMs, HMMs, and modern neural sequence models. In this paper, we present a unified and systematic comparison of HMMs, linear Gaussian state space models, Kalman filtering, and contemporary NLP state space models. We analyze their formulations through the lens of probabilistic graphical models, examine their inference algorithms -- including forward-backward inference and Kalman filtering -- and contrast their learning procedures via Expectation-Maximization and gradient-based optimization. By highlighting both structural similarities and semantic differences, we clarify when these models are equivalent, when they fundamentally diverge, and how modern NLP SSMs relate to classical probabilistic models. Our analysis bridges perspectives from control theory, probabilistic modeling, and modern deep learning.

Related papers

From Classical Probabilistic Latent Variable Models to Modern Generative AI: A Unified Perspective [1.482314366716538]
Generative Artificial Intelligence (AI) now underpins state-of-the-art systems.<n>Despite their varied architectures, many share a common foundation in probabilistic latent variable models (PLVMs)<n>This paper presents a unified perspective by framing both classical and modern generative methods within the PLVM paradigm.
arXiv Detail & Related papers (2025-08-18T11:02:32Z)
Sequential-Parallel Duality in Prefix Scannable Models [68.39855814099997]
Recent developments have given rise to various models, such as Gated Linear Attention (GLA) and Mamba.<n>This raises a natural question: can we characterize the full class of neural sequence models that support near-constant-time parallel evaluation and linear-time, constant-space sequential inference?
arXiv Detail & Related papers (2025-06-12T17:32:02Z)
Deep Learning-based Approaches for State Space Models: A Selective Review [15.295157876811066]
State-space models (SSMs) offer a powerful framework for dynamical system analysis.<n>This paper provides a selective review of recent advancements in deep neural network-based approaches for SSMs.
arXiv Detail & Related papers (2024-12-15T15:04:35Z)
Recursive Learning of Asymptotic Variational Objectives [49.69399307452126]
General state-space models (SSMs) are widely used in statistical machine learning and are among the most classical generative models for sequential time-series data. Online sequential IWAE (OSIWAE) allows for online learning of both model parameters and a Markovian recognition model for inferring latent states. This approach is more theoretically well-founded than recently proposed online variational SMC methods.
arXiv Detail & Related papers (2024-11-04T16:12:37Z)
Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks [50.29356570858905]
We introduce the Dynamical Systems Framework (DSF), which allows a principled investigation of all these architectures in a common representation.<n>We provide principled comparisons between softmax attention and other model classes, discussing the theoretical conditions under which softmax attention can be approximated.<n>This shows the DSF's potential to guide the systematic development of future more efficient and scalable foundation models.
arXiv Detail & Related papers (2024-05-24T17:19:57Z)
eXponential FAmily Dynamical Systems (XFADS): Large-scale nonlinear Gaussian state-space modeling [9.52474299688276]
We introduce a low-rank structured variational autoencoder framework for nonlinear state-space graphical models. We show that our approach consistently demonstrates the ability to learn a more predictive generative model.
arXiv Detail & Related papers (2024-03-03T02:19:49Z)
Distributed Bayesian Learning of Dynamic States [65.7870637855531]
The proposed algorithm is a distributed Bayesian filtering task for finite-state hidden Markov models. It can be used for sequential state estimation, as well as for modeling opinion formation over social networks under dynamic environments.
arXiv Detail & Related papers (2022-12-05T19:40:17Z)
On the Influence of Enforcing Model Identifiability on Learning dynamics of Gaussian Mixture Models [14.759688428864159]
We propose a technique for extracting submodels from singular models. Our method enforces model identifiability during training. We show how the method can be applied to more complex models like deep neural networks.
arXiv Detail & Related papers (2022-06-17T07:50:22Z)
Surrogate Modeling for Physical Systems with Preserved Properties and Adjustable Tradeoffs [0.0]
We present a model-based and a data-driven strategy to generate surrogate models. The latter generates interpretable surrogate models by fitting artificial relations to a presupposed topological structure. Our framework is compatible with various spatial discretization schemes for distributed parameter models.
arXiv Detail & Related papers (2022-02-02T17:07:02Z)
Low-Rank Constraints for Fast Inference in Structured Models [110.38427965904266]
This work demonstrates a simple approach to reduce the computational and memory complexity of a large class of structured models. Experiments with neural parameterized structured models for language modeling, polyphonic music modeling, unsupervised grammar induction, and video modeling show that our approach matches the accuracy of standard models at large state spaces.
arXiv Detail & Related papers (2022-01-08T00:47:50Z)
Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the ( aggregate) posterior to encourage statistical independence of the latent factors. We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method. Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.