Related papers: Recursive Learning of Asymptotic Variational Objectives

Recursive Learning of Asymptotic Variational Objectives

URL: http://arxiv.org/abs/2411.02217v1
Date: Mon, 04 Nov 2024 16:12:37 GMT
Title: Recursive Learning of Asymptotic Variational Objectives
Authors: Alessandro Mastrototaro, Mathias Müller, Jimmy Olsson,
Abstract summary: General state-space models (SSMs) are widely used in statistical machine learning and are among the most classical generative models for sequential time-series data. Online sequential IWAE (OSIWAE) allows for online learning of both model parameters and a Markovian recognition model for inferring latent states. This approach is more theoretically well-founded than recently proposed online variational SMC methods.
Score: 49.69399307452126
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: General state-space models (SSMs) are widely used in statistical machine learning and are among the most classical generative models for sequential time-series data. SSMs, comprising latent Markovian states, can be subjected to variational inference (VI), but standard VI methods like the importance-weighted autoencoder (IWAE) lack functionality for streaming data. To enable online VI in SSMs when the observations are received in real time, we propose maximising an IWAE-type variational lower bound on the asymptotic contrast function, rather than the standard IWAE ELBO, using stochastic approximation. Unlike the recursive maximum likelihood method, which directly maximises the asymptotic contrast, our approach, called online sequential IWAE (OSIWAE), allows for online learning of both model parameters and a Markovian recognition model for inferring latent states. By approximating filter state posteriors and their derivatives using sequential Monte Carlo (SMC) methods, we create a particle-based framework for online VI in SSMs. This approach is more theoretically well-founded than recently proposed online variational SMC methods. We provide rigorous theoretical results on the learning objective and a numerical study demonstrating the method's efficiency in learning model parameters and particle proposal kernels.

Related papers

On the Relation of State Space Models and Hidden Markov Models [0.07646713951724009]
State Space Models (SSMs) and Hidden Markov Models (HMMs) are foundational frameworks for modeling sequential data with latent variables.<n>Recent deterministic state space models have re-emerged in natural language processing through architectures such as S4 and Mamba.
arXiv Detail & Related papers (2026-01-19T19:51:05Z)
Nonparametric learning of covariate-based Markov jump processes using RKHS techniques [3.3005714301829148]
We propose a novel nonparametric approach for linking co variables to Continuous Time Markov Chains (CTMCs)<n>CTMCs provide a robust framework for modeling transitions across clinical or behavioral states.<n>We use a generalized Representer Theorem to enable tractable inference in functional space.
arXiv Detail & Related papers (2025-05-06T02:26:02Z)
Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric [99.56567010306807]
Large Language Models (LLMs) have become indispensable across academia, industry, and daily applications.<n>One core challenge of evaluation in the large language model (LLM) era is the generalization issue.<n>We propose Model Utilization Index (MUI), a mechanism interpretability enhanced metric that complements traditional performance scores.
arXiv Detail & Related papers (2025-04-10T04:09:47Z)
Enhancing Online Continual Learning with Plug-and-Play State Space Model and Class-Conditional Mixture of Discretization [72.81319836138347]
Online continual learning (OCL) seeks to learn new tasks from data streams that appear only once, while retaining knowledge of previously learned tasks. Most existing methods rely on replay, focusing on enhancing memory retention through regularization or distillation. We introduce a plug-and-play module, S6MOD, which can be integrated into most existing methods and directly improve adaptability.
arXiv Detail & Related papers (2024-12-24T05:25:21Z)
Deep Learning-based Approaches for State Space Models: A Selective Review [15.295157876811066]
State-space models (SSMs) offer a powerful framework for dynamical system analysis. This paper provides a selective review of recent advancements in deep neural network-based approaches for SSMs.
arXiv Detail & Related papers (2024-12-15T15:04:35Z)
Classification of High-dimensional Time Series in Spectral Domain using Explainable Features [8.656881800897661]
We propose a model-based approach for classifying high-dimensional stationary time series. Our approach emphasizes the interpretability of model parameters, making it especially suitable for fields like neuroscience. The novelty of our method lies in the interpretability of the model parameters, addressing critical needs in neuroscience.
arXiv Detail & Related papers (2024-08-15T19:10:12Z)
Theoretical Foundations of Deep Selective State-Space Models [13.971499161967083]
Deep SSMs demonstrate outstanding performance across a diverse set of domains. Recent developments show that if the linear recurrence powering SSMs allows for multiplicative interactions between inputs and hidden states. We show that when random linear recurrences are equipped with simple input-controlled transitions, then the hidden state is provably a low-dimensional projection of a powerful mathematical object.
arXiv Detail & Related papers (2024-02-29T11:20:16Z)
Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference. Online VSMC is capable of performing efficiently, entirely on-the-fly, both parameter estimation and particle proposal adaptation.
arXiv Detail & Related papers (2023-12-19T21:45:38Z)
MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework of Model-Agnostic Counterfactual Explanation (MACE) In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity. Experiments on public datasets validate the effectiveness with better validity, sparsity and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language Transfer Learning [59.38343286807997]
We propose Model-Agnostic Multitask Fine-tuning (MAMF) for vision-language models on unseen tasks. Compared with model-agnostic meta-learning (MAML), MAMF discards the bi-level optimization and uses only first-order gradients. We show that MAMF consistently outperforms the classical fine-tuning method for few-shot transfer learning on five benchmark datasets.
arXiv Detail & Related papers (2022-03-09T17:26:53Z)
Bayesian Active Learning for Discrete Latent Variable Models [19.852463786440122]
Active learning seeks to reduce the amount of data required to fit the parameters of a model. latent variable models play a vital role in neuroscience, psychology, and a variety of other engineering and scientific disciplines.
arXiv Detail & Related papers (2022-02-27T19:07:12Z)
Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks. This article introduces BAIT, a practical representation of tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition [55.362258027878966]
We present momentum pseudo-labeling (MPL) as a simple yet effective strategy for semi-supervised speech recognition. MPL consists of a pair of online and offline models that interact and learn from each other, inspired by the mean teacher method. The experimental results demonstrate that MPL effectively improves over the base model and is scalable to different semi-supervised scenarios.
arXiv Detail & Related papers (2021-06-16T16:24:55Z)
Joint Stochastic Approximation and Its Application to Learning Discrete Latent Variable Models [19.07718284287928]
We show that the difficulty of obtaining reliable gradients for the inference model and the drawback of indirectly optimizing the target log-likelihood can be gracefully addressed. We propose to directly maximize the target log-likelihood and simultaneously minimize the inclusive divergence between the posterior and the inference model. The resulting learning algorithm is called joint SA (JSA)
arXiv Detail & Related papers (2020-05-28T13:50:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.