Related papers: Online reinforcement learning via sparse Gaussian mixture model Q-functions

URL: http://arxiv.org/abs/2509.14585v1
Date: Thu, 18 Sep 2025 03:37:11 GMT
Title: Online reinforcement learning via sparse Gaussian mixture model Q-functions
Authors: Minh Vu, Konstantinos Slavakis
Abstract summary: This paper introduces a structured and interpretable online policy-iteration framework for reinforcement learning (RL). It is built around the novel class of sparse Gaussian mixture model Q-functions (S-GMM-QFs). Numerical tests show that S-GMM-QFs match the performance of dense deep RL (DeepRL) methods on standard benchmarks.
Score: 7.056697401102689
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper introduces a structured and interpretable online policy-iteration framework for reinforcement learning (RL), built around the novel class of sparse Gaussian mixture model Q-functions (S-GMM-QFs). Extending earlier work that trained GMM-QFs offline, the proposed framework develops an online scheme that leverages streaming data to encourage exploration. Model complexity is regulated through sparsification by Hadamard overparametrization, which mitigates overfitting while preserving expressiveness. The parameter space of S-GMM-QFs is naturally endowed with a Riemannian manifold structure, allowing for principled parameter updates via online gradient descent on a smooth objective. Numerical tests show that S-GMM-QFs match the performance of dense deep RL (DeepRL) methods on standard benchmarks while using significantly fewer parameters, and maintain strong performance even in low-parameter-count regimes where sparsified DeepRL methods fail to generalize.
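The abstract names three ingredients: a Q-function built as a Gaussian mixture, mixture weights sparsified by Hadamard overparametrization, and parameter updates on a Riemannian manifold. The NumPy sketch below illustrates each one under stated assumptions and is not the authors' implementation. It assumes (i) that Q(s, a) is a weighted sum of Gaussian densities evaluated on the concatenated state-action vector, (ii) that the weights are written as xi = u ⊙ v with an L2 penalty on u and v, and (iii) that covariances are updated with one common choice of geometry, the affine-invariant metric on symmetric positive-definite (SPD) matrices. All function names are hypothetical. The reason the Hadamard trick sparsifies is that u_k² + v_k² ≥ 2|u_k v_k|, with equality when |u_k| = |v_k|. The smallest L2 penalty over all factorizations of xi is therefore exactly the L1 norm of xi, which is a sparsity-promoting penalty.

```python
import numpy as np


def gaussian_pdf(z, mean, cov):
    """Multivariate normal density N(z; mean, cov)."""
    d = z.shape[0]
    diff = z - mean
    # Solve C x = diff instead of forming C^{-1} explicitly.
    quad = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * quad) / norm)


def sgmm_q(state, action, u, v, means, covs):
    """Hypothetical S-GMM-QF: Q(s, a) = sum_k xi_k N([s; a]; m_k, C_k), xi = u * v."""
    z = np.concatenate([state, action])  # assumed joint state-action input
    xi = u * v  # Hadamard overparametrization of the mixture weights
    return sum(xi[k] * gaussian_pdf(z, means[k], covs[k]) for k in range(len(xi)))


def hadamard_penalty(u, v):
    """Ridge (L2) penalty on the two factors of xi = u * v."""
    return 0.5 * (np.sum(u ** 2) + np.sum(v ** 2))


def balanced_factors(xi):
    """Factors with |u_k| = |v_k| and u * v = xi; they minimize hadamard_penalty."""
    root = np.sqrt(np.abs(xi))
    return np.sign(xi) * root, root


def _sym_fn(c, fn):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    w, q = np.linalg.eigh(c)
    return (q * fn(w)) @ q.T


def riemannian_spd_step(cov, euclid_grad, lr):
    """One gradient step on the SPD manifold (affine-invariant metric, assumed).

    Riemannian gradient: C sym(G) C.  Exponential map:
    Exp_C(V) = C^{1/2} expm(C^{-1/2} V C^{-1/2}) C^{1/2}.
    """
    g = 0.5 * (euclid_grad + euclid_grad.T)  # symmetrize the Euclidean gradient
    half = _sym_fn(cov, np.sqrt)  # C^{1/2}
    step = _sym_fn(-lr * half @ g @ half, np.exp)  # expm(-lr C^{1/2} G C^{1/2})
    return half @ step @ half
```

For example, a single component centred at the origin with identity covariance gives `sgmm_q(np.zeros(1), np.zeros(1), np.ones(1), np.ones(1), [np.zeros(2)], [np.eye(2)])` equal to 1/(2π) ≈ 0.159. In this sketch, driving either factor u_k or v_k to zero removes component k from the sum. The SPD step follows the manifold's exponential map, so every updated covariance stays symmetric positive-definite without any projection.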

Related papers

Spectral Gating Networks [65.9496901693099]
We introduce Spectral Gating Networks (SGN) to add frequency-rich expressivity to feed-forward networks. SGN augments a standard activation pathway with a compact spectral pathway and learnable gates that allow the model to start from a stable base behavior. It consistently improves accuracy-efficiency trade-offs under comparable computational budgets.
arXiv Detail & Related papers (2026-02-07T20:00:49Z)
Gaussian-Mixture-Model Q-Functions for Policy Iteration in Reinforcement Learning [7.056697401102689]
This paper introduces a novel function-approximation role for Gaussian mixture models (GMMs) as direct surrogates for Q-function losses. These parametric models, termed GMM-QFs, possess substantial representational capacity. They are shown to be universal approximators over a broad class of functions.
arXiv Detail & Related papers (2025-12-21T15:00:32Z)
Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective [85.06838178922791]
Reinforcement Learning (RL) has proven highly effective for autoregressive language models. But adapting these methods to diffusion large language models (dLLMs) presents fundamental challenges. We propose a principled RL framework that treats entire sequence generation as a single action and uses the ELBO as a tractable sequence-level likelihood proxy.
arXiv Detail & Related papers (2025-12-03T13:05:32Z)
Belief Net: A Filter-Based Framework for Learning Hidden Markov Models from Observations [0.5161531917413708]
This work introduces Belief Net, a novel framework that learns Hidden Markov Models through gradient-based optimization. Unlike black-box Transformer models, Belief Net's learnable weights are explicitly the logits of the initial distribution, transition matrix, and emission matrix. On synthetic HMM data, Belief Net achieves superior convergence speed compared to Baum-Welch, successfully recovering parameters in both undercomplete and overcomplete settings.
arXiv Detail & Related papers (2025-11-13T18:08:19Z)
Deep Equilibrium models for Poisson Imaging Inverse problems via Mirror Descent [7.248102801711294]
Deep Equilibrium Models (DEQs) are implicit neural networks with fixed points. We introduce a novel DEQ formulation based on Mirror Descent defined in terms of a tailored non-Euclidean geometry. We propose computational strategies that enable both efficient training and fully parameter-free inference.
arXiv Detail & Related papers (2025-07-15T16:33:01Z)
Beyond Linearity: Squeeze-and-Recalibrate Blocks for Few-Shot Whole Slide Image Classification [35.6247241174615]
We propose a Squeeze-and-Recalibrate (SR) block, a drop-in replacement for linear layers in deep learning models. We provide theoretical guarantees that the SR block can approximate any linear mapping to arbitrary precision. Our SR-MIL models consistently outperform prior methods while requiring significantly fewer parameters and no architectural changes.
arXiv Detail & Related papers (2025-05-21T13:24:47Z)
Reinforced Model Merging [53.84354455400038]
We present an innovative framework termed Reinforced Model Merging (RMM), which encompasses an environment and agent tailored for merging tasks. By utilizing data subsets during the evaluation process, we addressed the bottleneck in the reward feedback phase, thereby accelerating RMM by up to 100 times.
arXiv Detail & Related papers (2025-03-27T08:52:41Z)
ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts. Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
AutoTurb: Using Large Language Models for Automatic Algebraic Model Discovery of Turbulence Closure [15.905369652489505]
In this work, a novel framework using LLMs to automatically discover expressions for correcting the Reynolds stress model is proposed. The proposed method is applied to separated flow over periodic hills at Re = 10,595. It is demonstrated that the corrective RANS can improve the prediction for both the Reynolds stress and mean velocity fields.
arXiv Detail & Related papers (2024-10-14T16:06:35Z)
Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities. In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs to downstream tasks. We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
Gaussian-Mixture-Model Q-Functions for Reinforcement Learning by Riemannian Optimization [4.192712667327955]
This paper establishes a novel role for Gaussian-mixture models (GMMs) as functional approximators of Q-function losses in reinforcement learning (RL).
arXiv Detail & Related papers (2024-09-06T16:13:04Z)
Ensemble Kalman Filtering Meets Gaussian Process SSM for Non-Mean-Field and Online Inference [47.460898983429374]
We introduce an ensemble Kalman filter (EnKF) into the non-mean-field (NMF) variational inference framework to approximate the posterior distribution of the latent states. This novel marriage between EnKF and GPSSM not only eliminates the need for extensive parameterization in learning variational distributions, but also enables an interpretable, closed-form approximation of the evidence lower bound (ELBO). We demonstrate that the resulting EnKF-aided online algorithm embodies a principled objective function by ensuring data-fitting accuracy while incorporating model regularizations to mitigate overfitting.
arXiv Detail & Related papers (2023-12-10T15:22:30Z)
Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning [73.80728148866906]
Quasimetric Reinforcement Learning (QRL) is a new RL method that utilizes quasimetric models to learn optimal value functions. On offline and online goal-reaching benchmarks, QRL also demonstrates improved sample efficiency and performance.
arXiv Detail & Related papers (2023-04-03T17:59:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.