Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions
- URL: http://arxiv.org/abs/2512.20974v1
- Date: Wed, 24 Dec 2025 06:00:51 GMT
- Title: Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions
- Authors: Jingyang You, Hanna Kurniawati
- Abstract summary: We introduce a novel deep BRL method, Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions (GLiBRL).
On challenging MetaWorld ML10/45 benchmarks, GLiBRL improves the success rate of one of the state-of-the-art deep BRL methods, VariBAD, by up to 2.7x.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Bayesian Reinforcement Learning (BRL) provides a framework for generalising Reinforcement Learning (RL) problems through its use of Bayesian task parameters in the transition and reward models. However, classical BRL methods assume known forms of these models, limiting their applicability to real-world problems. As a result, recent deep BRL methods have started to incorporate model learning, but applying neural networks directly to the joint data and task parameters requires optimising the Evidence Lower Bound (ELBO). ELBOs are difficult to optimise and may yield indistinctive task parameters and hence compromised BRL policies. To this end, we introduce a novel deep BRL method, Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions (GLiBRL), which enables efficient and accurate learning of transition and reward models, with a fully tractable marginal likelihood and Bayesian inference over task parameters and model noise. On the challenging MetaWorld ML10/45 benchmarks, GLiBRL improves the success rate of VariBAD, one of the state-of-the-art deep BRL methods, by up to 2.7x. Compared against representative and recent deep BRL / Meta-RL methods, such as MAML, RL2, SDVT, TrMRL and ECET, GLiBRL also consistently delivers low-variance, competitive performance.
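The "fully tractable marginal likelihood" the abstract refers to is, in spirit, the classical Bayesian linear regression evidence computed on features produced by a neural network (a neural-linear model). Below is a minimal sketch of that general idea; the BasisNet architecture, the prior/noise precisions alpha and beta, and the toy regression task are illustrative assumptions, not GLiBRL's actual models.

```python
import math
import torch

# Minimal neural-linear sketch: a small network supplies basis functions
# phi(x); Bayesian linear regression on top of phi then has a closed-form
# posterior over weights and a tractable log marginal likelihood, which can
# be maximised directly to train the basis network -- no ELBO required.
# All sizes, names, and hyperparameters here are illustrative assumptions.

torch.manual_seed(0)

class BasisNet(torch.nn.Module):
    """Learnable basis functions phi: R^in_dim -> R^n_basis."""
    def __init__(self, in_dim: int, n_basis: int):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(in_dim, 64), torch.nn.Tanh(),
            torch.nn.Linear(64, n_basis),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def log_marginal_likelihood(phi: torch.Tensor, y: torch.Tensor,
                            alpha: float = 1.0, beta: float = 25.0) -> torch.Tensor:
    """Log evidence of y under w ~ N(0, alpha^-1 I), y|w ~ N(phi w, beta^-1 I).

    Standard Bayesian linear regression result (e.g. Bishop, PRML, Eq. 3.86);
    it is differentiable w.r.t. phi, so gradients flow into the basis network.
    """
    n, m = phi.shape
    a = alpha * torch.eye(m) + beta * phi.T @ phi    # posterior precision A
    mean = beta * torch.linalg.solve(a, phi.T @ y)   # posterior mean m_N
    resid = y - phi @ mean
    return (0.5 * m * math.log(alpha) + 0.5 * n * math.log(beta)
            - 0.5 * beta * resid @ resid - 0.5 * alpha * mean @ mean
            - 0.5 * torch.logdet(a) - 0.5 * n * math.log(2 * math.pi))

# Toy 1-D "model learning" task: fit y = sin(3x) from noisy samples by
# maximising the marginal likelihood with respect to the basis network.
x = torch.linspace(-1.0, 1.0, 32).unsqueeze(-1)
y = torch.sin(3 * x).squeeze(-1) + 0.05 * torch.randn(32)

basis = BasisNet(in_dim=1, n_basis=16)
opt = torch.optim.Adam(basis.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = -log_marginal_likelihood(basis(x), y)
    loss.backward()
    opt.step()
```

Because the evidence integrates the task weights out analytically, per-task inference reduces to a conjugate Gaussian update, which is plausibly what makes this route cheaper and more stable than the ELBO-based alternatives the abstract criticises.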
Related papers
- SOMBRL: Scalable and Optimistic Model-Based RL [78.3360288726531]
We propose an approach based on the principle of optimism in the face of uncertainty.
We show that SOMBRL offers a flexible and scalable solution for principled exploration.
We also evaluate SOMBRL on dynamic RC car hardware and show that SOMBRL outperforms the state-of-the-art.
arXiv Detail & Related papers (2025-11-25T08:39:21Z)
- Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models [53.339700196282905]
A key challenge in applying reinforcement learning to diffusion large language models (dLLMs) is the intractability of their likelihood functions.
We propose BGPO, a memory-efficient RL algorithm that maximizes a specially constructed lower bound of the ELBO-based objective.
Experiments show that BGPO significantly outperforms previous RL algorithms for dLLMs in math problem solving, code generation, and planning tasks.
arXiv Detail & Related papers (2025-10-13T17:47:50Z)
- Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs [51.21041884010009]
Ring-lite is a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL).
Our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks.
arXiv Detail & Related papers (2025-06-17T17:12:34Z)
- Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning [93.00629872970364]
Reinforcement learning (RL) has become the dominant paradigm for improving the performance of language models on complex reasoning tasks.
We introduce SPARKLE, a fine-grained analytic framework to dissect the effects of RL across three key dimensions.
We study whether difficult problems -- those yielding no RL signals and mixed-quality reasoning traces -- can still be effectively used for training.
arXiv Detail & Related papers (2025-06-05T07:53:59Z)
- Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining [74.83412846804977]
Reinforcement learning (RL)-based fine-tuning has become a crucial step in post-training language models.
We present a systematic end-to-end study of RL fine-tuning for mathematical reasoning by training models entirely from scratch.
arXiv Detail & Related papers (2025-04-10T17:15:53Z)
- Enhancing Offline Model-Based RL via Active Model Selection: A Bayesian Optimization Perspective [11.20804263996665]
Offline model-based reinforcement learning (MBRL) serves as a competitive framework that can learn well-performing policies solely from pre-collected data.
We propose BOMS, an active model selection framework that enhances model selection in offline MBRL with only a small online interaction budget.
We show that BOMS improves over the baseline methods with a small amount of online interaction, comparable to only 1%-2.5% of the offline training data.
arXiv Detail & Related papers (2025-02-17T06:34:58Z)
- Model-Based Transfer Learning for Contextual Reinforcement Learning [5.5597941107270215]
We introduce Model-Based Transfer Learning to solve contextual RL problems.
We show theoretically that the method exhibits sublinear regret in the number of training tasks.
We experimentally validate our method using urban traffic and standard continuous control benchmarks.
arXiv Detail & Related papers (2024-08-08T14:46:01Z)
- Mind the Model, Not the Agent: The Primacy Bias in Model-based RL [27.41568030157649]
The primacy bias in model-free reinforcement learning (MFRL) refers to the agent's tendency to overfit early data and lose the ability to learn from new data.
Previous studies have shown that employing simple techniques, such as resetting the agent's parameters, can substantially alleviate the primacy bias in MFRL.
In this work, we focus on investigating the primacy bias in model-based reinforcement learning (MBRL).
arXiv Detail & Related papers (2023-10-23T15:12:20Z)
- RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$ [12.111848705677142]
We propose RL$^3$, a hybrid approach that incorporates action-values, learned per task via traditional RL, into the inputs to Meta-RL.
We show that RL$^3$ earns a greater cumulative reward in the long term compared to RL$^2$, while drastically reducing meta-training time, and generalizes better to out-of-distribution tasks.
arXiv Detail & Related papers (2023-06-28T04:16:16Z)
- A Survey on Model-based Reinforcement Learning [21.85904195671014]
Reinforcement learning (RL) solves sequential decision-making problems via a trial-and-error process interacting with the environment.
Model-based reinforcement learning (MBRL) is believed to be a promising direction; it builds environment models in which trial-and-error can take place without real cost.
arXiv Detail & Related papers (2022-06-19T05:28:03Z)
- MOReL: Model-Based Offline Reinforcement Learning [49.30091375141527]
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment.
We present MOReL, an algorithmic framework for model-based offline RL.
We show that MOReL matches or exceeds state-of-the-art results in widely studied offline RL benchmarks.
arXiv Detail & Related papers (2020-05-12T17:52:43Z)