Related papers: Multi-timestep models for Model-based Reinforcement Learning

URL: http://arxiv.org/abs/2310.05672v2
Date: Wed, 11 Oct 2023 08:37:40 GMT
Title: Multi-timestep models for Model-based Reinforcement Learning
Authors: Abdelhakim Benechehab, Giuseppe Paolo, Albert Thomas, Maurizio Filippone, Balázs Kégl
Abstract summary: In model-based reinforcement learning (MBRL), most algorithms rely on simulating trajectories from one-step dynamics models learned on data. We tackle the compounding of one-step prediction errors over long trajectories by using a multi-timestep objective to train one-step models. We find that exponentially decaying weights lead to models that significantly improve the long-horizon R2 score.
Score: 10.940666275830052
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In model-based reinforcement learning (MBRL), most algorithms rely on simulating trajectories from one-step dynamics models learned on data. A critical challenge of this approach is the compounding of one-step prediction errors as the length of the trajectory grows. In this paper we tackle this issue by using a multi-timestep objective to train one-step models. Our objective is a weighted sum of a loss function (e.g., negative log-likelihood) at various future horizons. We explore and test a range of weight profiles. We find that exponentially decaying weights lead to models that significantly improve the long-horizon R2 score. This improvement is particularly noticeable when the models were evaluated on noisy data. Finally, using a soft actor-critic (SAC) agent in pure batch reinforcement learning (RL) and iterated batch RL scenarios, we found that our multi-timestep models outperform or match standard one-step models. This was especially evident in a noisy variant of the considered environment, highlighting the potential of our approach in real-world applications.
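
The objective described in the abstract, a weighted sum of a loss at several future horizons, can be illustrated with a minimal sketch. This is not the authors' implementation: mean-squared error stands in for the negative log-likelihood mentioned in the abstract, and the names `exponential_weights`, `multi_step_loss`, and `beta`, as well as the choice to normalize the weights, are assumptions made for illustration.

```python
import numpy as np


def exponential_weights(horizon, beta):
    """Weights proportional to beta**j for j = 0..horizon-1, normalized to sum to 1.

    `beta` (hypothetical name) controls the decay; beta < 1 favors short horizons.
    """
    w = beta ** np.arange(horizon)
    return w / w.sum()


def multi_step_loss(model, states, actions, weights):
    """Weighted sum of per-horizon errors of a one-step model.

    states:  array of shape (T + 1, state_dim)
    actions: array of shape (T, action_dim)
    model:   callable (state, action) -> predicted next state

    The one-step model is applied recursively to its own predictions, and the
    squared error at each horizon is weighted by weights[j]. MSE is used here
    as a stand-in for a likelihood-based loss.
    """
    horizon = len(weights)
    n_starts = len(actions) - horizon + 1
    total = 0.0
    for t in range(n_starts):
        pred = states[t]
        for j in range(horizon):
            pred = model(pred, actions[t + j])  # feed prediction back in
            total += weights[j] * np.mean((pred - states[t + j + 1]) ** 2)
    return total / n_starts


# Toy usage: true dynamics s' = s + a, compared with a model carrying a constant bias.
rng = np.random.default_rng(0)
actions = rng.normal(size=(20, 2))
states = np.vstack([np.zeros((1, 2)), np.cumsum(actions, axis=0)])
weights = exponential_weights(horizon=3, beta=0.5)

exact_model = lambda s, a: s + a
biased_model = lambda s, a: s + a + 0.1

print(multi_step_loss(exact_model, states, actions, weights))
print(multi_step_loss(biased_model, states, actions, weights))
```

In this toy setting the bias accumulates linearly with the horizon, so the longer-horizon terms expose an error that a purely one-step loss would weight only once.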

Related papers

Scalable Offline Model-Based RL with Action Chunks [60.80151356018376]
We study whether model-based reinforcement learning can provide a scalable recipe for tackling complex, long-horizon tasks in offline RL. We call this recipe Model-Based RL with Action Chunks (MAC). We show that MAC achieves the best performance among offline model-based RL algorithms, especially on challenging long-horizon tasks.
arXiv Detail & Related papers (2025-12-08T23:26:29Z)
Reasoning with Sampling: Your Base Model is Smarter Than You Think [52.639108524651846]
We propose a simple iterative sampling algorithm leveraging the base models' own likelihoods. We show that our algorithm offers substantial boosts in reasoning that nearly match and even outperform those from RL. Our method does not require training, curated datasets, or a verifier.
arXiv Detail & Related papers (2025-10-16T17:18:11Z)
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models [57.49136894315871]
New paradigm of test-time scaling has yielded remarkable breakthroughs in reasoning models and generative vision models. We propose one solution to the problem of integrating test-time scaling knowledge into a model during post-training. We replace reward guided test-time noise optimization in diffusion models with a Noise Hypernetwork that modulates initial input noise.
arXiv Detail & Related papers (2025-08-13T17:33:37Z)
Intention-Conditioned Flow Occupancy Models [69.79049994662591]
Large-scale pre-training has fundamentally changed how machine learning research is done today. Applying this same framework to reinforcement learning is appealing because it offers compelling avenues for addressing core challenges in RL. Recent advances in generative AI have provided new tools for modeling highly complex distributions.
arXiv Detail & Related papers (2025-06-10T15:27:46Z)
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining [74.83412846804977]
Reinforcement learning (RL)-based fine-tuning has become a crucial step in post-training language models. We present a systematic end-to-end study of RL fine-tuning for mathematical reasoning by training models entirely from scratch.
arXiv Detail & Related papers (2025-04-10T17:15:53Z)
Truncated Consistency Models [57.50243901368328]
Training consistency models requires learning to map all intermediate points along PF ODE trajectories to their corresponding endpoints. We empirically find that this training paradigm limits the one-step generation performance of consistency models. We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution.
arXiv Detail & Related papers (2024-10-18T22:38:08Z)
Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient [9.519619751861333]
We propose a state space model (SSM)-based world model built on Mamba. It achieves $O(n)$ memory and computational complexity while effectively capturing long-term dependencies. This model is accessible and can be trained on an off-the-shelf laptop.
arXiv Detail & Related papers (2024-10-11T15:10:40Z)
A Multi-step Loss Function for Robust Learning of the Dynamics in Model-based Reinforcement Learning [10.940666275830052]
In model-based reinforcement learning, most algorithms rely on simulating trajectories from one-step models of the dynamics learned on data. We tackle the compounding of one-step prediction errors by using a multi-step objective to train one-step models. We find that this new loss is particularly useful when the data is noisy, which is often the case in real-life environments.
arXiv Detail & Related papers (2024-02-05T16:13:00Z)
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours. We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length. This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
Predictable MDP Abstraction for Unsupervised Model-Based RL [93.91375268580806]
We propose predictable MDP abstraction (PMA). Instead of training a predictive model on the original MDP, we train a model on a transformed MDP with a learned action space. We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches.
arXiv Detail & Related papers (2023-02-08T07:37:51Z)
Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. This creates a barrier to fusing knowledge across individual models to yield a better single model. We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
Value Gradient weighted Model-Based Reinforcement Learning [28.366157882991565]
Model-based reinforcement learning (MBRL) is a sample efficient technique to obtain control policies. VaGraM is a novel method for value-aware model learning.
arXiv Detail & Related papers (2022-04-04T13:28:31Z)
Model-based micro-data reinforcement learning: what are the crucial model properties and which model to choose? [0.2836066255205732]
We contribute to micro-data model-based reinforcement learning (MBRL) by rigorously comparing popular generative models. We find that on an environment that requires multimodal posterior predictives, mixture density nets outperform all other models by a large margin. We also found that deterministic models are on par; in fact, they consistently (although non-significantly) outperform their probabilistic counterparts.
arXiv Detail & Related papers (2021-07-24T11:38:25Z)
Anomaly Detection of Time Series with Smoothness-Inducing Sequential Variational Auto-Encoder [59.69303945834122]
We present a Smoothness-Inducing Sequential Variational Auto-Encoder (SISVAE) model for robust estimation and anomaly detection of time series. Our model parameterizes mean and variance for each time-stamp with flexible neural networks. We show the effectiveness of our model on both synthetic datasets and public real-world benchmarks.
arXiv Detail & Related papers (2021-02-02T06:15:15Z)
Reinforcement Learning based dynamic weighing of Ensemble Models for Time Series Forecasting [0.8399688944263843]
It is known that if the models selected for data modelling are distinct (linear/non-linear, static/dynamic) and independent (minimally correlated), the accuracy of the predictions is improved. Various approaches suggested in the literature to weigh the ensemble models use a static set of weights. To address this issue, a Reinforcement Learning (RL) approach is proposed to dynamically assign and update the weights of each of the models at different time instants.
arXiv Detail & Related papers (2020-08-20T10:40:42Z)
Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence. This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time. Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.