Comparing Model-free and Model-based Algorithms for Offline
Reinforcement Learning
- URL: http://arxiv.org/abs/2201.05433v1
- Date: Fri, 14 Jan 2022 13:08:19 GMT
- Title: Comparing Model-free and Model-based Algorithms for Offline
Reinforcement Learning
- Authors: Phillip Swazinna, Steffen Udluft, Daniel Hein, Thomas Runkler
- Abstract summary: We compare model-free, model-based, as well as hybrid offline RL approaches on various industrial benchmark (IB) datasets.
We find that on the IB, hybrid approaches face severe difficulties and that simpler algorithms, such as rollout-based algorithms or model-free algorithms with simpler regularizers, perform best.
- Score: 3.1848563608930505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning (RL) algorithms are often designed with
environments such as MuJoCo in mind, in which the planning horizon is extremely
long and no noise exists. We compare model-free, model-based, as well as hybrid
offline RL approaches on various industrial benchmark (IB) datasets to test the
algorithms in settings closer to real-world problems, including complex noise
and partially observable states. We find that on the IB, hybrid approaches face
severe difficulties and that simpler algorithms, such as rollout-based
algorithms or model-free algorithms with simpler regularizers, perform best on
the datasets.
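To make concrete what a "simple regularizer" for a model-free algorithm can look like, here is a minimal, hypothetical sketch of an actor update with a behavior-cloning penalty in the spirit of TD3+BC. It is not the paper's implementation; all names (Actor, Critic, lambda_bc, the batch tensors) are illustrative assumptions.

```python
# Hypothetical sketch, not the paper's code: a model-free offline RL actor update
# with a simple behavior-cloning regularizer (TD3+BC-style).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                 nn.Linear(256, act_dim), nn.Tanh())

    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def actor_update(actor, critic, actor_opt, obs, act_dataset, lambda_bc=2.5):
    """One actor step: maximize Q while staying close to the logged actions."""
    pi = actor(obs)
    q = critic(obs, pi)
    # Scale the Q term so the BC penalty has a comparable magnitude (TD3+BC trick).
    scale = lambda_bc / q.abs().mean().detach()
    loss = -(scale * q).mean() + F.mse_loss(pi, act_dataset)
    actor_opt.zero_grad()
    loss.backward()
    actor_opt.step()
    return loss.item()
```

The critic would be trained with an ordinary TD target on the fixed dataset; the only offline-specific ingredient in this sketch is the MSE term pulling the policy toward the logged actions.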
Related papers
- Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds [59.875550175217874]
We show that a simple Model-based Reinforcement Learning scheme achieves strong regret and sample bounds in online and offline RL settings.
We highlight that our algorithms are simple, fairly standard, and indeed have been extensively studied in the RL literature.
arXiv Detail & Related papers (2024-08-16T19:52:53Z)
- On Sample-Efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling, and Beyond [29.449446595110643]
We propose a notion of data diversity that subsumes the previous notions of coverage measures in offline RL.
Our proposed model-free posterior sampling (PS)-based algorithm for offline RL is novel, with sub-optimality bounds that are frequentist (i.e., worst-case) in nature.
arXiv Detail & Related papers (2024-01-06T20:52:04Z)
- A Comparative Study of Deep Learning and Iterative Algorithms for Joint Channel Estimation and Signal Detection in OFDM Systems [11.190815358585137]
Joint channel estimation and signal detection is crucial in orthogonal frequency-division multiplexing (OFDM) systems.
Traditional algorithms perform poorly in low signal-to-noise ratio (SNR) scenarios.
Deep learning (DL) methods have been investigated, but concerns regarding computational expense and lack of validation in low-SNR settings remain.
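As a hypothetical illustration of the traditional baseline such DL methods are compared against, the sketch below performs per-subcarrier least-squares channel estimation followed by zero-forcing equalization; the variable names and shapes are assumptions.

```python
# Hypothetical sketch of a "traditional" OFDM receiver baseline: least-squares
# channel estimation on pilots, then zero-forcing equalization of the data symbols.
import numpy as np

def ls_estimate_and_equalize(y_pilot, x_pilot, y_data, noise_floor=1e-12):
    """y_pilot, x_pilot, y_data: complex arrays of per-subcarrier symbols."""
    h_hat = y_pilot / x_pilot                      # least-squares channel estimate
    h_safe = np.where(np.abs(h_hat) < noise_floor, noise_floor, h_hat)
    x_hat = y_data / h_safe                        # zero-forcing equalization
    return h_hat, x_hat
```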
arXiv Detail & Related papers (2023-03-07T06:34:04Z)
- A Strong Baseline for Batch Imitation Learning [25.392006064406967]
We provide an easy-to-implement, novel algorithm for imitation learning under a strict data paradigm.
This paradigm allows our algorithm to be used for environments in which safety or cost are of critical concern.
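The simplest reference point in this strict batch setting is behavior cloning, sketched below as a generic illustration rather than the paper's algorithm; the Policy module and the continuous-action assumption are hypothetical.

```python
# Hypothetical sketch: behavior cloning, the simplest batch imitation baseline.
import torch
import torch.nn as nn

def behavior_cloning_step(policy: nn.Module, optimizer, states, actions):
    """One supervised step: regress the policy's output onto the logged actions."""
    pred = policy(states)
    loss = nn.functional.mse_loss(pred, actions)   # continuous actions assumed
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```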
arXiv Detail & Related papers (2023-02-06T14:03:33Z)
- Single-Trajectory Distributionally Robust Reinforcement Learning [21.955807398493334]
We propose Distributionally Robust RL (DRRL) to enhance performance across a range of environments.
Existing DRRL algorithms are either model-based or fail to learn from a single sample trajectory.
We design the first fully model-free DRRL algorithm, called distributionally robust Q-learning with single trajectory (DRQ).
arXiv Detail & Related papers (2023-01-27T14:08:09Z)
- Oracle Inequalities for Model Selection in Offline Reinforcement Learning [105.74139523696284]
We study the problem of model selection in offline RL with value function approximation.
We propose the first model selection algorithm for offline RL that achieves minimax rate-optimal oracle inequalities up to logarithmic factors.
We conclude with several numerical simulations showing it is capable of reliably selecting a good model class.
arXiv Detail & Related papers (2022-11-03T17:32:34Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
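As a rough, hypothetical illustration of value regularization on out-of-support state-actions (not COMBO's actual objective), the penalty below pushes Q down on actions proposed by the current policy and up on dataset actions; the critic/policy callables and beta are assumptions.

```python
# Hypothetical sketch, not COMBO itself: a conservative penalty added to the
# critic loss, discouraging high Q-values on actions outside the dataset support.
import torch

def conservative_penalty(critic, policy, obs, act_dataset, beta=1.0):
    with torch.no_grad():
        act_off = policy(obs)            # stand-in for model-rollout / policy actions
    q_off = critic(obs, act_off)         # Q on (possibly) out-of-support actions
    q_data = critic(obs, act_dataset)    # Q on logged actions
    return beta * (q_off.mean() - q_data.mean())

# total critic loss = standard TD (Bellman) error + conservative_penalty(...)
```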
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
- A Hybrid PAC Reinforcement Learning Algorithm [5.279475826661642]
This paper offers a new hybrid probably approximately correct (PAC) reinforcement learning (RL) algorithm for Markov decision processes (MDPs).
The designed algorithm, referred to as the Dyna-Delayed Q-learning (DDQ) algorithm, combines model-free and model-based learning approaches while outperforming both in most cases.
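For readers unfamiliar with how model-free and model-based updates are combined, the sketch below shows classic tabular Dyna-Q as a generic illustration; it is not the DDQ algorithm, and the env interface (reset/step) is an assumption.

```python
# Hypothetical sketch: classic tabular Dyna-Q, combining direct (model-free)
# Q-learning updates with planning updates from a learned model. Not DDQ.
import random
from collections import defaultdict

def dyna_q(env, n_actions, episodes=100, alpha=0.1, gamma=0.95,
           epsilon=0.1, planning_steps=10):
    Q = defaultdict(float)       # model-free value estimates, keyed by (state, action)
    model = {}                   # learned model: (s, a) -> (r, s', done)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda a_: Q[(s, a_)])
            s2, r, done = env.step(a)
            # direct (model-free) Q-learning update
            target = r + (0 if done else gamma * max(Q[(s2, a_)] for a_ in range(n_actions)))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            # store the transition in the model, then do planning (model-based) updates
            model[(s, a)] = (r, s2, done)
            for _ in range(planning_steps):
                (ps, pa), (pr, ps2, pdone) = random.choice(list(model.items()))
                ptarget = pr + (0 if pdone else gamma * max(Q[(ps2, a_)] for a_ in range(n_actions)))
                Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])
            s = s2
    return Q
```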
arXiv Detail & Related papers (2020-09-05T21:32:42Z)
- Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms.
Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
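One simple, hypothetical way to bolt optimistic exploration onto a standard RL algorithm is to add an ensemble-disagreement bonus to the reward, as sketched below; this only illustrates the general idea and is not the paper's method.

```python
# Hypothetical sketch: reward shaping with an optimism bonus proportional to the
# disagreement of an ensemble of learned dynamics models.
import numpy as np

def optimistic_reward(reward, state, action, ensemble_models, beta=1.0):
    """ensemble_models: list of callables, each predicting the next state."""
    preds = np.stack([m(state, action) for m in ensemble_models])  # (K, state_dim)
    disagreement = preds.std(axis=0).mean()    # crude epistemic-uncertainty proxy
    return reward + beta * disagreement        # optimism in the face of uncertainty
```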
arXiv Detail & Related papers (2020-06-15T18:37:38Z)
- FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving it to the cloud, algorithms such as Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
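A minimal sketch of the CTA pattern as realized by FedAvg is shown below; local_sgd, the client datasets, and the unweighted average are simplifying assumptions.

```python
# Hypothetical sketch of one FedAvg round: clients compute local updates on their
# own data (computation), then the server averages the resulting models (aggregation).
import numpy as np

def fedavg_round(global_weights, client_datasets, local_sgd, local_steps=5):
    client_weights = []
    for data in client_datasets:
        w = global_weights.copy()
        for _ in range(local_steps):       # computation: local updates stay on-device
            w = local_sgd(w, data)
        client_weights.append(w)
    # aggregation: simple (unweighted) average of the client models
    return np.mean(client_weights, axis=0)
```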
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
- Learning Gaussian Graphical Models via Multiplicative Weights [54.252053139374205]
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates.
The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature.
It has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
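The multiplicative-weights primitive itself is simple; the sketch below shows the generic experts version, not the Klivans-Meka graphical-model algorithm, and losses_at_round is an assumed callback.

```python
# Hypothetical sketch: generic multiplicative-weights update over "experts".
import numpy as np

def multiplicative_weights(n_experts, rounds, losses_at_round, eta=0.1):
    w = np.ones(n_experts)
    played = []
    for t in range(rounds):
        p = w / w.sum()                  # distribution over experts played at round t
        played.append(p)
        losses = losses_at_round(t)      # losses in [0, 1], one per expert
        w *= np.exp(-eta * losses)       # downweight experts that did poorly
    return played
```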
arXiv Detail & Related papers (2020-02-20T10:50:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.