Towards General-Purpose Model-Free Reinforcement Learning
- URL: http://arxiv.org/abs/2501.16142v1
- Date: Mon, 27 Jan 2025 15:36:37 GMT
- Title: Towards General-Purpose Model-Free Reinforcement Learning
- Authors: Scott Fujimoto, Pierluca D'Oro, Amy Zhang, Yuandong Tian, Michael Rabbat
- Abstract summary: Reinforcement learning (RL) promises a framework for near-universal problem-solving.
In practice, RL algorithms are often tailored to specific benchmarks.
We propose a unifying model-free deep RL algorithm that can address a diverse class of domains and problem settings.
- Abstract: Reinforcement learning (RL) promises a framework for near-universal problem-solving. In practice, however, RL algorithms are often tailored to specific benchmarks, relying on carefully tuned hyperparameters and algorithmic choices. Recently, powerful model-based RL methods have shown impressive general results across benchmarks but come at the cost of increased complexity and slow run times, limiting their broader applicability. In this paper, we attempt to find a unifying model-free deep RL algorithm that can address a diverse class of domains and problem settings. To achieve this, we leverage model-based representations that approximately linearize the value function, taking advantage of the denser task objectives used by model-based RL while avoiding the costs associated with planning or simulated trajectories. We evaluate our algorithm, MR.Q, on a variety of common RL benchmarks with a single set of hyperparameters and show competitive performance against domain-specific and general baselines, providing a concrete step towards building general-purpose model-free deep RL algorithms.
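The central idea, as the abstract describes it, is a learned embedding in which the value function is approximately linear, with dense model-based prediction losses shaping that embedding in place of planning or simulated rollouts. The following is a minimal sketch of that structure; the module names, network sizes, and loss weights are illustrative assumptions, not the authors' MR.Q implementation.

```python
# Hypothetical sketch: Q(s, a) = w^T z_sa, where the embedding z_sa is
# shaped by dense model-based targets (reward and next-state prediction)
# rather than by planning. Illustrative only.
import torch
import torch.nn as nn

class LinearizedQ(nn.Module):
    def __init__(self, state_dim, action_dim, embed_dim=256):
        super().__init__()
        # Non-linear encoder from (state, action) to the embedding z_sa.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, 512), nn.ELU(),
            nn.Linear(512, embed_dim),
        )
        self.w = nn.Linear(embed_dim, 1, bias=False)        # Q(s,a) = w^T z_sa
        self.reward_head = nn.Linear(embed_dim, 1)          # predicts r
        self.state_head = nn.Linear(embed_dim, state_dim)   # predicts s'

    def forward(self, state, action):
        z = self.encoder(torch.cat([state, action], dim=-1))
        return self.w(z), self.reward_head(z), self.state_head(z)

def mrq_style_loss(model, batch, gamma=0.99):
    q, r_pred, s_pred = model(batch["state"], batch["action"])
    with torch.no_grad():  # standard TD target for the linear value head
        q_next, _, _ = model(batch["next_state"], batch["next_action"])
        target = batch["reward"] + gamma * (1.0 - batch["done"]) * q_next
    td = (q - target).pow(2).mean()
    # Dense model-based objectives that shape the embedding.
    aux = (r_pred - batch["reward"]).pow(2).mean() \
        + (s_pred - batch["next_state"]).pow(2).mean()
    return td + aux  # relative loss weights omitted for brevity
```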
Related papers
- Model-Free Robust Reinforcement Learning with Sample Complexity Analysis [16.477827600825428]
This paper proposes a model-free distributionally robust RL (DR-RL) algorithm leveraging the Multi-level Monte Carlo technique.
We develop algorithms for uncertainty sets defined by total variation, Chi-square divergence, and KL divergence.
Remarkably, our algorithms represent the first model-free DR-RL approach featuring finite sample complexity.
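The Multi-level Monte Carlo device underlying this result is standard enough to sketch: draw a random truncation level, then reweight the sampled telescoping correction so the estimator remains unbiased while using only finitely many samples. Below is a generic, hedged sketch; `sample_fn` is an assumed stand-in for the paper's empirical robust Bellman estimate (e.g., built from 2^n transition samples at level n).

```python
import numpy as np

def mlmc_estimate(sample_fn, p=0.5, rng=None):
    """Unbiased estimate of lim_{n -> inf} E[sample_fn(n)] with finite work."""
    rng = rng or np.random.default_rng()
    # Random truncation level N with P(N = n) = p * (1 - p)^n for n = 0, 1, ...
    level = rng.geometric(p) - 1
    prob = p * (1.0 - p) ** level
    # Telescoping identity: f(0) + sum_n [f(n+1) - f(n)] = lim_n f(n).
    # Reweighting the one sampled correction by 1 / P(N = level) keeps the
    # estimator unbiased in expectation.
    return sample_fn(0) + (sample_fn(level + 1) - sample_fn(level)) / prob
```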
arXiv Detail & Related papers (2024-06-24T19:35:26Z)
- A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Observations [0.34410212782758043]
Reinforcement Learning (RL) provides a powerful framework for decision-making in complex environments.
This paper presents a novel Spiking Neural Network (SNN) architecture for solving RL problems with real-valued observations.
arXiv Detail & Related papers (2023-07-06T12:33:34Z)
- Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while automatically balancing exploration and exploitation.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
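The single objective is simple to state: select the hypothesis that maximizes the value it promises minus a scaled data-fit loss, with no explicit confidence set to maintain. A minimal hedged sketch, assuming user-supplied callables `optimal_value` and `nll` (these names are illustrative, not the paper's API):

```python
# Maximize-to-explore style selection: optimism (promised value) traded off
# against estimation error (data fit) in a single objective. A sketch under
# assumed callables, not the paper's implementation.
def mex_select(hypotheses, dataset, optimal_value, nll, eta=1.0):
    return max(hypotheses,
               key=lambda f: optimal_value(f) - eta * nll(f, dataset))
```

The coefficient `eta` balances the two terms: larger values favor hypotheses that fit the data, smaller values favor optimistic ones.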
arXiv Detail & Related papers (2023-05-29T17:25:26Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning [132.45959478064736]
We propose a general framework that unifies model-based and model-free reinforcement learning.
We propose a novel estimation function with decomposable structural properties for optimization-based exploration.
Under our framework, we propose a new sample-efficient algorithm, OPtimization-based ExploRation with Approximation (OPERA).
arXiv Detail & Related papers (2022-09-30T17:59:16Z)
- Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective [142.36200080384145]
We propose a single objective that jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
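The flavor of such a single objective fits in a few lines: one term keeps the latent model consistent with encoded real transitions, while the other shapes the same representation to support high estimated returns. A hedged one-step sketch, with all module names assumed; the paper derives a principled multi-step objective, and this only conveys the coupling.

```python
# One-step sketch coupling encoder, latent dynamics model, policy, and
# critic under a single loss. All modules are assumed callables returning
# tensors; illustrative only, not the paper's derived objective.
def joint_loss(encoder, dynamics, policy, critic, batch):
    z = encoder(batch["state"])
    z_next = encoder(batch["next_state"])
    # Self-consistency: the latent model should predict the encoding of the
    # real next state from the current latent and the taken action.
    consistency = (dynamics(z, batch["action"]) - z_next.detach()).pow(2).mean()
    # Return term: the same representation must support high critic values
    # under the policy's action, so one gradient shapes every component.
    returns = -critic(z, policy(z)).mean()
    return consistency + returns
```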
arXiv Detail & Related papers (2022-09-18T03:51:58Z)
- Tuning Mixed Input Hyperparameters on the Fly for Efficient Population Based AutoRL [12.135280422000635]
We introduce a new efficient hierarchical approach for optimizing both continuous and categorical variables.
We show that explicitly modelling dependence between data augmentation and other hyperparameters improves generalization.
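One way to picture the hierarchical scheme: resample the categorical choice occasionally at the outer level, and perturb continuous values conditioned on it at the inner level, inside an ordinary population-based training loop. A hedged sketch with invented hyperparameter names and ranges:

```python
import random

# Hypothetical exploration step for one population member. The specific
# hyperparameters ("augmentation", "lr"), probabilities, and perturbation
# factors are invented for illustration.
def propose(parent):
    child = dict(parent)
    # Outer (categorical) level: occasionally resample the discrete choice,
    # e.g., which data augmentation to apply.
    if random.random() < 0.25:
        child["augmentation"] = random.choice(["crop", "cutout", "none"])
    # Inner (continuous) level: perturb conditioned on the categorical
    # choice, here a multiplicative jitter of the learning rate.
    child["lr"] = parent["lr"] * random.choice([0.8, 1.25])
    return child
```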
arXiv Detail & Related papers (2021-06-30T08:15:59Z)
- Reinforcement Learning as One Big Sequence Modeling Problem [84.84564880157149]
Reinforcement learning (RL) is typically concerned with estimating stationary policies or single-step models.
We view RL as a sequence modeling problem, with the goal being to predict a sequence of actions that leads to a sequence of high rewards.
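Concretely, "sequence modeling" here means flattening each trajectory into a single token stream that a standard autoregressive model can be trained on. A hedged sketch of one simple discretization scheme (uniform binning; the exact scheme is an assumption, not the paper's):

```python
import numpy as np

def flatten_trajectory(states, actions, rewards, n_bins=100, low=-1.0, high=1.0):
    """Flatten (s_1, a_1, r_1, s_2, ...) into one stream of discrete tokens."""
    tokens = []
    for s, a, r in zip(states, actions, rewards):
        # Interleave state dims, action dims, and reward, in a fixed order.
        for value in (*s, *a, r):
            clipped = min(max(float(value), low), high)
            tokens.append(int((clipped - low) / (high - low) * (n_bins - 1)))
    return np.array(tokens, dtype=np.int64)  # ready for any sequence model
```

Acting then reduces to decoding: sample or search over continuations of this token stream and keep the action tokens from high-reward sequences.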
arXiv Detail & Related papers (2021-06-03T17:58:51Z)
- Ordering-Based Causal Discovery with Reinforcement Learning [31.358145789333825]
We propose a novel RL-based approach for causal discovery, by incorporating RL into the ordering-based paradigm.
We analyze the consistency and computational complexity of the proposed method, and empirically show that a pretrained model can be exploited to accelerate training.
arXiv Detail & Related papers (2021-05-14T03:49:59Z)
- Analyzing Reinforcement Learning Benchmarks with Random Weight Guessing [2.5137859989323537]
A large number of policy networks are generated by randomly guessing their parameters, and then evaluated on the benchmark task.
We show that this approach isolates the environment complexity, highlights specific types of challenges, and provides a proper foundation for the statistical analysis of the task's difficulty.
We test our approach on a variety of classic control benchmarks from the OpenAI Gym, where we show that small untrained networks can provide a robust baseline for a variety of tasks.
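This probe is easy to reproduce. A minimal sketch using the gymnasium successor to the OpenAI Gym API; the environment name and network size are illustrative:

```python
import gymnasium as gym
import numpy as np

def random_guess_returns(env_name="CartPole-v1", n_policies=1000, hidden=8):
    """Evaluate many untrained random policies; return their episode returns."""
    env = gym.make(env_name)
    obs_dim = env.observation_space.shape[0]
    n_actions = env.action_space.n
    returns = []
    for _ in range(n_policies):
        # "Guess" a policy: random weights for a tiny two-layer network.
        w1 = np.random.randn(obs_dim, hidden)
        w2 = np.random.randn(hidden, n_actions)
        obs, _ = env.reset()
        total, done = 0.0, False
        while not done:
            action = int(np.argmax(np.tanh(obs @ w1) @ w2))
            obs, reward, terminated, truncated, _ = env.step(action)
            total += reward
            done = terminated or truncated
        returns.append(total)
    # The distribution of these returns characterizes task difficulty.
    return np.array(returns)
```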
arXiv Detail & Related papers (2020-04-16T15:32:52Z)