Horizon Reduction Makes RL Scalable
- URL: http://arxiv.org/abs/2506.04168v2
- Date: Sun, 08 Jun 2025 18:51:31 GMT
- Title: Horizon Reduction Makes RL Scalable
- Authors: Seohong Park, Kevin Frans, Deepinder Mann, Benjamin Eysenbach, Aviral Kumar, Sergey Levine
- Abstract summary: We study the scalability of offline reinforcement learning (RL) algorithms. We use datasets up to 1000x larger than typical offline RL datasets. We show that the horizon is the main cause behind the poor scaling of offline RL.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we study the scalability of offline reinforcement learning (RL) algorithms. In principle, a truly scalable offline RL algorithm should be able to solve any given problem, regardless of its complexity, given sufficient data, compute, and model capacity. We investigate if and how current offline RL algorithms match up to this promise on diverse, challenging, previously unsolved tasks, using datasets up to 1000x larger than typical offline RL datasets. We observe that despite scaling up data, many existing offline RL algorithms exhibit poor scaling behavior, saturating well below the maximum performance. We hypothesize that the horizon is the main cause behind the poor scaling of offline RL. We empirically verify this hypothesis through several analysis experiments, showing that long horizons indeed present a fundamental barrier to scaling up offline RL. We then show that various horizon reduction techniques substantially enhance scalability on challenging tasks. Based on our insights, we also introduce a minimal yet scalable method named SHARSA that effectively reduces the horizon. SHARSA achieves the best asymptotic performance and scaling behavior among our evaluation methods, showing that explicitly reducing the horizon unlocks the scalability of offline RL. Code: https://github.com/seohongpark/horizon-reduction
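As one concrete illustration of horizon reduction (not the full SHARSA method; see the linked repository for that), the sketch below computes n-step SARSA-style targets: backing up n real rewards before bootstrapping shortens the effective horizon over which value errors must propagate by roughly a factor of n. All names are illustrative.

```python
import numpy as np

def n_step_sarsa_target(rewards, values, gamma=0.99, n=5):
    """Compute n-step SARSA-style targets for one trajectory.

    rewards[t] is r_t; values[t] is Q(s_t, a_t) evaluated at the
    dataset (behavior) action, as in SARSA. Backing up n real rewards
    before bootstrapping shortens the effective horizon over which
    TD errors must propagate.
    """
    T = len(rewards)
    targets = np.zeros(T)
    for t in range(T):
        steps = min(n, T - t)  # truncate at the end of the trajectory
        g = sum(gamma**k * rewards[t + k] for k in range(steps))
        if t + n < T:  # bootstrap only if a value n steps ahead exists
            g += gamma**n * values[t + n]
        targets[t] = g
    return targets
```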
Related papers
- Reinforcement Learning with Action Chunking
We present Q-chunking, a recipe for improving reinforcement learning algorithms for long-horizon, sparse-reward tasks. Our recipe is designed for the offline-to-online RL setting, where the goal is to leverage an offline prior dataset to maximize the sample-efficiency of online learning. Our experimental results demonstrate that Q-chunking exhibits strong offline performance and online sample efficiency, outperforming prior best offline-to-online methods on a range of long-horizon, sparse-reward manipulation tasks.
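The core idea of action chunking can be sketched in a few lines: group each window of h consecutive low-level actions into one macro-action, so TD backups see an episode that is h times shorter. This is a hypothetical re-chunking helper under that assumption, not the authors' Q-chunking code.

```python
import numpy as np

def chunk_trajectory(obs, actions, rewards, h=4, gamma=0.99):
    """Regroup a trajectory into h-step macro-actions ("chunks").

    obs: (T, obs_dim), actions: (T, act_dim), rewards: (T,).
    Treating each length-h action sequence as one action divides the
    number of decisions per episode, and hence the effective horizon
    seen by TD backups, by h. Downstream, a critic would be trained
    on (s, a_chunk) pairs with discount gamma**h.
    """
    T = len(actions) - (len(actions) % h)  # drop a ragged tail
    chunk_obs = obs[0:T:h]                       # state at each chunk start
    chunk_act = actions[:T].reshape(T // h, -1)  # h actions flattened into one
    chunk_rew = np.array([
        sum(gamma**k * rewards[t + k] for k in range(h))  # in-chunk discounted reward
        for t in range(0, T, h)
    ])
    return chunk_obs, chunk_act, chunk_rew
```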
arXiv Detail & Related papers (2025-07-10T17:48:03Z) - On the Statistical Complexity for Offline and Low-Adaptive Reinforcement Learning with Structures [63.36095790552758]
This article reviews recent advances on the statistical foundations of reinforcement learning (RL) in the offline and low-adaptive settings. We will start by arguing why offline RL is the appropriate model for almost any real-life ML problem, even if it has nothing to do with the recent AI breakthroughs that use RL. We will then zoom into two fundamental problems of offline RL: offline policy evaluation (OPE) and offline policy learning (OPL).
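For concreteness, the canonical trajectory-wise importance-sampling estimator for OPE can be written in a few lines. This is the textbook estimator with illustrative function signatures, not anything specific to the article above.

```python
import numpy as np

def importance_sampling_ope(trajectories, pi_e, pi_b, gamma=0.99):
    """Trajectory-wise importance-sampling estimate of V(pi_e).

    trajectories: iterable of [(s, a, r), ...] logged under pi_b;
    pi_e(a, s) and pi_b(a, s) return action probabilities.
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_e(a, s) / pi_b(a, s)  # cumulative likelihood ratio
            ret += gamma**t * r                # discounted return
        estimates.append(weight * ret)
    return float(np.mean(estimates))
```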
arXiv Detail & Related papers (2025-01-03T20:27:53Z) - D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z) - Improving Offline RL by Blending Heuristics [33.810026421228635]
Heuristic Blending (HUBL) improves the performance of offline RL algorithms that are based on value bootstrapping.
HUBL consistently improves the policy quality of four state-of-the-art bootstrapping-based offline RL algorithms.
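A rough sketch of the blending idea, under the assumption that the heuristic h(s) is something like a Monte-Carlo return estimate: relabel each transition so that part of the long-horizon value comes from h rather than from bootstrapping. The helper below is hypothetical, not the paper's exact relabeling.

```python
def blend_heuristic(transitions, h, gamma=0.99, lam=0.5):
    """Relabel transitions so a fraction lam of the long-horizon value
    comes from a heuristic h(s') instead of bootstrapping.

    transitions: iterable of (s, a, r, s_next, done); h maps a state
    to a scalar value estimate (e.g. a Monte-Carlo return). The
    relabeled reward and shrunken discount can be fed to any
    value-bootstrapping offline RL algorithm unchanged.
    """
    relabeled = []
    for (s, a, r, s_next, done) in transitions:
        r_tilde = r + lam * gamma * h(s_next) * (0.0 if done else 1.0)
        gamma_tilde = gamma * (1.0 - lam)  # shorter effective horizon
        relabeled.append((s, a, r_tilde, s_next, done, gamma_tilde))
    return relabeled
```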
arXiv Detail & Related papers (2023-06-01T03:36:06Z) - Don't Change the Algorithm, Change the Data: Exploratory Data for
Offline Reinforcement Learning [147.61075994259807]
We propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL.
ExORL first generates data with unsupervised reward-free exploration, then relabels this data with a downstream reward before training a policy with offline RL.
We find that exploratory data allows vanilla off-policy RL algorithms, without any offline-specific modifications, to outperform or match state-of-the-art offline RL algorithms on downstream tasks.
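The three-stage recipe reads naturally as a pipeline; here is a minimal sketch in which every callable (collect_fn, reward_fn, train_fn) is a hypothetical stand-in rather than the ExORL API.

```python
def exorl_style_pipeline(collect_fn, reward_fn, train_fn):
    """Three-stage data-centric recipe: explore, relabel, train.

    collect_fn() -> list of (obs, action, next_obs) tuples gathered by
    reward-free exploration; reward_fn(obs, action, next_obs) -> float
    downstream task reward; train_fn(dataset) -> policy, any standard
    off-policy algorithm run on the fixed dataset.
    """
    transitions = collect_fn()  # 1) reward-free exploratory data
    labeled = [
        (s, a, reward_fn(s, a, s2), s2)  # 2) relabel with the task reward
        for (s, a, s2) in transitions
    ]
    return train_fn(labeled)  # 3) vanilla offline training
```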
arXiv Detail & Related papers (2022-01-31T18:39:27Z) - Single-Shot Pruning for Offline Reinforcement Learning [47.886329599997474]
Deep Reinforcement Learning (RL) is a powerful framework for solving complex real-world problems.
One way to reduce its computational cost is to prune neural networks, leaving only the necessary parameters.
We close the gap between RL and single-shot pruning techniques and present a general pruning approach for offline RL.
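As a generic illustration of the single-shot idea (the paper's exact pruning criterion may differ), global magnitude pruning removes, in one pass, all parameters whose magnitude falls below a threshold chosen to hit a target sparsity:

```python
import numpy as np

def one_shot_magnitude_prune(weights, sparsity=0.9):
    """Prune a network to a target sparsity in a single pass.

    weights: dict mapping layer names to weight arrays. Keeps the
    largest-magnitude (1 - sparsity) fraction of all parameters
    globally and zeroes the rest.
    """
    all_w = np.concatenate([w.ravel() for w in weights.values()])
    threshold = np.quantile(np.abs(all_w), sparsity)  # global cutoff
    return {name: w * (np.abs(w) >= threshold) for name, w in weights.items()}
```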
arXiv Detail & Related papers (2021-12-31T18:10:02Z) - A Minimalist Approach to Offline Reinforcement Learning [10.904148149681932]
Offline reinforcement learning defines the task of learning from a fixed batch of data.
In this paper, we aim to make a deep RL algorithm work offline while making minimal changes.
We find that we can match the performance of state-of-the-art offline RL algorithms by simply adding a behavior cloning term to the policy update of an online RL algorithm.
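The summary describes the mechanism precisely enough to sketch: add a behavior-cloning penalty to the actor loss of an off-policy algorithm and normalize the Q term so one trade-off coefficient works across tasks (in the spirit of TD3+BC). Module and batch names below are assumptions.

```python
import torch

def bc_regularized_actor_loss(critic, actor, batch, alpha=2.5):
    """Actor loss with an added behavior cloning term.

    critic(s, a) -> Q-values, actor(s) -> actions (torch modules);
    batch holds dataset states and the actions that produced them.
    """
    s, a_data = batch["obs"], batch["actions"]
    a_pi = actor(s)
    q = critic(s, a_pi)
    # Normalize the Q term so one alpha trades off RL vs. BC across
    # tasks with very different value scales.
    lam = alpha / q.abs().mean().detach()
    return -(lam * q).mean() + ((a_pi - a_data) ** 2).mean()
```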
arXiv Detail & Related papers (2021-06-12T20:38:59Z) - Offline Reinforcement Learning from Images with Latent Space Models [60.69745540036375]
Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions.
We build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces.
Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP.
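For reference, the generic evidence lower bound (ELBO) for a latent state-space model of image observations o_{1:T} under actions a_{1:T} has the form below; this is the standard sequential ELBO, not necessarily the paper's exact objective.

```latex
\log p(o_{1:T} \mid a_{1:T}) \;\ge\;
\mathbb{E}_{q}\Big[\textstyle\sum_{t=1}^{T}
    \log p(o_t \mid s_t)
    - D_{\mathrm{KL}}\big(q(s_t \mid o_{\le t}, a_{<t}) \,\|\, p(s_t \mid s_{t-1}, a_{t-1})\big)\Big]
```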
arXiv Detail & Related papers (2020-12-21T18:28:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.