Reframing Offline Reinforcement Learning as a Regression Problem
- URL: http://arxiv.org/abs/2401.11630v1
- Date: Sun, 21 Jan 2024 23:50:46 GMT
- Title: Reframing Offline Reinforcement Learning as a Regression Problem
- Authors: Prajwal Koirala and Cody Fleming
- Abstract summary: The study proposes the reformulation of offline reinforcement learning as a regression problem that can be solved with decision trees.
We observe that with gradient-boosted trees, the agent training and inference are very fast.
Despite the simplification inherent in this reformulated problem, our agent demonstrates performance that is at least on par with established methods.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The study proposes the reformulation of offline reinforcement learning as a regression problem that can be solved with decision trees. Aiming to predict actions based on input states, return-to-go (RTG), and timestep information, we observe that with gradient-boosted trees, the agent training and inference are very fast, the former taking less than a minute. Despite the simplification inherent in this reformulated problem, our agent demonstrates performance that is at least on par with established methods. This assertion is validated by testing it across standard datasets associated with D4RL Gym-MuJoCo tasks. We further discuss the agent's ability to generalize by testing it on two extreme cases, how it learns to model the return distributions effectively even with highly skewed expert datasets, and how it exhibits robust performance in scenarios with sparse/delayed rewards.
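The regression formulation described in the abstract can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: the feature layout (state, return-to-go, timestep) and the use of gradient-boosted trees follow the abstract, while the dataset format, the scikit-learn estimators (`HistGradientBoostingRegressor`, `MultiOutputRegressor`), the helper names, and all hyperparameters are assumptions made for this example.

```python
# Minimal sketch (not the authors' code): offline RL reframed as regression.
# Assumes an offline dataset of trajectories with continuous actions.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

def build_features(states, returns_to_go, timesteps):
    """Concatenate state, RTG, and timestep into one regression input."""
    return np.concatenate(
        [states, returns_to_go[:, None], timesteps[:, None]], axis=1
    )

def fit_agent(trajectories):
    """trajectories: list of dicts with 'states' (T, ds), 'actions' (T, da),
    and 'rewards' (T,) arrays; layout is illustrative."""
    X, Y = [], []
    for traj in trajectories:
        rewards = np.asarray(traj["rewards"], dtype=float)
        # Return-to-go at step t is the sum of rewards from t to the end.
        rtg = np.cumsum(rewards[::-1])[::-1]
        timesteps = np.arange(len(rewards))
        X.append(build_features(traj["states"], rtg, timesteps))
        Y.append(traj["actions"])
    X, Y = np.vstack(X), np.vstack(Y)
    # One gradient-boosted tree ensemble per action dimension.
    model = MultiOutputRegressor(HistGradientBoostingRegressor(max_iter=200))
    model.fit(X, Y)
    return model

def act(model, state, target_return, timestep):
    """At test time, condition on a desired return and the current timestep."""
    x = build_features(np.asarray(state)[None, :],
                       np.array([target_return]), np.array([timestep]))
    return model.predict(x)[0]
```

At evaluation time one would typically condition on a high target return (for example, a value near the best return observed in the dataset) and update it with the rewards received as the episode unfolds; the actual agent's conditioning scheme and hyperparameters may differ from this sketch.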
Related papers
- A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning [37.62558445850573]
We propose an algorithm, iterative influence-based filtering (IIF), for online RL training. IIF reduces sample complexity, speeds up training, and achieves higher returns. These results advance the interpretability, efficiency, and effectiveness of online RL.
arXiv Detail & Related papers (2025-05-25T19:25:57Z)
- What Matters for Batch Online Reinforcement Learning in Robotics? [65.06558240091758]
The ability to learn from large batches of autonomously collected data for policy improvement holds the promise of enabling truly scalable robot learning. Previous works have applied imitation learning and filtered imitation learning methods to the batch online RL problem. We analyze how these axes affect performance and scaling with the amount of autonomous data.
arXiv Detail & Related papers (2025-05-12T21:24:22Z)
- Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression [53.33734159983431]
This paper introduces a novel approach to distill neural RL policies into more interpretable forms. We train expert neural network policies using RL and distill them into (i) GBMs, (ii) EBMs, and (iii) symbolic policies.
arXiv Detail & Related papers (2024-03-21T11:54:45Z)
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation. We validate our method on multiple OpenAI Gym tasks with D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- Explaining RL Decisions with Trajectories [28.261758841898697]
Explanation is a key component for the adoption of reinforcement learning (RL) in many real-world decision-making problems. We propose a complementary approach to these explanations, particularly for offline RL, where we attribute the policy decisions of a trained RL agent to the trajectories encountered by it during training.
arXiv Detail & Related papers (2023-05-06T15:26:22Z)
- Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
Offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets. We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged. We dub our method ReD (Return-based Data Rebalance); it can be implemented with fewer than 10 lines of code change and adds negligible running time (a rebalancing sketch appears after this list).
arXiv Detail & Related papers (2022-10-17T16:34:01Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interactions between the agent and the environment. We propose a new method that uses unsupervised model-based RL to pre-train the agent. We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models [67.78935378952146]
GenRL is a framework for solving sequential decision-making problems. It exploits the combination of reinforcement learning and latent variable generative models. We experimentally determine the characteristics of generative models that have the most influence on the performance of the final policy training.
arXiv Detail & Related papers (2022-04-18T22:02:32Z)
- Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior. The retrieval process is trained to retrieve information from the dataset that may be useful in the current context. We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z)
- FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization [10.243908145832394]
We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks. This problem is still not fully understood, and two major challenges need to be addressed. We provide analysis and insight showing that some simple design choices can yield substantial improvements over recent approaches.
arXiv Detail & Related papers (2020-10-02T17:13:39Z)
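The return-based rebalancing idea summarized in the ReD entry above can be illustrated in a few lines. The sketch below is a hedged reconstruction from that summary alone, not the paper's code: the min-max weighting, the smoothing floor, and the function name `return_based_resample` are assumptions made for the example; only the core idea of resampling with replacement according to episode return, which keeps the dataset's support unchanged, comes from the summary.

```python
# Illustrative return-based data rebalancing (assumption-laden sketch, not ReD's code).
import numpy as np

def return_based_resample(episodes, episode_returns, rng=None):
    """Resample episodes with replacement, with probability increasing in episode
    return. Every episode keeps a nonzero probability, so the support of the
    resampled dataset is unchanged; only the sampling weights shift."""
    rng = rng if rng is not None else np.random.default_rng(0)
    r = np.asarray(episode_returns, dtype=float)
    # Min-max normalize returns, then add a small floor so no episode is dropped.
    w = (r - r.min()) / (r.max() - r.min() + 1e-8) + 1e-3
    probs = w / w.sum()
    idx = rng.choice(len(episodes), size=len(episodes), replace=True, p=probs)
    return [episodes[i] for i in idx]

# Usage (illustrative): rebalanced = return_based_resample(episodes, returns)
```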
This list is automatically generated from the titles and abstracts of the papers on this site.