Reframing Offline Reinforcement Learning as a Regression Problem
- URL: http://arxiv.org/abs/2401.11630v1
- Date: Sun, 21 Jan 2024 23:50:46 GMT
- Title: Reframing Offline Reinforcement Learning as a Regression Problem
- Authors: Prajwal Koirala and Cody Fleming
- Abstract summary: The study proposes the reformulation of offline reinforcement learning as a regression problem that can be solved with decision trees.
We observe that with gradient-boosted trees, the agent training and inference are very fast.
Despite the simplification inherent in this reformulated problem, our agent demonstrates performance that is at least on par with established methods.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The study proposes the reformulation of offline reinforcement learning as a
regression problem that can be solved with decision trees. Aiming to predict
actions based on input states, return-to-go (RTG), and timestep information, we
observe that with gradient-boosted trees, the agent training and inference are
very fast, the former taking less than a minute. Despite the simplification
inherent in this reformulated problem, our agent demonstrates performance that
is at least on par with established methods. This assertion is validated by
testing it across standard datasets associated with D4RL Gym-MuJoCo tasks. We
further discuss the agent's ability to generalize by testing it on two extreme
cases, how it learns to model the return distributions effectively even with
highly skewed expert datasets, and how it exhibits robust performance in
scenarios with sparse/delayed rewards.
Related papers
- Robust Capped lp-Norm Support Vector Ordinal Regression [85.84718111830752]
Ordinal regression is a specialized supervised problem where the labels show an inherent order.
Support Vector Ordinal Regression, as an outstanding ordinal regression model, is widely used in many ordinal regression tasks.
We introduce a new model, Capped $\ell_p$-Norm Support Vector Ordinal Regression (CSVOR), that is robust to outliers.
arXiv Detail & Related papers (2024-04-25T13:56:05Z)
- Dissecting Deep RL with High Update Ratios: Combatting Value Divergence [21.282292112642747]
We show that deep reinforcement learning algorithms can retain their ability to learn without resetting network parameters.
We employ a simple unit-ball normalization that enables learning under large update ratios.
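One common reading of unit-ball normalization is projecting feature vectors onto the unit L2 ball, shrinking only those vectors whose norm exceeds one. The sketch below shows that projection in isolation; where the normalization is placed inside the network, and its exact form in the cited paper, are details not reproduced here.

```python
import numpy as np

def unit_ball_normalize(x):
    """Project each row vector onto the unit L2 ball: vectors with norm <= 1
    pass through unchanged; longer vectors are rescaled to unit norm."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, 1.0)

feats = np.array([[3.0, 4.0],    # norm 5 -> rescaled to [0.6, 0.8]
                  [0.1, 0.2]])   # norm < 1 -> unchanged
out = unit_ball_normalize(feats)
```

Bounding feature norms this way keeps downstream value estimates from growing without limit, which is one plausible mechanism for the stability under large update ratios that the blurb describes.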
arXiv Detail & Related papers (2024-03-09T19:56:40Z)
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits approach (LORT) without the requirement of prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- Noisy Self-Training with Synthetic Queries for Dense Retrieval [49.49928764695172]
We introduce a novel noisy self-training framework combined with synthetic queries.
Experimental results show that our method improves consistently over existing methods.
Our method is data efficient and outperforms competitive baselines.
arXiv Detail & Related papers (2023-11-27T06:19:50Z)
- Streaming Active Learning for Regression Problems Using Regression via Classification [12.572218568705376]
We propose to use the regression-via-classification framework for streaming active learning for regression.
Regression-via-classification transforms regression problems into classification problems so that streaming active learning methods can be applied directly to regression problems.
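The transformation this blurb describes can be sketched simply: bin the continuous target into quantile classes, train a classifier, then map predicted classes back to bin midpoints. The bin count and classifier below are illustrative assumptions, not the cited paper's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=500)

# Discretize the continuous target into quantile bins; each bin is a class.
n_bins = 10
edges = np.quantile(y, np.linspace(0.0, 1.0, n_bins + 1))
labels = np.clip(np.digitize(y, edges[1:-1]), 0, n_bins - 1)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, labels)

# Map predicted classes back to bin midpoints to recover regression outputs.
midpoints = (edges[:-1] + edges[1:]) / 2
y_pred = midpoints[clf.predict(X)]
```

Once the target is categorical, any streaming active-learning criterion designed for classifiers (e.g. margin or entropy sampling) can be applied directly.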
arXiv Detail & Related papers (2023-09-02T20:24:24Z)
- Alleviating the Effect of Data Imbalance on Adversarial Training [26.36714114672729]
We study adversarial training on datasets that obey the long-tailed distribution.
We propose a new adversarial training framework, Re-balancing Adversarial Training (REAT).
arXiv Detail & Related papers (2023-07-14T07:01:48Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interactions between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z) - Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z) - Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing
Regressions In NLP Model Updates [68.09049111171862]
This work focuses on quantifying, reducing and analyzing regression errors in the NLP model updates.
We formulate the regression-free model updates into a constrained optimization problem.
We empirically analyze how model ensemble reduces regression.
arXiv Detail & Related papers (2021-05-07T03:33:00Z) - An Efficient Method of Training Small Models for Regression Problems
with Knowledge Distillation [1.433758865948252]
We propose a new formalism of knowledge distillation for regression problems.
First, we propose a new loss function, teacher outlier loss rejection, which rejects outliers in training samples using teacher model predictions.
By considering the multi-task network, training of the feature extraction of student models becomes more effective.
arXiv Detail & Related papers (2020-02-28T08:46:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.