Offline Imitation Learning by Controlling the Effective Planning Horizon
- URL: http://arxiv.org/abs/2401.09728v1
- Date: Thu, 18 Jan 2024 05:17:30 GMT
- Title: Offline Imitation Learning by Controlling the Effective Planning Horizon
- Authors: Hee-Jun Ahn, Seong-Woong Shim, Byung-Jun Lee
- Abstract summary: We investigate the effect of controlling the effective planning horizon as opposed to imposing an explicit regularizer.
We show that the corrected algorithm improves on popular imitation learning benchmarks by controlling the effective planning horizon rather than relying on an explicit regularizer.
- Score: 5.844892266835562
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In offline imitation learning (IL), we generally assume only a handful of
expert trajectories and a supplementary offline dataset from suboptimal
behaviors to learn the expert policy. While it is now common to minimize the
divergence between state-action visitation distributions so that the agent also
considers the future consequences of an action, sampling error in the offline
dataset may lead to erroneous estimates of the state-action visitations. In this
paper, we investigate the effect of controlling the
effective planning horizon (i.e., reducing the discount factor) as opposed to
imposing an explicit regularizer, as previously studied. Unfortunately, it
turns out that the existing algorithms suffer from magnified approximation
errors when the effective planning horizon is shortened, which results in a
significant degradation in performance. We analyze the main cause of the
problem and provide the right remedies to correct the algorithm. We show that
the corrected algorithm improves on popular imitation learning benchmarks by
controlling the effective planning horizon rather than relying on an explicit
regularizer.
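For intuition, a discount factor gamma corresponds to an effective planning horizon of roughly 1/(1 - gamma), and the discounted state-action visitation distribution can be estimated from offline trajectories by Monte-Carlo weighting. The sketch below only illustrates these quantities under assumed names and data formats; it is not the algorithm studied in the paper.

```python
# Minimal illustrative sketch (not the paper's algorithm).
# Assumptions: trajectories are lists of (state, action) pairs with hashable
# entries; `effective_horizon` and `empirical_visitation` are made-up names.
from collections import defaultdict

def effective_horizon(gamma: float) -> float:
    """Sum of the discount weights 1 + gamma + gamma^2 + ... = 1 / (1 - gamma)."""
    return 1.0 / (1.0 - gamma)

def empirical_visitation(trajectories, gamma: float):
    """Monte-Carlo estimate of the discounted state-action visitation distribution.

    A smaller gamma down-weights late timesteps, so the estimate leans less on the
    sparsely sampled long-horizon part of the offline dataset.
    """
    weights = defaultdict(float)
    total = 0.0
    for traj in trajectories:
        for t, (s, a) in enumerate(traj):
            w = (1.0 - gamma) * gamma ** t  # normalized per-step discount weight
            weights[(s, a)] += w
            total += w
    return {sa: w / total for sa, w in weights.items()}

if __name__ == "__main__":
    # Shrinking gamma from 0.99 to 0.9 cuts the effective horizon from ~100 to ~10.
    print(effective_horizon(0.99), effective_horizon(0.9))
    print(empirical_visitation([[("s0", "a0"), ("s1", "a1")]], gamma=0.9))
```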
Related papers
- Validity Learning on Failures: Mitigating the Distribution Shift in Autonomous Vehicle Planning [2.3558144417896583]
The planning problem constitutes a fundamental aspect of the autonomous driving framework.
We propose Validity Learning on Failures, VL(on failure), as a remedy for this distribution shift.
We show that VL(on failure) outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2024-06-03T17:25:18Z)
- Disparate Impact on Group Accuracy of Linearization for Private Inference [48.27026603581436]
We show that reducing the number of ReLU activations disproportionately decreases the accuracy for minority groups compared to majority groups.
We also show how a simple procedure altering the fine-tuning step for linearized models can serve as an effective mitigation strategy.
arXiv Detail & Related papers (2024-02-06T01:56:29Z)
- Iteratively Refined Behavior Regularization for Offline Reinforcement Learning [57.10922880400715]
In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration.
By iteratively refining the reference policy used for behavior regularization, the conservative policy update guarantees gradual improvement.
Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks.
arXiv Detail & Related papers (2023-06-09T07:46:24Z)
- Efficient Deep Reinforcement Learning Requires Regulating Overfitting [91.88004732618381]
We show that high temporal-difference (TD) error on the validation set of transitions is the main culprit that severely affects the performance of deep RL algorithms.
We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks.
arXiv Detail & Related papers (2023-04-20T17:11:05Z)
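As a rough illustration of the model-selection idea in the entry above, the hedged sketch below scores candidate Q-functions by their TD error on a held-out set of transitions and keeps the lowest-scoring one; the function names, the SARSA-style target, and the transition format are assumptions rather than the cited paper's exact procedure.

```python
# Hypothetical sketch: select among candidate Q-functions by validation TD error.
# Assumed format: held-out transitions are (s, a, r, s_next, a_next) tuples.
def validation_td_error(q_fn, transitions, gamma=0.99):
    """Mean squared TD error of q_fn on held-out transitions (SARSA-style target)."""
    errors = []
    for s, a, r, s_next, a_next in transitions:
        target = r + gamma * q_fn(s_next, a_next)
        errors.append((q_fn(s, a) - target) ** 2)
    return sum(errors) / len(errors)

def select_by_validation_td_error(candidate_q_fns, transitions):
    """Return the candidate with the smallest validation TD error."""
    return min(candidate_q_fns, key=lambda q: validation_td_error(q, transitions))
```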
- Offline Policy Optimization in RL with Variance Regularization [142.87345258222942]
We propose variance regularization for offline RL algorithms, using stationary distribution corrections.
We show that by using Fenchel duality, we can avoid double sampling issues for computing the gradient of the variance regularizer.
The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithm.
arXiv Detail & Related papers (2022-12-29T18:25:01Z)
- Offline Policy Optimization with Eligible Actions [34.4530766779594]
Offline policy optimization could have a large impact on many real-world decision-making problems.
Importance sampling and its variants are a commonly used type of estimator in offline policy evaluation.
We propose an algorithm that avoids overfitting of these estimators through a new per-state-neighborhood normalization constraint.
arXiv Detail & Related papers (2022-07-01T19:18:15Z)
- Interpolation-based Contrastive Learning for Few-Label Semi-Supervised Learning [43.51182049644767]
Semi-supervised learning (SSL) has long been shown to be an effective technique for constructing powerful models with limited labels.
Regularization-based methods, which force perturbed samples to have predictions similar to those of the original samples, have attracted much attention.
We propose a novel contrastive loss to guide the embedding of the learned network to change linearly between samples.
arXiv Detail & Related papers (2022-02-24T06:00:05Z)
- Improving the Efficiency of Off-Policy Reinforcement Learning by Accounting for Past Decisions [20.531576904743282]
Off-policy estimation bias is corrected in a per-decision manner.
Off-policy algorithms such as Tree Backup and Retrace rely on this mechanism.
We propose a multistep operator that permits arbitrary past-dependent traces.
arXiv Detail & Related papers (2021-12-23T00:07:28Z)
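To make the per-decision correction in the entry above concrete, here is a hedged Retrace-style sketch: each step's trace is a truncated importance ratio, and the corrected multi-step return accumulates discounted, trace-weighted TD errors. The function names, trajectory format, and lack of terminal handling are simplifying assumptions, and this is not the multistep operator proposed in the cited paper.

```python
# Hypothetical Retrace-style sketch of a per-decision corrected multi-step return.
# Assumptions: policy(s) and behavior(s) return dicts of action probabilities,
# q(s, a) is the current action-value estimate, and traj is a list of
# (s, a, r, s_next) transitions collected under `behavior` (no terminal states).
def per_decision_corrected_return(q, policy, behavior, traj, gamma=0.99, lam=1.0):
    s0, a0 = traj[0][0], traj[0][1]
    total, trace, discount = 0.0, 1.0, 1.0
    for t, (s, a, r, s_next) in enumerate(traj):
        if t > 0:
            # Per-decision trace: truncated importance ratio for the action at step t.
            trace *= lam * min(1.0, policy(s)[a] / behavior(s)[a])
        # TD error with an expectation over the target policy at the next state.
        v_next = sum(p * q(s_next, b) for b, p in policy(s_next).items())
        delta = r + gamma * v_next - q(s, a)
        total += discount * trace * delta
        discount *= gamma
    return q(s0, a0) + total
```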
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate a dataset of observational data collected offline, which is often abundantly available in practice, to improve sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
- Excursion Search for Constrained Bayesian Optimization under a Limited Budget of Failures [62.41541049302712]
We propose a novel decision maker grounded in control theory that controls the amount of risk we allow in the search as a function of a given budget of failures.
Our algorithm uses the failure budget more efficiently than state-of-the-art methods in a variety of optimization experiments and generally achieves lower regret.
arXiv Detail & Related papers (2020-05-15T09:54:09Z)