Adaptive Scaling of Policy Constraints for Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2508.19900v1
- Date: Wed, 27 Aug 2025 14:00:18 GMT
- Title: Adaptive Scaling of Policy Constraints for Offline Reinforcement Learning
- Authors: Tan Jing, Xiaorui Li, Chao Yao, Xiaojuan Ban, Yuetong Fang, Renjing Xu, Zhaolin Yuan
- Abstract summary: Offline reinforcement learning (RL) enables learning effective policies from fixed datasets without any environment interaction. Existing methods typically employ policy constraints to mitigate the distribution shift encountered during offline RL training. We propose Adaptive Scaling of Policy Constraints (ASPC), a second-order differentiable framework that dynamically balances RL and behavior cloning (BC) during training.
- Score: 24.46783760408068
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline reinforcement learning (RL) enables learning effective policies from fixed datasets without any environment interaction. Existing methods typically employ policy constraints to mitigate the distribution shift encountered during offline RL training. However, because the scale of the constraints varies across tasks and datasets of differing quality, existing methods must meticulously tune hyperparameters to match each dataset, which is time-consuming and often impractical. We propose Adaptive Scaling of Policy Constraints (ASPC), a second-order differentiable framework that dynamically balances RL and behavior cloning (BC) during training. We theoretically analyze its performance improvement guarantee. In experiments on 39 datasets across four D4RL domains, ASPC using a single hyperparameter configuration outperforms other adaptive constraint methods and state-of-the-art offline RL algorithms that require per-dataset tuning while incurring only minimal computational overhead. The code will be released at https://github.com/Colin-Jing/ASPC.
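As a rough illustration of the RL/BC trade-off that ASPC adapts, the sketch below assumes a PyTorch, TD3+BC-style continuous-control setup with hypothetical `actor` and `critic` modules; the paper's second-order, differentiable update of the balance weight is only indicated, not reproduced.

```python
import torch

# Minimal sketch (not the paper's implementation): an actor loss that mixes a
# Q-maximization (RL) term with a behavior-cloning (BC) term. ASPC's contribution
# is to adapt the balance weight `alpha` during training via a second-order,
# differentiable procedure; here alpha is simply a placeholder scalar.
def actor_loss(actor, critic, states, dataset_actions, alpha):
    pi_actions = actor(states)                    # policy actions at dataset states
    q_values = critic(states, pi_actions)         # critic evaluation of those actions
    rl_term = -q_values.mean()                    # maximize Q  ->  minimize -Q
    bc_term = ((pi_actions - dataset_actions) ** 2).mean()  # stay close to the data
    return rl_term + alpha * bc_term              # alpha trades off RL vs. BC
```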
Related papers
- General Flexible $f$-divergence for Challenging Offline RL Datasets with Low Stochasticity and Diverse Behavior Policies [4.098989232625628]
Offline RL algorithms aim to improve upon the behavior policy that produced the collected data while constraining the learned policy to remain within the support of the dataset. We introduce a general flexible function formulation for the $f$-divergence to incorporate an adaptive constraint on algorithms' learning objectives based on the offline training dataset. Results from experiments on the MuJoCo, Fetch, and AdroitHand environments show the correctness of the proposed LP form and the potential of the flexible $f$-divergence to improve performance when learning from a challenging dataset with a compatible constrained optimization algorithm.
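For context, the snippet below is a generic sample-based $f$-divergence penalty of the kind such constraints build on; the paper's flexible $f$ formulation is not reproduced, and the default $f(t) = t\log t$ shown here simply recovers KL between the learned policy and the behavior policy.

```python
import torch

# Sketch only: a Monte-Carlo estimate of D_f(pi || beta) = E_beta[ f(pi/beta) ],
# evaluated on dataset actions (samples from the behavior policy beta). With
# f(t) = t * log t this equals KL(pi || beta); other choices of f give other members
# of the f-divergence family.
def f_divergence_penalty(log_pi, log_beta, f=lambda t: t * torch.log(t)):
    ratio = torch.exp(log_pi - log_beta)   # pi(a|s) / beta(a|s) at dataset actions
    return f(ratio).mean()                 # average over behavior-policy samples
```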
arXiv Detail & Related papers (2026-02-11T17:53:49Z) - Evaluation-Time Policy Switching for Offline Reinforcement Learning [5.052293146674794]
Offline reinforcement learning (RL) looks at learning how to optimally solve tasks using a fixed dataset of interactions from the environment. Many off-policy algorithms developed for online learning struggle in the offline setting, as they tend to overestimate the value of out-of-distribution actions. Existing offline RL algorithms adapt off-policy algorithms, employing techniques such as constraining the policy or modifying the value function to achieve good performance on individual datasets. We introduce a policy switching technique that dynamically combines the behaviour of a pure off-policy RL agent, for improving behaviour, and a behavioural cloning (BC) agent, for staying close to the data.
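One plausible form such evaluation-time switching could take is sketched below (the exact switching criterion is an assumption, not the paper's rule): fall back to the BC agent when a critic-ensemble disagreement signal suggests the RL agent's action is out-of-distribution.

```python
import torch

# Sketch of evaluation-time policy switching (illustrative rule, not the paper's):
# prefer the off-policy RL agent's action unless ensemble disagreement on its value
# is high, in which case use the behavior-cloning agent's action instead.
def select_action(state, rl_actor, bc_actor, critic_ensemble, threshold):
    a_rl = rl_actor(state)
    q_estimates = torch.stack([q(state, a_rl) for q in critic_ensemble])
    uncertainty = q_estimates.std(dim=0)      # disagreement as an OOD proxy
    if uncertainty.mean() > threshold:        # hypothetical threshold hyperparameter
        return bc_actor(state)                # stay close to the data
    return a_rl                               # exploit the learned value function
```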
arXiv Detail & Related papers (2025-03-15T18:12:16Z) - Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online
Reinforcement Learning [71.02384943570372]
Family Offline-to-Online RL (FamO2O) is a framework that empowers existing algorithms to determine state-adaptive improvement-constraint balances.
FamO2O offers a statistically significant improvement over various existing methods, achieving state-of-the-art performance on the D4RL benchmark.
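The state-adaptive idea can be pictured with the small sketch below; the architecture and names are illustrative assumptions rather than FamO2O's actual design: a balance model maps each state to a coefficient that re-weights the improvement term against the constraint term for that state.

```python
import torch
import torch.nn as nn

# Illustrative balance model (not FamO2O's implementation): predicts a per-state
# coefficient beta(s) in [beta_min, beta_max] used to weight policy improvement
# against the policy constraint for that state.
class BalanceModel(nn.Module):
    def __init__(self, state_dim, beta_min=0.0, beta_max=1.0):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.beta_min, self.beta_max = beta_min, beta_max

    def forward(self, states):
        raw = torch.sigmoid(self.net(states))                          # in (0, 1)
        return self.beta_min + (self.beta_max - self.beta_min) * raw   # per-state balance
```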
arXiv Detail & Related papers (2023-10-27T08:30:54Z) - Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
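To make the notion of action quantization concrete, here is a toy, non-adaptive discretizer (fitting bin centers from dataset quantiles and snapping actions to the nearest center); the paper's scheme instead learns the discretization adaptively from the data.

```python
import numpy as np

# Toy illustration of action quantization (NOT the paper's adaptive scheme): per-dimension
# bin centers are fit from dataset quantiles, and continuous actions are snapped to the
# nearest center before discrete-action RL is applied.
def fit_bins(dataset_actions, k=16):
    qs = np.linspace(0.0, 1.0, k)
    return np.quantile(dataset_actions, qs, axis=0)   # (k, action_dim) bin centers

def quantize(actions, bins):
    # for each action dimension, choose the closest bin center
    idx = np.abs(actions[:, None, :] - bins[None, :, :]).argmin(axis=1)  # (n, action_dim)
    return np.take_along_axis(bins, idx, axis=0)                          # nearest centers
```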
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with
Expert Guidance [74.31779732754697]
We propose a novel plug-in approach named Guided Offline RL (GORL).
GORL employs a guiding network, along with only a few expert demonstrations, to adaptively determine the relative importance of the policy improvement and policy constraint for every sample.
Experiments on various environments suggest that GORL can be easily installed on most offline RL algorithms with statistically significant performance improvements.
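The per-sample weighting idea is sketched below; the weights are treated as given here, whereas in the paper a guiding network trained with a few expert demonstrations produces them.

```python
import torch

# Sketch in the spirit of GORL (weights assumed given): each transition gets its own
# trade-off between the policy-improvement loss and the policy-constraint loss.
def weighted_actor_loss(improvement_loss, constraint_loss, sample_weights):
    # improvement_loss, constraint_loss: per-sample tensors of shape (batch,)
    w = sample_weights.clamp(0.0, 1.0)
    return (w * improvement_loss + (1.0 - w) * constraint_loss).mean()
```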
arXiv Detail & Related papers (2023-09-04T08:59:04Z) - Offline RL With Realistic Datasets: Heteroskedasticity and Support
Constraints [82.43359506154117]
We show that typical offline reinforcement learning methods fail to learn from data with non-uniform variability.
Our method is simple, theoretically motivated, and improves performance across a wide range of offline RL problems in Atari games, navigation, and pixel-based manipulation.
arXiv Detail & Related papers (2022-11-02T11:36:06Z) - Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
Offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets.
We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time.
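A minimal sketch of return-based resampling is shown below; the exact weighting scheme is an assumption for illustration, but it captures the key point that resampling reweights the data without changing its support.

```python
import numpy as np

# Sketch of return-based data rebalancing (weighting details are illustrative):
# resample transitions with probability increasing in their episode return, keeping
# every sample's probability strictly positive so the support is unchanged.
def rebalanced_indices(episode_returns, seed=0):
    rng = np.random.default_rng(seed)
    r = np.asarray(episode_returns, dtype=np.float64)
    w = (r - r.min()) / (r.max() - r.min() + 1e-8)   # normalize returns to [0, 1]
    p = (w + 1e-3) / (w + 1e-3).sum()                # strictly positive probabilities
    return rng.choice(len(r), size=len(r), replace=True, p=p)
```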
arXiv Detail & Related papers (2022-10-17T16:34:01Z) - Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
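A minimal sketch of a CRR-style actor update follows: behavior-cloning log-likelihood weighted by a function of the critic's advantage, so the policy imitates the data only where the critic judges those actions to be good. The exponential weighting is one of the variants considered; hyperparameter names are illustrative.

```python
import torch

# CRR-style actor loss sketch: weight log pi(a|s) on dataset actions by a monotone
# function of the advantage estimate Q(s,a) - V(s). The weight is detached so the
# gradient flows only through the policy's log-likelihood.
def crr_actor_loss(log_prob, q_value, value, beta=1.0, max_weight=20.0):
    advantage = q_value - value                                  # critic's advantage estimate
    weight = torch.exp(advantage / beta).clamp(max=max_weight)   # exponential weighting variant
    return -(weight.detach() * log_prob).mean()                  # weighted behavior cloning
```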
arXiv Detail & Related papers (2020-06-26T17:50:26Z)