Doubly Mild Generalization for Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2411.07934v2
- Date: Wed, 13 Nov 2024 06:34:07 GMT
- Title: Doubly Mild Generalization for Offline Reinforcement Learning
- Authors: Yixiu Mao, Qi Wang, Yun Qu, Yuhang Jiang, Xiangyang Ji
- Abstract summary: We show that mild generalization beyond the dataset can be trusted and leveraged to improve performance under certain conditions.
We propose Doubly Mild Generalization (DMG) comprising (i) mild action generalization and (ii) mild generalization propagation.
DMG achieves state-of-the-art performance across Gym-MuJoCo tasks and challenging AntMaze tasks.
- Score: 50.084440946096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline Reinforcement Learning (RL) suffers from extrapolation error and value overestimation. From a generalization perspective, this issue can be attributed to the over-generalization of value functions or policies towards out-of-distribution (OOD) actions. Significant efforts have been devoted to mitigating such generalization, and recent in-sample learning approaches have further succeeded in entirely eschewing it. Nevertheless, we show that mild generalization beyond the dataset can be trusted and leveraged to improve performance under certain conditions. To appropriately exploit generalization in offline RL, we propose Doubly Mild Generalization (DMG), comprising (i) mild action generalization and (ii) mild generalization propagation. The former refers to selecting actions in a close neighborhood of the dataset to maximize the Q values. Even so, the potential erroneous generalization can still be propagated, accumulated, and exacerbated by bootstrapping. In light of this, the latter concept is introduced to mitigate the generalization propagation without impeding the propagation of RL learning signals. Theoretically, DMG guarantees better performance than the in-sample optimal policy in the oracle generalization scenario. Even under worst-case generalization, DMG can still control value overestimation at a certain level and lower bound the performance. Empirically, DMG achieves state-of-the-art performance across Gym-MuJoCo locomotion tasks and challenging AntMaze tasks. Moreover, benefiting from its flexibility in both generalization aspects, DMG enjoys a seamless transition from offline to online learning and attains strong online fine-tuning performance.
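To make the two ingredients concrete, below is a minimal PyTorch sketch of how a DMG-style TD target could combine them, assuming callables q_net(s, a) and pi(s); the neighborhood radius eps and the mixing weight lam are illustrative assumptions, not the paper's exact formulation.
```python
import torch

# Minimal sketch of a DMG-style TD target. `q_net` and `pi` are assumed
# callables; `eps` and `lam` are illustrative hyperparameters.

def dmg_td_target(q_net, pi, s_next, a_next_data, reward,
                  gamma=0.99, eps=0.1, lam=0.25):
    with torch.no_grad():
        # (i) Mild action generalization: only evaluate Q at actions
        # inside an eps-ball around the dataset action at s'.
        a_gen = a_next_data + (pi(s_next) - a_next_data).clamp(-eps, eps)
        q_gen = q_net(s_next, a_gen)        # mildly generalized value
        q_in = q_net(s_next, a_next_data)   # purely in-sample value
        # (ii) Mild generalization propagation: bootstrap from a mixture,
        # so erroneous generalization is damped before it compounds
        # through repeated backups.
        v_next = lam * q_gen + (1.0 - lam) * q_in
    return reward + gamma * v_next
```
Setting lam = 0 recovers purely in-sample bootstrapping, while lam = 1 lets generalization propagate fully through backups; the abstract credits flexibility in both of these knobs for the seamless offline-to-online transition.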
Related papers
- Generalization Capability for Imitation Learning [1.30536490219656]
Imitation learning holds the promise of equipping robots with versatile skills by learning from expert demonstrations.
However, policies trained on finite datasets often struggle to generalize beyond the training distribution.
We present a unified perspective on the generalization capability of imitation learning, grounded in both information theory and data distribution properties.
arXiv Detail & Related papers (2025-04-25T17:59:59Z)
- Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning [104.27224674122313]
Fine-tuning MLLM has become a common practice to improve performance on specific downstream tasks.
To balance the trade-off between generalization and specialization, we propose measuring the parameter importance for both pre-trained and fine-tuning distributions.
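The summary does not specify the importance measure; one common choice, sketched here purely as an assumption, is a diagonal-Fisher-style estimate that averages squared loss gradients per parameter over batches from a given distribution (run once on pre-training data, once on fine-tuning data).
```python
import torch

# Diagonal-Fisher-style parameter importance: the average squared
# gradient of the loss w.r.t. each parameter over a distribution's
# batches. All names here are hypothetical.

def parameter_importance(model, loss_fn, batches):
    importance = {name: torch.zeros_like(p)
                  for name, p in model.named_parameters()}
    for batch in batches:               # batches from one distribution
        model.zero_grad()
        loss_fn(model, batch).backward()
        for name, p in model.named_parameters():
            if p.grad is not None:
                importance[name] += p.grad.detach() ** 2
    return {name: v / len(batches) for name, v in importance.items()}
```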
arXiv Detail & Related papers (2024-11-17T01:16:37Z)
- Zero-Shot Generalization of Vision-Based RL Without Data Augmentation [11.820012065797917]
Generalizing vision-based reinforcement learning (RL) agents to novel environments remains a difficult and open challenge.
We propose a model, Associative Latent DisentAnglement (ALDA), that builds on standard off-policy RL to achieve zero-shot generalization.
arXiv Detail & Related papers (2024-10-09T21:14:09Z)
- Rethinking Multi-domain Generalization with A General Learning Objective [19.28143363034362]
Multi-domain generalization (mDG) aims to minimize the discrepancy between training and testing distributions.
Existing mDG literature lacks a general learning objective paradigm.
We propose to leverage a $Y$-mapping to relax the constraint.
arXiv Detail & Related papers (2024-02-29T05:00:30Z)
- A Unified Approach to Controlling Implicit Regularization via Mirror Descent [18.536453909759544]
Mirror descent (MD) is a notable generalization of gradient descent (GD).
We show that MD can be implemented efficiently and enjoys fast convergence under suitable conditions.
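As a concrete instance, here is a minimal sketch of mirror descent with the negative-entropy mirror map (exponentiated gradient) over the probability simplex; the step size and the quadratic example are illustrative, and with the squared-Euclidean mirror map the same scheme reduces to plain GD.
```python
import numpy as np

# Mirror descent with the negative-entropy mirror map, i.e.
# exponentiated gradient on the probability simplex.

def mirror_descent_simplex(grad_f, x0, eta=0.1, steps=200):
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x * np.exp(-eta * grad_f(x))  # gradient step in dual space
        x /= x.sum()                      # map back onto the simplex
    return x

# Example: minimize <x, A x> over the simplex; the mass concentrates
# on the coordinate with the smallest diagonal entry.
A = np.diag([3.0, 1.0, 2.0])
x_star = mirror_descent_simplex(lambda x: 2.0 * A @ x, np.ones(3) / 3)
```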
arXiv Detail & Related papers (2023-06-24T03:57:26Z)
- Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
- Provable Generalization of Overparameterized Meta-learning Trained with SGD [62.892930625034374]
We study the generalization of a widely used meta-learning approach, Model-Agnostic Meta-Learning (MAML).
We provide both upper and lower bounds on the excess risk of MAML, which capture how SGD dynamics affect these generalization bounds.
Our theoretical findings are further validated by experiments.
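For a concrete picture of the setting, below is a minimal NumPy sketch of MAML on synthetic linear-regression tasks; the task distribution, step sizes, and single inner SGD step are illustrative assumptions rather than the paper's exact setup.
```python
import numpy as np

def grad(w, X, y):
    # Gradient of the squared loss 0.5 * ||X w - y||^2.
    return X.T @ (X @ w - y)

def maml_meta_gradient(w, X_tr, y_tr, X_val, y_val, alpha):
    # Inner adaptation: one gradient step on the task's support set.
    w_adapt = w - alpha * grad(w, X_tr, y_tr)
    # Outer gradient via the chain rule; for linear regression the
    # Jacobian of the inner step is (I - alpha * X_tr^T X_tr).
    jac = np.eye(len(w)) - alpha * X_tr.T @ X_tr
    return jac.T @ grad(w_adapt, X_val, y_val)

rng = np.random.default_rng(0)
d, alpha, beta = 5, 0.05, 0.01
w = np.zeros(d)
for _ in range(200):                    # meta-training over random tasks
    w_task = rng.normal(size=d)         # per-task ground truth
    X = rng.normal(size=(20, d))
    y = X @ w_task + 0.1 * rng.normal(size=20)
    w -= beta * maml_meta_gradient(w, X[:10], y[:10], X[10:], y[10:], alpha)
```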
arXiv Detail & Related papers (2022-06-18T07:22:57Z)
- DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization [125.5448293005647]
We discuss how the implicit regularization effect of SGD seen in supervised learning could in fact be harmful in offline deep RL.
Our theoretical analysis shows that when existing models of implicit regularization are applied to temporal difference learning, the derived regularizer favors degenerate solutions.
We propose a simple and effective explicit regularizer, called DR3, that counteracts the undesirable effects of this implicit regularizer.
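Per the DR3 paper, the regularizer penalizes alignment between the Q-network's features at consecutive state-action pairs; the sketch below illustrates this, with the feature extraction and the weighting coefficient c0 left as assumptions about the surrounding training loop.
```python
import torch

# DR3-style explicit regularizer: discourage co-adaptation by
# penalizing dot products between penultimate-layer features of
# consecutive transitions.

def dr3_penalty(phi_sa: torch.Tensor, phi_next: torch.Tensor) -> torch.Tensor:
    # phi_sa, phi_next: (batch, feature_dim) features of (s, a) and
    # (s', a'), with a' drawn from the current policy.
    return (phi_sa * phi_next).sum(dim=-1).mean()

# Hypothetical usage inside a TD update:
# loss = td_loss + c0 * dr3_penalty(phi_sa, phi_next)
```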
arXiv Detail & Related papers (2021-12-09T06:01:01Z)
- Stochastic Training is Not Necessary for Generalization [57.04880404584737]
It is widely believed that the implicit regularization of stochastic gradient descent (SGD) is fundamental to the impressive generalization behavior we observe in neural networks.
In this work, we demonstrate that non-stochastic full-batch training can achieve strong performance on CIFAR-10, on par with SGD.
arXiv Detail & Related papers (2021-09-29T00:50:00Z)