Closing the Gap between TD Learning and Supervised Learning -- A
Generalisation Point of View
- URL: http://arxiv.org/abs/2401.11237v2
- Date: Tue, 12 Mar 2024 01:58:18 GMT
- Title: Closing the Gap between TD Learning and Supervised Learning -- A
Generalisation Point of View
- Authors: Raj Ghugare, Matthieu Geist, Glen Berseth, Benjamin Eysenbach
- Abstract summary: Some reinforcement learning (RL) algorithms can stitch pieces of experience to solve a task never seen before during training.
This oft-sought property is one of the few ways in which RL methods based on dynamic-programming differ from RL methods based on supervised-learning (SL).
It remains unclear whether RL methods built on off-the-shelf SL algorithms forgo this important stitching property.
- Score: 51.30152184507165
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Some reinforcement learning (RL) algorithms can stitch pieces of experience
to solve a task never seen before during training. This oft-sought property is
one of the few ways in which RL methods based on dynamic-programming differ
from RL methods based on supervised-learning (SL). Yet, certain RL methods
based on off-the-shelf SL algorithms achieve excellent results without an
explicit mechanism for stitching; it remains unclear whether those methods
forgo this important stitching property. This paper studies this question for
the problems of achieving a target goal state and achieving a target return
value. Our main result is to show that the stitching property corresponds to a
form of combinatorial generalization: after training on a distribution of
(state, goal) pairs, one would like to evaluate on (state, goal) pairs not seen
together in the training data. Our analysis shows that this sort of
generalization is different from i.i.d. generalization. This connection between
stitching and generalization reveals why we should not expect SL-based RL
methods to perform stitching, even in the limit of large datasets and models.
Based on this analysis, we construct new datasets to explicitly test for this
property, revealing that SL-based methods lack this stitching property and
hence fail to perform combinatorial generalization. Nonetheless, the connection
between stitching and combinatorial generalization also suggests a simple
remedy for improving generalization in SL: data augmentation. We propose a
temporal data augmentation and demonstrate that adding it to SL-based methods
enables them to successfully complete tasks not seen together during training.
At a high level, this connection illustrates the importance of combinatorial
generalization for data efficiency on time-series data in tasks beyond RL,
such as audio, video, or text.
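
To make the evaluation protocol described above concrete, the sketch below shows one simple way to construct a combinatorial train/test split, assuming discrete, hashable states and goals. The function name, the holdout fraction, and the filtering rule are illustrative assumptions for this summary, not the paper's actual dataset construction.

```python
import itertools
import random

def combinatorial_split(states, goals, holdout_fraction=0.25, seed=0):
    """Hold out (state, goal) *combinations* rather than i.i.d. samples.

    Every individual state and goal still occurs somewhere in training; only the
    pairing is novel at test time, which is what the stitching property requires.
    """
    rng = random.Random(seed)
    pairs = list(itertools.product(states, goals))
    rng.shuffle(pairs)
    n_test = int(len(pairs) * holdout_fraction)
    test_pairs, train_pairs = pairs[:n_test], pairs[n_test:]

    # Drop test pairs whose state or goal never appears in training, so that a
    # test failure can only be attributed to the unseen combination.
    train_states = {s for s, _ in train_pairs}
    train_goals = {g for _, g in train_pairs}
    test_pairs = [(s, g) for s, g in test_pairs
                  if s in train_states and g in train_goals]
    return train_pairs, test_pairs
```

In a gridworld, for instance, states and goals could both be cell coordinates; an agent trained only on train_pairs succeeds on test_pairs only if it can combine experience gathered under different training pairs, which is exactly the stitching behavior being probed.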
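
The proposed remedy is a change to how (state, goal) training pairs are sampled. The following sketch illustrates one plausible form of such a temporal augmentation for goal-reaching: with some probability, the goal for a sampled state is relabeled from the future of a different trajectory that passes close to that state, so combinations that never co-occur in the raw data are seen during training. The function name, the p_aug and match_radius parameters, and the nearest-state matching rule are assumptions made for illustration, not the authors' exact procedure.

```python
import numpy as np

def temporal_goal_augmentation(trajectories, p_aug=0.5, match_radius=0.1, rng=None):
    """Sample one (state, goal) pair for goal-conditioned supervised learning.

    With probability 1 - p_aug this is ordinary hindsight relabeling (the goal is
    a future state of the same trajectory). With probability p_aug the goal is
    drawn from the future of a *different* trajectory that passes within
    `match_radius` of the sampled state, creating (state, goal) combinations
    never seen together in the raw data.

    `trajectories` is a list of np.ndarray, each of shape (T_i, state_dim), T_i >= 2.
    """
    rng = np.random.default_rng() if rng is None else rng
    i = int(rng.integers(len(trajectories)))
    traj = trajectories[i]
    t = int(rng.integers(len(traj) - 1))
    state = traj[t]

    if rng.random() < p_aug:
        # Cross-trajectory relabeling: look for another trajectory with a state
        # close to the sampled state, then use one of its future states as the goal.
        for j in rng.permutation(len(trajectories)):
            if j == i:
                continue
            other = trajectories[j]
            dists = np.linalg.norm(other - state, axis=-1)
            k = int(np.argmin(dists))
            if dists[k] < match_radius and k < len(other) - 1:
                goal_idx = int(rng.integers(k + 1, len(other)))
                return state, other[goal_idx]

    # Default: hindsight goal from the same trajectory's future.
    goal_idx = int(rng.integers(t + 1, len(traj)))
    return state, traj[goal_idx]
```

Feeding pairs from such a sampler to an SL-based method (for example, a goal-conditioned behavioral-cloning model) leaves the learning algorithm itself unchanged; only the training data distribution is altered to cover combinations that would otherwise be held out.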
Related papers
- Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity [84.12126298229866]
We show that zero-shot generalization during instruction tuning happens very early.
We also show that encountering highly similar and fine-grained training data earlier during instruction tuning, without the constraints of defined "tasks", enables better generalization.
For the first time, we show that zero-shot generalization during instruction tuning is a form of similarity-based generalization between training and test data at the instance level.
arXiv Detail & Related papers (2024-06-17T16:40:21Z)
- RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$ [12.111848705677142]
We propose RL$^3$, a hybrid approach that incorporates action-values, learned per task through traditional RL, into the inputs to meta-RL.
We show that RL$^3$ earns greater cumulative reward in the long term than RL$^2$, while maintaining data efficiency in the short term, and generalizes better to out-of-distribution tasks.
arXiv Detail & Related papers (2023-06-28T04:16:16Z)
- Contrastive Learning as Goal-Conditioned Reinforcement Learning [147.28638631734486]
In reinforcement learning (RL), it is easier to solve a task if given a good representation.
While deep RL should automatically acquire such good representations, prior work often finds that learning representations in an end-to-end fashion is unstable.
We show (contrastive) representation learning methods can be cast as RL algorithms in their own right.
arXiv Detail & Related papers (2022-06-15T14:34:15Z)
- Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions [34.843526573355746]
Reinforcement learning (RL) agents are widely used for solving complex sequential decision making tasks, but exhibit difficulty in generalizing to scenarios not seen during training.
We show that performance of online algorithms for generalization in RL can be hindered in the offline setting due to poor estimation of similarity between observations.
We propose a new theoretically-motivated framework called Generalized Similarity Functions (GSF), which uses contrastive learning to train an offline RL agent to aggregate observations based on the similarity of their expected future behavior.
arXiv Detail & Related papers (2021-11-29T15:42:54Z)
- Reinforcement Learning with Augmented Data [97.42819506719191]
We present Reinforcement Learning with Augmented Data (RAD), a simple plug-and-play module that can enhance most RL algorithms.
We show that augmentations such as random translate, crop, color jitter, patch cutout, random convolutions, and amplitude scale can enable simple RL algorithms to outperform complex state-of-the-art methods.
arXiv Detail & Related papers (2020-04-30T17:35:32Z)
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning [119.49182500071288]
We introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
By moving beyond simple benchmark tasks and data collected by partially-trained RL agents, we reveal important and unappreciated deficiencies of existing algorithms.
arXiv Detail & Related papers (2020-04-15T17:18:19Z)
- Generalized Hindsight for Reinforcement Learning [154.0545226284078]
We argue that low-reward data collected while trying to solve one task provides little to no signal for solving that particular task.
We present Generalized Hindsight: an approximate inverse reinforcement learning technique for relabeling behaviors with the right tasks.
arXiv Detail & Related papers (2020-02-26T18:57:05Z)