Multi-Environment Pretraining Enables Transfer to Action Limited
Datasets
- URL: http://arxiv.org/abs/2211.13337v1
- Date: Wed, 23 Nov 2022 22:48:22 GMT
- Title: Multi-Environment Pretraining Enables Transfer to Action Limited
Datasets
- Authors: David Venuto, Sherry Yang, Pieter Abbeel, Doina Precup, Igor Mordatch,
Ofir Nachum
- Abstract summary: In reinforcement learning, available data of sequential decision making is often not annotated with actions.
We propose combining large but sparsely-annotated datasets from a \emph{target} environment of interest with fully-annotated datasets from various other \emph{source} environments.
We show that utilizing even one additional environment dataset of labelled sequential data during IDM pretraining gives rise to substantial improvements in generating action labels for unannotated sequences.
- Score: 129.24823721649028
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Using massive datasets to train large-scale models has emerged as a dominant
approach for broad generalization in natural language and vision applications.
In reinforcement learning, however, a key challenge is that available data of
sequential decision making is often not annotated with actions - for example,
videos of game-play are much more available than sequences of frames paired
with their logged game controls. We propose to circumvent this challenge by
combining large but sparsely-annotated datasets from a \emph{target}
environment of interest with fully-annotated datasets from various other
\emph{source} environments. Our method, Action Limited PreTraining (ALPT),
leverages the generalization capabilities of inverse dynamics modelling (IDM)
to label missing action data in the target environment. We show that utilizing
even one additional environment dataset of labelled data during IDM pretraining
gives rise to substantial improvements in generating action labels for
unannotated sequences. We evaluate our method on benchmark game-playing
environments and show that we can significantly improve game performance and
generalization capability compared to other approaches, using annotated
datasets equivalent to only $12$ minutes of gameplay. Highlighting the power of
IDM, we show that these benefits remain even when target and source
environments share no common actions.
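As a rough illustration of the idea (not the authors' code; all names and data shapes here are hypothetical, and the paper operates on game frames rather than toy vectors), an inverse dynamics model can be sketched as a classifier that predicts the action taken between two consecutive observations. It is first trained on fully-annotated source transitions, then used to pseudo-label the unannotated target transitions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 4-dim observations, 3 discrete actions (hypothetical shapes).
OBS_DIM, N_ACTIONS = 4, 3

def make_dataset(n):
    """Synthetic transitions in which each action leaves a linear signature."""
    obs = rng.normal(size=(n, OBS_DIM))
    acts = rng.integers(0, N_ACTIONS, size=n)
    next_obs = obs + np.eye(N_ACTIONS, OBS_DIM)[acts]  # action a bumps dim a
    return obs, acts, next_obs

# Linear IDM: action logits from the concatenated pair (s_t, s_{t+1}).
W = np.zeros((2 * OBS_DIM, N_ACTIONS))

def idm_logits(obs, next_obs):
    x = np.concatenate([obs, next_obs], axis=1)
    return x @ W

# "Pretrain" the IDM on fully-annotated source transitions
# (full-batch gradient descent on the softmax cross-entropy).
src_obs, src_acts, src_next = make_dataset(2000)
x_src = np.concatenate([src_obs, src_next], axis=1)
for _ in range(200):
    logits = idm_logits(src_obs, src_next)
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(src_acts)), src_acts] -= 1.0  # dCE/dlogits
    W -= 0.1 * x_src.T @ p / len(src_acts)

# Use the pretrained IDM to label unannotated target transitions.
tgt_obs, tgt_true, tgt_next = make_dataset(500)
pseudo = idm_logits(tgt_obs, tgt_next).argmax(axis=1)
print(f"pseudo-label accuracy: {(pseudo == tgt_true).mean():.2f}")
```

The pseudo-labelled target data could then be fed to a downstream policy-learning stage; the actual ALPT method additionally pretrains the IDM jointly across multiple source environments, which this single-dataset sketch omits.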
Related papers
- Adaptive Masking Enhances Visual Grounding [12.793586888511978]
We propose IMAGE, Interpretative MAsking with Gaussian radiation modEling, to enhance vocabulary grounding in low-shot learning scenarios.
We evaluate the efficacy of our approach on benchmark datasets, including COCO and ODinW, demonstrating its superior performance in zero-shot and few-shot tasks.
arXiv Detail & Related papers (2024-10-04T05:48:02Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- GUESR: A Global Unsupervised Data-Enhancement with Bucket-Cluster Sampling for Sequential Recommendation [58.6450834556133]
We propose graph contrastive learning to enhance item representations with complex associations from the global view.
We extend the CapsNet module with the elaborately introduced target-attention mechanism to derive users' dynamic preferences.
Our proposed GUESR could not only achieve significant improvements but also could be regarded as a general enhancement strategy.
arXiv Detail & Related papers (2023-03-01T05:46:36Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- Basket-based Softmax [12.744577044692276]
We propose a novel mining-during-training strategy called Basket-based Softmax (BBS).
For each training sample, we simultaneously use similarity scores as the clue for mining negative classes from other datasets.
We demonstrate the efficiency and superiority of the BBS on the tasks of face recognition and re-identification, with both simulated and real-world datasets.
arXiv Detail & Related papers (2022-01-23T16:43:29Z)
- JRDB-Act: A Large-scale Multi-modal Dataset for Spatio-temporal Action, Social Group and Activity Detection [54.696819174421584]
We introduce JRDB-Act, a multi-modal dataset that reflects a real distribution of human daily life actions in a university campus environment.
JRDB-Act is densely annotated with atomic actions and comprises over 2.8M action labels.
JRDB-Act comes with social group identification annotations conducive to the task of grouping individuals based on their interactions in the scene.
arXiv Detail & Related papers (2021-06-16T14:43:46Z)
- Vision-Language Navigation with Random Environmental Mixup [112.94609558723518]
Vision-language Navigation (VLN) tasks require an agent to navigate step-by-step while perceiving the visual observations and comprehending a natural language instruction.
Previous works have proposed various data augmentation methods to reduce data bias.
We propose the Random Environmental Mixup (REM) method, which generates cross-connected house scenes as augmented data by mixing up environments.
arXiv Detail & Related papers (2021-06-15T04:34:26Z)
- Pretraining Representations for Data-Efficient Reinforcement Learning [12.43475487724972]
We use unlabeled data to pretrain an encoder which is then finetuned on a small amount of task-specific data.
When limited to 100k steps of interaction on Atari games, our approach significantly surpasses prior work.
Our approach shows particular promise when combined with larger models as well as more diverse, task-aligned observational data.
arXiv Detail & Related papers (2021-06-09T04:14:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information shown and is not responsible for any consequences arising from its use.