CLUTR: Curriculum Learning via Unsupervised Task Representation Learning
- URL: http://arxiv.org/abs/2210.10243v1
- Date: Wed, 19 Oct 2022 01:45:29 GMT
- Title: CLUTR: Curriculum Learning via Unsupervised Task Representation Learning
- Authors: Abdus Salam Azad, Izzeddin Gur, Aleksandra Faust, Pieter Abbeel, and
Ion Stoica
- Abstract summary: CLUTR is a novel curriculum learning algorithm that decouples task representation and curriculum learning into a two-stage optimization.
We show CLUTR outperforms PAIRED, a principled and popular UED method, in terms of generalization and sample efficiency in the challenging CarRacing and navigation environments.
- Score: 130.79246770546413
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning (RL) algorithms are often known for sample
inefficiency and difficult generalization. Recently, Unsupervised Environment
Design (UED) emerged as a new paradigm for zero-shot generalization by
simultaneously learning a task distribution and agent policies on the sampled
tasks. This is a non-stationary process where the task distribution evolves
along with agent policies, creating an instability over time. While past works
demonstrated the potential of such approaches, sampling effectively from the
task space remains an open challenge, bottlenecking these approaches. To this
end, we introduce CLUTR: a novel curriculum learning algorithm that decouples
task representation and curriculum learning into a two-stage optimization. It
first trains a recurrent variational autoencoder on randomly generated tasks to
learn a latent task manifold. Next, a teacher agent creates a curriculum by
maximizing a minimax REGRET-based objective on a set of latent tasks sampled
from this manifold. By keeping the task manifold fixed, we show that CLUTR
successfully overcomes the non-stationarity problem and improves stability. Our
experimental results show CLUTR outperforms PAIRED, a principled and popular
UED method, in terms of generalization and sample efficiency in the challenging
CarRacing and navigation environments: showing an 18x improvement on the F1
CarRacing benchmark. CLUTR also performs comparably to the non-UED
state-of-the-art for CarRacing, outperforming it in nine of the 20 tracks.
CLUTR also achieves a 33% higher solved rate than PAIRED on a set of 18
out-of-distribution navigation tasks.
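To make the two-stage decoupling concrete, here is a minimal sketch over a toy task space (fixed-length token sequences). `TaskVAE`, `random_tasks`, and `estimate_regret` are illustrative stand-ins rather than CLUTR's actual interfaces, and the teacher here simply ranks sampled latent tasks by regret instead of learning a generative policy as in the paper.

```python
# A minimal sketch of CLUTR's two-stage optimization over a toy task space.
import torch
import torch.nn as nn

VOCAB, TASK_LEN, LATENT = 16, 8, 4

class TaskVAE(nn.Module):
    """Recurrent VAE over token sequences that encode tasks."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 32)
        self.encoder = nn.GRU(32, 64, batch_first=True)
        self.to_mu = nn.Linear(64, LATENT)
        self.to_logvar = nn.Linear(64, LATENT)
        self.decoder = nn.GRU(LATENT, 64, batch_first=True)
        self.head = nn.Linear(64, VOCAB)

    def forward(self, tokens):
        _, h = self.encoder(self.embed(tokens))
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.decode(z), mu, logvar

    def decode(self, z):
        # Feed the latent at every step: a toy stand-in for autoregressive decoding.
        out, _ = self.decoder(z.unsqueeze(1).repeat(1, TASK_LEN, 1))
        return self.head(out)

def random_tasks(n):
    return torch.randint(0, VOCAB, (n, TASK_LEN))

# Stage 1: learn a latent task manifold from randomly generated tasks.
vae = TaskVAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
for _ in range(200):
    tasks = random_tasks(64)
    logits, mu, logvar = vae(tasks)
    recon = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), tasks.reshape(-1))
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
    loss = recon + 0.1 * kl
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: the manifold stays fixed while a teacher searches latent space.
def estimate_regret(task_tokens):
    # Placeholder for the PAIRED-style regret signal: antagonist return
    # minus protagonist return on each decoded task.
    return torch.rand(task_tokens.shape[0])

for p in vae.parameters():
    p.requires_grad_(False)  # freezing the VAE keeps the task manifold stationary

with torch.no_grad():
    latents = torch.randn(32, LATENT)         # latent tasks from the prior
    decoded = vae.decode(latents).argmax(-1)  # concrete task encodings
curriculum = decoded[estimate_regret(decoded).argsort(descending=True)[:8]]
print("high-regret tasks for the student:", curriculum.shape)
```

The property the sketch preserves is that stage 2 never updates the VAE, so the teacher optimizes regret over a stationary latent manifold.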
Related papers
- Adaptive Retention & Correction for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.
We name our approach Adaptive Retention & Correction (ARC).
ARC achieves average performance increases of 2.7% and 2.6% on the CIFAR-100 and ImageNet-R datasets, respectively.
arXiv Detail & Related papers (2024-05-23T08:43:09Z)
- Sample Efficient Reinforcement Learning by Automatically Learning to Compose Subtasks [3.1594865504808944]
We propose an RL algorithm that automatically structures the reward function for sample efficiency, given a set of labels that signify subtasks.
We evaluate our algorithm in a variety of sparse-reward environments.
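The summary leaves the reward-structuring mechanism open; one hedged reading is that subtask label functions can be composed into a shaped reward that pays a bonus for each completed subtask. The predicates, bonus, and prefix rule below are hypothetical, not the paper's exact formulation.

```python
# A hedged reconstruction of reward composition from subtask labels: each
# label is a predicate that fires when its subtask is achieved, and the agent
# earns a bonus for the deepest completed prefix of subtasks.
def make_composed_reward(subtask_labels, bonus=0.1):
    """subtask_labels: ordered list of predicates state -> bool."""
    def reward(state, env_reward):
        progress = 0
        for label in subtask_labels:   # count consecutively completed subtasks
            if not label(state):
                break
            progress += 1
        return env_reward + bonus * progress   # dense signal + sparse env reward
    return reward

# Usage with hypothetical predicates for a key-door task:
labels = [lambda s: s["holding_key"], lambda s: s["door_open"]]
r = make_composed_reward(labels)
print(r({"holding_key": True, "door_open": False}, env_reward=0.0))  # 0.1
```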
arXiv Detail & Related papers (2024-01-25T15:06:40Z)
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the order of all the multi-task data for training.
At the task level, we aim to find the optimal task order that minimizes the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
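A minimal sketch of both levels, assuming a precomputed pairwise interference matrix and per-instance difficulty scores; both oracles are placeholders here, not the paper's estimators.

```python
import numpy as np

def order_tasks(interference):
    """Greedy task ordering: always append the task that interferes least
    with the previously scheduled one."""
    order, remaining = [0], set(range(1, interference.shape[0]))
    while remaining:
        nxt = min(remaining, key=lambda t: interference[order[-1], t])
        order.append(nxt)
        remaining.remove(nxt)
    return order

def easy_to_difficult_batches(difficulties, batch_size):
    idx = np.argsort(difficulties)  # easiest instances first
    return [idx[i:i + batch_size] for i in range(0, len(idx), batch_size)]

rng = np.random.default_rng(0)
inter = rng.random((4, 4))
inter = (inter + inter.T) / 2       # symmetric placeholder interference
print(order_tasks(inter))
print(easy_to_difficult_batches(rng.random(10), batch_size=4))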
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- On the Benefit of Optimal Transport for Curriculum Reinforcement Learning [32.59609255906321]
We focus on framing curricula as interpolations between task distributions.
We frame the generation of a curriculum as a constrained optimal transport problem.
Benchmarks show that this way of curriculum generation can improve upon existing CRL methods.
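As a rough illustration of the transport view, the sketch below computes a transport plan between a discrete source and target task distribution with plain Sinkhorn iterations; the cost matrix, distributions, and the mixture used for intermediate stages are toy stand-ins for the paper's constrained formulation.

```python
import numpy as np

def sinkhorn(cost, p, q, reg=0.1, iters=200):
    K = np.exp(-cost / reg)
    u = np.ones_like(p)
    for _ in range(iters):
        v = q / (K.T @ u)
        u = p / (K @ v)
    return u[:, None] * K * v[None, :]  # transport plan between p and q

tasks = np.linspace(0.0, 1.0, 5)                 # 1-D task "difficulty" grid
cost = (tasks[:, None] - tasks[None, :]) ** 2
source = np.array([0.7, 0.2, 0.1, 0.0, 0.0])     # mass on easy tasks
target = np.array([0.0, 0.0, 0.1, 0.2, 0.7])     # mass on hard tasks
plan = sinkhorn(cost, source, target)
print("transport plan:\n", np.round(plan, 3))

# Intermediate training distributions along the way to the target
# (a simple mixture stand-in for displacement interpolation).
for alpha in (0.25, 0.5, 0.75):
    print(f"alpha={alpha}:", np.round((1 - alpha) * source + alpha * target, 2))
```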
arXiv Detail & Related papers (2023-09-25T12:31:37Z)
- Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving [100.3848723827869]
We present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting.
Specifically, we generate visual exemplars based on bounding boxes and color-based markers, which provide accurate visual appearances of target categories.
We bridge transformer-based encoders and convolutional layers for efficient and accurate unified perception in autonomous driving.
arXiv Detail & Related papers (2023-03-03T08:54:06Z)
- ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning [59.08197876733052]
Auxiliary-Task Learning (ATL) aims to improve the performance of the target task by leveraging the knowledge obtained from related tasks.
Sometimes, learning multiple tasks simultaneously results in lower accuracy than learning only the target task, a phenomenon known as negative transfer.
ForkMerge is a novel approach that periodically forks the model into multiple branches and automatically searches the varying task weights.
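A minimal fork-and-merge cycle might look as follows, assuming a scalar auxiliary-task weight per branch and a validation-score oracle; the losses, hyperparameters, and softmax merge rule are illustrative guesses rather than ForkMerge's exact procedure.

```python
import copy
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
aux_weights = [0.0, 0.5, 1.0]  # candidate weightings of the auxiliary task

def train_branch(net, aux_w, steps=20):
    opt = torch.optim.SGD(net.parameters(), lr=0.1)
    for _ in range(steps):
        x = torch.randn(16, 4)
        target_loss = (net(x) - x.sum(1, keepdim=True)).pow(2).mean()
        aux_loss = net(x).pow(2).mean()            # placeholder auxiliary loss
        loss = target_loss + aux_w * aux_loss
        opt.zero_grad(); loss.backward(); opt.step()
    return net

def val_score(net):  # higher is better on the *target* task only
    x = torch.randn(64, 4)
    return -(net(x) - x.sum(1, keepdim=True)).pow(2).mean().item()

# Fork into branches, train each with its own task weight, then merge the
# parameters weighted by target-task validation performance.
branches = [train_branch(copy.deepcopy(model), w) for w in aux_weights]
mix = torch.softmax(torch.tensor([val_score(b) for b in branches]) / 0.1, dim=0)
with torch.no_grad():
    for name, p in model.named_parameters():
        p.copy_(sum(m * dict(b.named_parameters())[name] for m, b in zip(mix, branches)))
print("merged with branch weights:", mix.tolist())
```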
arXiv Detail & Related papers (2023-01-30T02:27:02Z)
- Reinforcement Learning with Success Induced Task Prioritization [68.8204255655161]
We introduce Success Induced Task Prioritization (SITP), a framework for automatic curriculum learning.
The algorithm selects the order of tasks that provide the fastest learning for agents.
We demonstrate that SITP matches or surpasses the results of other curriculum design methods.
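The summary does not pin down SITP's scoring rule; one hedged reconstruction is to prioritize the task whose recent success rate is changing fastest (a learning-progress signal), as sketched below with a random stand-in for the rollout result.

```python
import random
from collections import defaultdict, deque

histories = defaultdict(lambda: deque(maxlen=20))  # task -> recent outcomes

def record(task, success):
    histories[task].append(1.0 if success else 0.0)

def next_task(tasks):
    def progress(t):
        h = list(histories[t])
        if len(h) < 4:
            return float("inf")        # force exploration of under-sampled tasks
        half = len(h) // 2
        recent, old = h[half:], h[:half]
        return abs(sum(recent) / len(recent) - sum(old) / len(old))
    return max(tasks, key=progress)

tasks = ["reach", "push", "stack"]
for _ in range(100):
    t = next_task(tasks)
    record(t, random.random() < 0.5)   # placeholder for an actual rollout
print({t: sum(histories[t]) / len(histories[t]) for t in tasks})
```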
arXiv Detail & Related papers (2022-12-30T12:32:43Z)
- Curriculum Reinforcement Learning using Optimal Transport via Gradual Domain Adaptation [46.103426976842336]
Curriculum Reinforcement Learning (CRL) aims to create a sequence of tasks, starting from easy ones and gradually progressing towards difficult ones.
In this work, we focus on the idea of framing CRL as interpolations between a source (auxiliary) and a target task distribution.
Inspired by the insights from gradual domain adaptation in semi-supervised learning, we create a natural curriculum by breaking down the potentially large task distributional shift in CRL into smaller shifts.
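A minimal sketch of the gradual-shift idea, using 1-D quantile (Wasserstein) interpolation between sampled task parameters to produce intermediate distributions; the difficulty parameterization and step count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
source = rng.normal(0.0, 0.5, size=1000)   # easy tasks (small difficulty values)
target = rng.normal(3.0, 0.5, size=1000)   # hard tasks

def intermediate(src, tgt, alpha, n_quantiles=100):
    qs = np.quantile(src, np.linspace(0, 1, n_quantiles))
    qt = np.quantile(tgt, np.linspace(0, 1, n_quantiles))
    return (1 - alpha) * qs + alpha * qt   # 1-D displacement interpolation

# Each stage is a small distributional shift the agent adapts to in turn.
for alpha in np.linspace(0, 1, 5):
    stage = intermediate(source, target, alpha)
    print(f"alpha={alpha:.2f}: mean difficulty {stage.mean():.2f}")
```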
arXiv Detail & Related papers (2022-10-18T22:33:33Z)
- Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems [42.973910399533054]
We introduce a curriculum learning algorithm, Variational Automatic Curriculum Learning (VACL), for solving cooperative multi-agent reinforcement learning problems.
Our VACL algorithm implements this variational paradigm with two practical components, task expansion and entity progression.
Experiment results show that VACL solves a collection of sparse-reward problems with a large number of agents.
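Reading the two named components literally, a toy loop might expand the active task set around solved tasks and add agents once success is uniformly high; the thresholds, perturbation, and success oracle below are placeholders, not VACL's variational machinery.

```python
import random

def expand(task):
    # Propose a nearby task by perturbing a continuous parameter (goal spread).
    return {"spread": task["spread"] + random.uniform(-0.1, 0.3),
            "agents": task["agents"]}

def solve_rate(task):
    return random.random()              # placeholder for evaluated success

active = [{"spread": 0.1, "agents": 2}]
for _ in range(50):
    task = random.choice(active)
    if solve_rate(task) > 0.6:          # solved well: push the task frontier
        active.append(expand(task))     # task expansion
        if all(solve_rate(t) > 0.6 for t in active[-5:]):
            # entity progression: same task family, one more agent
            active.append({"spread": task["spread"], "agents": task["agents"] + 1})
print(f"{len(active)} tasks, up to {max(t['agents'] for t in active)} agents")
```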
arXiv Detail & Related papers (2021-11-08T16:35:08Z)
- Trying AGAIN instead of Trying Longer: Prior Learning for Automatic Curriculum Learning [39.489869446313065]
A major challenge in the Deep RL (DRL) community is to train agents able to generalize over unseen situations.
We propose a two-stage ACL approach where 1) a teacher algorithm first learns to train a DRL agent with a high-exploration curriculum, and then 2) distills learned priors from the first run to generate an "expert curriculum".
Besides demonstrating 50% improvements on average over the current state of the art, the objective of this work is to give a first example of a new research direction oriented towards refining ACL techniques over multiple learners.
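A compact sketch of the two stages, assuming an epsilon-greedy bandit teacher and a random stand-in for the learning-progress signal; distilling the log into the most frequent task per phase is an illustrative guess at how the "expert curriculum" could be generated.

```python
import random
from collections import Counter

tasks = ["t0", "t1", "t2", "t3"]

def learning_progress(task):
    return random.random()    # placeholder for a measured progress signal

# Stage 1: high-exploration teacher run, logging every task it serves.
values, history = {t: 0.0 for t in tasks}, []
for _ in range(200):
    t = random.choice(tasks) if random.random() < 0.5 else max(values, key=values.get)
    values[t] += 0.1 * (learning_progress(t) - values[t])
    history.append(t)

# Stage 2: distill priors from the first run into a fixed expert curriculum
# that a fresh learner replays phase by phase.
phases = [history[i:i + 50] for i in range(0, len(history), 50)]
expert_curriculum = [Counter(p).most_common(1)[0][0] for p in phases]
print("replay order for the second run:", expert_curriculum)
```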
arXiv Detail & Related papers (2020-04-07T07:30:27Z)