Improving Long-Horizon Imitation Through Instruction Prediction
- URL: http://arxiv.org/abs/2306.12554v1
- Date: Wed, 21 Jun 2023 20:47:23 GMT
- Title: Improving Long-Horizon Imitation Through Instruction Prediction
- Authors: Joey Hejna, Pieter Abbeel, Lerrel Pinto
- Abstract summary: In this work, we explore the use of an often unused source of auxiliary supervision: language.
Inspired by recent advances in transformer-based models, we train agents with an instruction prediction loss that encourages learning temporally extended representations that operate at a high level of abstraction.
In further analysis we find that instruction modeling is most important for tasks that require complex reasoning, while understandably offering smaller gains in environments that require simple plans.
- Score: 93.47416552953075
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Complex, long-horizon planning and its combinatorial nature pose steep
challenges for learning-based agents. Difficulties in such settings are
exacerbated in low data regimes where over-fitting stifles generalization and
compounding errors hurt accuracy. In this work, we explore the use of an often
unused source of auxiliary supervision: language. Inspired by recent advances
in transformer-based models, we train agents with an instruction prediction
loss that encourages learning temporally extended representations that operate
at a high level of abstraction. Concretely, we demonstrate that instruction
modeling significantly improves performance in planning environments when
training with a limited number of demonstrations on the BabyAI and Crafter
benchmarks. In further analysis we find that instruction modeling is most
important for tasks that require complex reasoning, while understandably
offering smaller gains in environments that require simple plans. More details
and code can be found at https://github.com/jhejna/instruction-prediction.
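For readers who want a concrete picture of the training objective, the sketch below shows one way to combine a behavior cloning loss with an instruction prediction auxiliary loss. It is only a minimal illustration, not the authors' implementation (that lives in the repository linked above); the class name InstructionPredictionAgent, the network sizes, and the use of a small transformer decoder over instruction tokens are assumptions made for the example.

```python
# Minimal sketch of behavior cloning plus an auxiliary instruction prediction
# loss, in the spirit of the approach above. NOT the authors' code; names,
# sizes, and the encoder/decoder layout are illustrative assumptions.
# Positional encodings are omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F


class InstructionPredictionAgent(nn.Module):
    def __init__(self, obs_dim, num_actions, vocab_size, d_model=128):
        super().__init__()
        self.obs_embed = nn.Linear(obs_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # Policy head: the behavior cloning target is the expert action per step.
        self.policy_head = nn.Linear(d_model, num_actions)
        # Auxiliary head: a small decoder predicts the language instruction
        # tokens from the encoded observation sequence.
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        self.instr_head = nn.Linear(d_model, vocab_size)

    def forward(self, obs_seq, instr_in):
        h = self.encoder(self.obs_embed(obs_seq))          # (B, T, d_model)
        action_logits = self.policy_head(h)                # (B, T, num_actions)
        L = instr_in.size(1)
        causal_mask = torch.triu(                          # standard causal mask
            torch.full((L, L), float("-inf"), device=instr_in.device), diagonal=1)
        dec = self.decoder(self.tok_embed(instr_in), h, tgt_mask=causal_mask)
        instr_logits = self.instr_head(dec)                # (B, L, vocab_size)
        return action_logits, instr_logits


def imitation_loss(agent, obs_seq, expert_actions, instr_tokens, aux_weight=1.0):
    """Behavior cloning loss plus the instruction prediction auxiliary loss."""
    # Teacher forcing: the decoder sees tokens 0..L-2 and predicts 1..L-1.
    instr_in, instr_target = instr_tokens[:, :-1], instr_tokens[:, 1:]
    action_logits, instr_logits = agent(obs_seq, instr_in)
    bc_loss = F.cross_entropy(
        action_logits.reshape(-1, action_logits.size(-1)),
        expert_actions.reshape(-1))
    instr_loss = F.cross_entropy(
        instr_logits.reshape(-1, instr_logits.size(-1)),
        instr_target.reshape(-1))
    return bc_loss + aux_weight * instr_loss


# Toy usage with random data, just to show the shapes involved.
agent = InstructionPredictionAgent(obs_dim=32, num_actions=6, vocab_size=100)
obs = torch.randn(8, 20, 32)            # 8 trajectories, 20 steps, 32-dim obs
acts = torch.randint(0, 6, (8, 20))     # expert actions per step
instr = torch.randint(0, 100, (8, 12))  # tokenized instruction per trajectory
imitation_loss(agent, obs, acts, instr).backward()
```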
Related papers
- Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance [68.56701216210617]
In principle, one would expect models to adapt better to the user context after instruction finetuning.
We observe a surprising failure mode: during instruction tuning, the context reliance under knowledge conflicts initially increases as expected, but then gradually decreases.
arXiv Detail & Related papers (2024-10-14T17:57:09Z) - Planning Transformer: Long-Horizon Offline Reinforcement Learning with Planning Tokens [1.8416014644193066]
We introduce Planning Tokens, which contain high-level, long time-scale information about the agent's future.
We demonstrate that Planning Tokens improve the interpretability of the model's policy through interpretable plan visualisations and attention maps.
arXiv Detail & Related papers (2024-09-14T19:30:53Z) - TrACT: A Training Dynamics Aware Contrastive Learning Framework for Long-tail Trajectory Prediction [7.3292387742640415]
We propose to incorporate richer training dynamics information into a prototypical contrastive learning framework.
We conduct empirical evaluations of our approach using two large-scale naturalistic datasets.
arXiv Detail & Related papers (2024-04-18T23:12:46Z) - Transformer-based Causal Language Models Perform Clustering [20.430255724239448]
We introduce a simplified instruction-following task and use synthetic datasets to analyze a Transformer-based causal language model.
Our findings suggest that the model learns task-specific information by clustering data within its hidden space, with this clustering process evolving dynamically during learning.
arXiv Detail & Related papers (2024-02-19T14:02:31Z) - Code Representation Learning At Scale [75.04686476303436]
We fuel code representation learning with a vast amount of code data via a two-stage pretraining scheme.
We first train the encoders via a mix that leverages both the randomness of masked language modeling and the structural aspect of programming languages.
We then enhance the representations via contrastive learning with hard negatives and hard positives constructed in an unsupervised manner.
arXiv Detail & Related papers (2024-02-02T22:19:15Z) - Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data may instead come from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z) - Hierarchical Imitation Learning with Vector Quantized Models [77.67190661002691]
We propose to use reinforcement learning to identify subgoals in expert trajectories.
We build a vector-quantized generative model for the identified subgoals to perform subgoal-level planning.
In experiments, the algorithm excels at solving complex, long-horizon decision-making problems, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2023-01-30T15:04:39Z) - Representation Learning for Weakly Supervised Relation Extraction [19.689433249830465]
In this thesis, we present several novel unsupervised pre-training models to learn distributed text representation features.
Experiments demonstrate that this type of feature, combined with traditional hand-crafted features, can improve the performance of the logistic classification model for relation extraction.
arXiv Detail & Related papers (2021-04-10T12:22:25Z) - Adversarial Imitation Learning with Trajectorial Augmentation and Correction [61.924411952657756]
We introduce a novel augmentation method which preserves the success of the augmented trajectories.
We develop an adversarial data augmented imitation architecture to train an imitation agent using synthetic experts.
Experiments show that our data augmentation strategy can improve the accuracy and convergence time of adversarial imitation.
arXiv Detail & Related papers (2021-03-25T14:49:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.