Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following
- URL: http://arxiv.org/abs/2502.05454v2
- Date: Thu, 13 Feb 2025 08:54:06 GMT
- Title: Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following
- Authors: Vivek Myers, Bill Chunyuan Zheng, Anca Dragan, Kuan Fang, Sergey Levine
- Abstract summary: We show that learning to associate the representations of current and future states with a temporal alignment loss can improve compositional generalization.
We evaluate our approach across diverse robotic manipulation tasks as well as in simulation, showing substantial improvements for tasks specified with either language or goal images.
- Score: 50.377287115281476
- Abstract: Effective task representations should facilitate compositionality, such that after learning a variety of basic tasks, an agent can perform compound tasks consisting of multiple steps simply by composing the representations of the constituent steps together. While this is conceptually simple and appealing, it is not clear how to automatically learn representations that enable this sort of compositionality. We show that learning to associate the representations of current and future states with a temporal alignment loss can improve compositional generalization, even in the absence of any explicit subtask planning or reinforcement learning. We evaluate our approach across diverse robotic manipulation tasks as well as in simulation, showing substantial improvements for tasks specified with either language or goal images.
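The abstract's core idea, associating representations of current and future states with a temporal alignment loss, can be illustrated with a minimal contrastive sketch. This is an assumption-laden illustration, not the paper's actual implementation: the function name `temporal_alignment_loss`, the InfoNCE-style objective, and the batch layout (row `i` of `phi_s` encodes a state and row `i` of `psi_g` encodes a future state from the same trajectory) are all hypothetical choices made here for clarity.

```python
import numpy as np

def temporal_alignment_loss(phi_s, psi_g, temperature=1.0):
    """InfoNCE-style loss aligning current-state encodings phi_s[i] with
    encodings psi_g[i] of future states from the same trajectory.
    Other rows in the batch serve as negatives.

    phi_s, psi_g: (batch, dim) arrays of representations (hypothetical encoders).
    """
    # Normalize rows so the dot product is cosine similarity.
    phi = phi_s / np.linalg.norm(phi_s, axis=1, keepdims=True)
    psi = psi_g / np.linalg.norm(psi_g, axis=1, keepdims=True)
    logits = phi @ psi.T / temperature            # (batch, batch) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    # Cross-entropy with matched (diagonal) pairs as positives.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Minimizing such a loss pulls each state's representation toward the representations of states that follow it in time, which is the mechanism the abstract credits for compositional generalization without explicit subtask planning.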
Related papers
- IntCoOp: Interpretability-Aware Vision-Language Prompt Tuning [94.52149969720712]
IntCoOp learns to jointly align attribute-level inductive biases and class embeddings during prompt-tuning.
IntCoOp improves CoOp by 7.35% in average performance across 10 diverse datasets.
arXiv Detail & Related papers (2024-06-19T16:37:31Z)
- Learning Symbolic Task Representation from a Human-Led Demonstration: A Memory to Store, Retrieve, Consolidate, and Forget Experiences [3.0501524254444767]
We present a symbolic learning framework inspired by cognitive-like memory functionalities.
Our main contribution is the formalisation of a framework that can be used to investigate different memories for bootstrapping hierarchical knowledge representations.
arXiv Detail & Related papers (2024-04-16T14:14:34Z)
- State Representations as Incentives for Reinforcement Learning Agents: A Sim2Real Analysis on Robotic Grasping [3.4777703321218225]
This work examines the effect of various representations in incentivizing the agent to solve a specific robotic task.
A continuum of state representations is defined, starting from hand-crafted numerical states to encoded image-based representations.
The effects of each representation on the ability of the agent to solve the task in simulation and the transferability of the learned policy to the real robot are examined.
arXiv Detail & Related papers (2023-09-21T11:41:22Z)
- Diversifying Joint Vision-Language Tokenization Learning [51.82353485389527]
Building joint representations across images and text is an essential step for tasks such as Visual Question Answering and Video Question Answering.
We propose joint vision-language representation learning by diversifying the tokenization learning process.
arXiv Detail & Related papers (2023-06-06T05:41:42Z)
- Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
We tackle the problem of learning visual representations from unlabeled scene-centric data.
We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
arXiv Detail & Related papers (2022-05-30T17:50:59Z)
- Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization [101.72755769194677]
We formulate it as a few-shot reinforcement learning problem where a task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks.
Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z)
- Learning Abstract and Transferable Representations for Planning [25.63560394067908]
We propose a framework for autonomously learning state abstractions of an agent's environment.
These abstractions are task-independent, and so can be reused to solve new tasks.
We show how to combine these portable representations with problem-specific ones to generate a sound description of a specific task.
arXiv Detail & Related papers (2022-05-04T14:40:04Z)
- Learning to Follow Language Instructions with Compositional Policies [22.778677208048475]
We propose a framework that learns to execute natural language instructions in an environment consisting of goal-reaching tasks.
We train a reinforcement learning agent to learn value functions that can be subsequently composed through a Boolean algebra.
We fine-tune a seq2seq model pretrained on web-scale corpora to map language to logical expressions.
arXiv Detail & Related papers (2021-10-09T21:28:26Z)
- Multi-Task Reinforcement Learning with Context-based Representations [43.93866702838777]
We propose an efficient approach to knowledge transfer through the use of multiple context-dependent, composable representations across a family of tasks.
We use the proposed approach to obtain state-of-the-art results in Meta-World, a challenging multi-task benchmark consisting of 50 distinct robotic manipulation tasks.
arXiv Detail & Related papers (2021-02-11T18:41:27Z)
- Inferring Temporal Compositions of Actions Using Probabilistic Automata [61.09176771931052]
We propose to express temporal compositions of actions as semantic regular expressions and derive an inference framework using probabilistic automata.
Our approach is different from existing works that either predict long-range complex activities as unordered sets of atomic actions, or retrieve videos using natural language sentences.
arXiv Detail & Related papers (2020-04-28T00:15:26Z)
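One of the papers listed above composes learned value functions through a Boolean algebra. A minimal sketch of that composition pattern, under the common convention that conjunction is an elementwise minimum, disjunction an elementwise maximum, and negation a reflection about the value range, is shown below. The value tables, state space, and function names are hypothetical toy constructions, not taken from the paper.

```python
import numpy as np

# Hypothetical goal-reaching value tables over a tiny 4-state space:
# value of eventually reaching a "red" object vs. reaching a "box".
V_red = np.array([1.0, 0.0, 1.0, 0.0])
V_box = np.array([1.0, 1.0, 0.0, 0.0])

V_MAX, V_MIN = 1.0, 0.0  # assumed bounds of the value range

def v_and(v1, v2):
    # Conjunction: elementwise min approximates "both goals are satisfiable".
    return np.minimum(v1, v2)

def v_or(v1, v2):
    # Disjunction: elementwise max approximates "either goal is satisfiable".
    return np.maximum(v1, v2)

def v_not(v):
    # Negation: reflect values about the range [V_MIN, V_MAX].
    return (V_MAX + V_MIN) - v
```

Composed values such as `v_and(V_red, V_box)` can then drive a policy for a compound instruction ("the red box") without retraining, which is the zero-shot compositionality that entry describes.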
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.