Emergence of Abstract State Representations in Embodied Sequence
Modeling
- URL: http://arxiv.org/abs/2311.02171v2
- Date: Tue, 7 Nov 2023 05:03:08 GMT
- Title: Emergence of Abstract State Representations in Embodied Sequence
Modeling
- Authors: Tian Yun, Zilai Zeng, Kunal Handa, Ashish V. Thapliyal, Bo Pang, Ellie
Pavlick, Chen Sun
- Abstract summary: Decision making via sequence modeling aims to mimic the success of language models, with actions taken by an embodied agent modeled as tokens to predict.
We show that environmental layouts can be reasonably reconstructed from the internal activations of a trained model.
Our results support an optimistic outlook for applications of sequence modeling objectives to more complex embodied decision-making domains.
- Score: 24.827284626429964
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decision making via sequence modeling aims to mimic the success of language
models, where actions taken by an embodied agent are modeled as tokens to
predict. Despite their promising performance, it remains unclear if embodied
sequence modeling leads to the emergence of internal representations that
represent the environmental state information. A model that lacks abstract
state representations would be liable to make decisions based on surface
statistics which fail to generalize. We take the BabyAI environment, a grid
world in which language-conditioned navigation tasks are performed, and build a
sequence modeling Transformer, which takes a language instruction, a sequence
of actions, and environmental observations as its inputs. In order to
investigate the emergence of abstract state representations, we design a
"blindfolded" navigation task, where only the initial environmental layout, the
language instruction, and the action sequence to complete the task are
available for training. Our probing results show that intermediate
environmental layouts can be reasonably reconstructed from the internal
activations of a trained model, and that language instructions play a role in
the reconstruction accuracy. Our results suggest that many key features of
state representations can emerge via embodied sequence modeling, supporting an
optimistic outlook for applications of sequence modeling objectives to more
complex embodied decision-making domains.
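To make the setup concrete, here is a minimal PyTorch sketch of the kind of pipeline the abstract describes: a causal Transformer over tokenized (instruction, initial layout, action) episodes, with a linear probe trained to read intermediate grid layouts out of the hidden activations. All names, dimensions, vocabularies, and the per-cell classification target below are illustrative assumptions, not the authors' actual architecture or code.
```python
# Hypothetical sketch of "blindfolded" embodied sequence modeling plus a
# layout probe. Sizes and vocabularies are assumptions for illustration
# only, not the configuration used in the paper.
import torch
import torch.nn as nn

VOCAB = 128          # shared vocab: instruction words, layout tokens, actions
D_MODEL = 64
MAX_LEN = 512
GRID = 8             # assumed grid side length
N_CELL_TYPES = 6     # assumed cell categories (empty, wall, door, goal, ...)


class BlindfoldedSequenceModel(nn.Module):
    """Causal Transformer over [instruction; initial layout; actions]."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        self.pos = nn.Embedding(MAX_LEN, D_MODEL)
        layer = nn.TransformerEncoderLayer(
            d_model=D_MODEL, nhead=4, dim_feedforward=256, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, VOCAB)  # next-token (action) prediction

    def forward(self, tokens):
        # tokens: (batch, seq_len); the environment is only observed at t=0,
        # so later steps must track state internally ("blindfolded").
        seq_len = tokens.shape[1]
        x = self.embed(tokens) + self.pos(torch.arange(seq_len, device=tokens.device))
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=tokens.device), 1
        )
        hidden = self.encoder(x, mask=causal)  # activations the probe reads
        return self.head(hidden), hidden


# Probe: predict the full intermediate layout (one class per cell) from the
# hidden state at a given action step. The probe is trained separately, with
# the base model frozen, so it only measures information already present in
# the activations.
probe = nn.Linear(D_MODEL, GRID * GRID * N_CELL_TYPES)

model = BlindfoldedSequenceModel()
tokens = torch.randint(0, VOCAB, (2, 32))   # two dummy episodes
logits, hidden = model(tokens)
cell_logits = probe(hidden[:, -1]).view(2, GRID, GRID, N_CELL_TYPES)
print(cell_logits.shape)                    # torch.Size([2, 8, 8, 6])
```
Comparing probing accuracy on held-out episodes against a probe trained on a randomly initialized model is the standard way to argue that such state information emerged from training rather than from the probe itself.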
Related papers
- LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments [70.91258869156353]
We introduce LangSuitE, a versatile and simulation-free testbed featuring 6 representative embodied tasks in textual embodied worlds.
Compared with previous LLM-based testbeds, LangSuitE offers adaptability to diverse environments without multiple simulation engines.
We devise a novel chain-of-thought (CoT) schema, EmMem, which summarizes embodied states with respect to history information.
arXiv Detail & Related papers (2024-06-24T03:36:29Z)
- Learning with Language-Guided State Abstractions [58.199148890064826]
Generalizable policy learning in high-dimensional observation spaces is facilitated by well-designed state representations.
Our method, LGA, uses a combination of natural language supervision and background knowledge from language models to automatically build state representations tailored to unseen tasks.
Experiments on simulated robotic tasks show that LGA yields state abstractions similar to those designed by humans, but in a fraction of the time.
arXiv Detail & Related papers (2024-02-28T23:57:04Z)
- Emergence and Function of Abstract Representations in Self-Supervised Transformers [0.0]
We study the inner workings of small-scale transformers trained to reconstruct partially masked visual scenes.
We show that the network develops intermediate abstract representations, or abstractions, that encode all semantic features of the dataset.
Using precise manipulation experiments, we demonstrate that abstractions are central to the network's decision-making process.
arXiv Detail & Related papers (2023-12-08T20:47:15Z)
- Few-shot Subgoal Planning with Language Models [58.11102061150875]
We show that language priors encoded in pre-trained language models allow us to infer fine-grained subgoal sequences.
In contrast to recent methods which make strong assumptions about subgoal supervision, our experiments show that language models can infer detailed subgoal sequences without any fine-tuning.
arXiv Detail & Related papers (2022-05-28T01:03:30Z)
- Online Learning of Reusable Abstract Models for Object Goal Navigation [18.15382773079023]
We present a novel approach to incrementally learn an Abstract Model of an unknown environment.
We show how an agent can reuse the learned model for tackling the Object Goal Navigation task.
arXiv Detail & Related papers (2022-03-04T21:44:43Z)
- Pre-Trained Language Models for Interactive Decision-Making [72.77825666035203]
We describe a framework for imitation learning in which goals and observations are represented as a sequence of embeddings.
We demonstrate that this framework enables effective generalization across different environments.
For test tasks involving novel goals or novel scenes, initializing policies with language models improves task completion rates by 43.6%.
arXiv Detail & Related papers (2022-02-03T18:55:52Z)
- Skill Induction and Planning with Latent Language [94.55783888325165]
We formulate a generative model of action sequences in which goals generate sequences of high-level subtask descriptions.
We describe how to train this model using primarily unannotated demonstrations by parsing demonstrations into sequences of named high-level subtasks.
In trained models, the space of natural language commands indexes a library of skills; agents can use these skills to plan by generating high-level instruction sequences tailored to novel goals.
arXiv Detail & Related papers (2021-10-04T15:36:32Z)
- S2RMs: Spatially Structured Recurrent Modules [105.0377129434636]
We take a step towards models that are capable of simultaneously exploiting both modular and temporal structures.
We find our models to be robust to the number of available views and better able to generalize to novel tasks without additional training.
arXiv Detail & Related papers (2020-07-13T17:44:30Z)
- Learning intuitive physics and one-shot imitation using state-action-prediction self-organizing maps [0.0]
Humans learn by exploration and imitation, build causal models of the world, and use both to flexibly solve new tasks.
We suggest a simple but effective unsupervised model which develops such characteristics.
We demonstrate its performance on a set of several related, but different one-shot imitation tasks, which the agent flexibly solves in an active inference style.
arXiv Detail & Related papers (2020-07-03T12:29:11Z)