How language models extrapolate outside the training data: A case study in Textualized Gridworld
- URL: http://arxiv.org/abs/2406.15275v2
- Date: Tue, 08 Oct 2024 05:06:55 GMT
- Title: How language models extrapolate outside the training data: A case study in Textualized Gridworld
- Authors: Doyoung Kim, Jongwon Lee, Jinho Park, Minjoon Seo
- Abstract summary: We show that conventional approaches, including next-token prediction and Chain of Thought fine-tuning, fail to generalize in larger, unseen environments.
Inspired by human cognition and dual-process theory, we propose that language models should construct cognitive maps before interaction.
- Score: 32.5268320198854
- License:
- Abstract: Language models' ability to extrapolate learned behaviors to novel, more complex environments beyond their training scope is largely unknown. This study introduces a path planning task in a textualized Gridworld to probe language models' extrapolation capabilities. We show that conventional approaches, including next-token prediction and Chain of Thought (CoT) fine-tuning, fail to generalize in larger, unseen environments. Inspired by human cognition and dual-process theory, we propose that language models should construct cognitive maps before interaction. Our research demonstrates that autoregressive generation of cognitive maps and planning sequences enhances planning capabilities in extrapolated environments. We also find that, unlike CoT, cognitive maps cannot be obtained through simple prompting, necessitating additional training schemes for integration. Our findings in Gridworld offer insights into training language models with improved reasoning and adaptability, potentially advancing more human-like cognition and opening avenues for enhancing model generalization across diverse, complex tasks.
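The abstract describes the task and the "cognitive map before planning" idea only at a high level. As a rough illustration, here is a minimal sketch under stated assumptions, not the authors' setup. A grid is encoded as text ('S' start, 'G' goal, '#' wall, '.' free). A map of breadth-first step distances to the goal is built first, a plan is then read off that map, and both are emitted as one target string with the map before the plan. The text encoding, the distance-map stand-in for a cognitive map, the "MAP:/PLAN:" target format, and every function name (parse, cognitive_map, plan, render_map, training_target) are hypothetical.

```python
from collections import deque

# Hypothetical text encoding (an assumption, not the paper's format):
# 'S' = start, 'G' = goal, '#' = wall, '.' = free cell.
GRID = """\
S..#
.#..
...G"""

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def parse(text):
    """Return (height, width, start, goal, walls) for a rectangular text grid."""
    rows = text.splitlines()
    start = goal = None
    walls = set()
    for r, row in enumerate(rows):
        for c, ch in enumerate(row):
            if ch == "S":
                start = (r, c)
            elif ch == "G":
                goal = (r, c)
            elif ch == "#":
                walls.add((r, c))
    return len(rows), len(rows[0]), start, goal, walls


def cognitive_map(height, width, goal, walls):
    """Hypothetical 'map': BFS from the goal gives the step distance of every reachable cell."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in MOVES.values():
            nxt = (r + dr, c + dc)
            in_bounds = 0 <= nxt[0] < height and 0 <= nxt[1] < width
            if in_bounds and nxt not in walls and nxt not in dist:
                dist[nxt] = dist[(r, c)] + 1
                queue.append(nxt)
    return dist


def plan(start, dist):
    """Follow strictly decreasing distances from start; None if the goal is unreachable."""
    if start not in dist:
        return None
    actions, pos = [], start
    while dist[pos] > 0:
        for name, (dr, dc) in MOVES.items():
            nxt = (pos[0] + dr, pos[1] + dc)
            if dist.get(nxt) == dist[pos] - 1:
                actions.append(name)
                pos = nxt
                break
    return actions


def render_map(height, width, dist):
    """Render distances row by row; 'x' marks walls and unreachable cells."""
    return "\n".join(
        " ".join(str(dist[(r, c)]) if (r, c) in dist else "x" for c in range(width))
        for r in range(height)
    )


def training_target(text):
    """Hypothetical target string: the map is generated first, then the action plan."""
    h, w, start, goal, walls = parse(text)
    dist = cognitive_map(h, w, goal, walls)
    actions = plan(start, dist)
    plan_text = " ".join(actions) if actions is not None else "none"
    return "MAP:\n" + render_map(h, w, dist) + "\nPLAN: " + plan_text


print(training_target(GRID))
```

For the 3x4 example grid this prints the distance rows `5 4 3 x`, `4 x 2 1`, `3 2 1 0` followed by `PLAN: down down right right right`. Nothing in the sketch depends on grid size. Targets could therefore be generated for small grids and for larger held-out grids, loosely in the spirit of the extrapolation setting the abstract describes.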
Related papers
- ARPA: A Novel Hybrid Model for Advancing Visual Word Disambiguation Using Large Language Models and Transformers [1.6541870997607049]
We present ARPA, an architecture that fuses the unparalleled contextual understanding of large language models with the advanced feature extraction capabilities of transformers.
ARPA's introduction marks a significant milestone in visual word disambiguation, offering a compelling solution.
We invite researchers and practitioners to explore the capabilities of our model, envisioning a future where such hybrid models drive unprecedented advancements in artificial intelligence.
Date: 2024-08-12T10:15:13Z
- Compositional Generalization with Grounded Language Models [9.96679221246835]
Grounded language models use external sources of information, such as knowledge graphs, to meet some of the general challenges associated with pre-training.
We develop a procedure for generating natural language questions paired with knowledge graphs that targets different aspects of compositionality.
Date: 2024-06-07T14:56:51Z
- Navigation with Large Language Models: Semantic Guesswork as a Heuristic for Planning [73.0990339667978]
Navigation in unfamiliar environments presents a major challenge for robots.
We use language models to bias exploration of novel real-world environments.
We evaluate LFG in challenging real-world environments and simulated benchmarks.
Date: 2023-10-16T06:21:06Z
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models that learn to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompt capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene or manipulating the robot's behavior through language instructions.
Date: 2023-07-25T17:59:18Z
- SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model can learn on self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
Date: 2023-07-15T08:33:08Z
- From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought [124.40905824051079]
We propose rational meaning construction, a computational framework for language-informed thinking.
We frame linguistic meaning as a context-sensitive mapping from natural language into a probabilistic language of thought.
We show that LLMs can generate context-sensitive translations that capture pragmatically-appropriate linguistic meanings.
We extend our framework to integrate cognitively-motivated symbolic modules.
Date: 2023-06-22T05:14:00Z
- See, Plan, Predict: Language-guided Cognitive Planning with Video Prediction [27.44435424335596]
We devise a cognitive planning algorithm via language-guided video prediction.
The network is endowed with the ability to ground concepts based on natural language input with generalization to unseen objects.
Date: 2022-10-07T21:27:16Z
- Improving Policy Learning via Language Dynamics Distillation [87.27583619910338]
We propose Language Dynamics Distillation (LDD), which pretrains a model to predict environment dynamics given demonstrations with language descriptions.
We show that language descriptions in demonstrations improve sample-efficiency and generalization across environments.
Date: 2022-09-30T19:56:04Z
- Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
Date: 2021-08-06T23:49:18Z
- Zero-Shot Compositional Policy Learning via Language Grounding [13.45138913186308]
Humans can adapt to new tasks quickly by leveraging prior knowledge about the world such as language descriptions.
We introduce a new research platform BabyAI++ in which the dynamics of environments are disentangled from visual appearance.
We find that current language-guided RL/IL techniques overfit to the training environments and suffer from a huge performance drop when facing unseen combinations.
Date: 2020-04-15T16:58:19Z