PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D
World
- URL: http://arxiv.org/abs/2106.00188v1
- Date: Tue, 1 Jun 2021 02:32:12 GMT
- Title: PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D
World
- Authors: Rowan Zellers, Ari Holtzman, Matthew Peters, Roozbeh Mottaghi,
Aniruddha Kembhavi, Ali Farhadi, Yejin Choi
- Abstract summary: We factorize PIGLeT into a physical dynamics model and a separate language model.
PIGLeT can read a sentence, simulate neurally what might happen next, and then communicate that result through a literal symbolic representation.
It is able to correctly forecast "what happens next" given an English sentence over 80% of the time, outperforming a 100x larger, text-to-text approach by over 10%.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose PIGLeT: a model that learns physical commonsense knowledge through
interaction, and then uses this knowledge to ground language. We factorize
PIGLeT into a physical dynamics model and a separate language model. Our
dynamics model learns not just what objects are but also what they do: glass
cups break when thrown; plastic ones don't. We then use it as the interface to
our language model, giving us a unified model of linguistic form and grounded
meaning. PIGLeT can read a sentence, simulate neurally what might happen next,
and then communicate that result through a literal symbolic representation, or
natural language.
Experimental results show that our model effectively learns world dynamics,
along with how to communicate them. It is able to correctly forecast "what
happens next" given an English sentence over 80% of the time, outperforming a
100x larger, text-to-text approach by over 10%. Likewise, its natural language
summaries of physical interactions are also judged by humans as more accurate
than LM alternatives. We present a comprehensive analysis showing room for
future work.
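To make the factorization concrete, here is a minimal illustrative sketch of the interface the abstract describes: a dynamics model maps symbolic object states plus an action to next states, and a language head verbalizes the result. The rule-based logic and every name below are our stand-ins, not PIGLeT's actual components (which are neural models trained through interaction in a 3D simulator).

```python
# Illustrative sketch only: PIGLeT's real dynamics and language models are
# neural; the hand-coded rules here just show how the two components talk
# to each other through a literal symbolic state.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ObjectState:
    name: str
    material: str          # e.g. "glass" or "plastic"
    is_broken: bool = False

def dynamics_model(obj: ObjectState, action: str) -> ObjectState:
    """Stand-in for the physical dynamics model: predict the post-action
    symbolic state of an object."""
    if action == "throw" and obj.material == "glass":
        return replace(obj, is_broken=True)  # glass cups break when thrown
    return obj                               # plastic ones don't

def language_head(before: ObjectState, after: ObjectState) -> str:
    """Stand-in for the language model that summarizes the state change."""
    if after.is_broken and not before.is_broken:
        return f"The {before.material} {before.name} shatters."
    return f"The {before.material} {before.name} is unchanged."

for material in ("glass", "plastic"):
    cup = ObjectState(name="cup", material=material)
    print(language_head(cup, dynamics_model(cup, "throw")))
```

The point of the factorization is that the symbolic state is the shared interface: the dynamics model never sees text, and the language model reads and reports physical outcomes rather than having to learn physics from text alone.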
Related papers
- Visually Grounded Language Learning: a review of language games, datasets, tasks, and models
Many Vision+Language (V+L) tasks have been defined with the aim of creating models that can ground symbols in the visual modality.
In this work, we provide a systematic literature review of several tasks and models proposed in the V+L field.
arXiv Detail & Related papers (2023-12-05T02:17:29Z)
- Learning to Model the World with Language
To interact with humans and act in the world, agents need to understand the range of language that people use and relate it to the visual world.
Our key idea is that agents should interpret such diverse language as a signal that helps them predict the future.
We instantiate this in Dynalang, an agent that learns a multimodal world model to predict future text and image representations.
arXiv Detail & Related papers (2023-07-31T17:57:49Z)
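As a rough illustration of the world-model idea in the Dynalang entry above, the toy sketch below fuses image and text features into one recurrent latent and scores how well that latent predicts the next step's representations. The random-projection "encoders", linear transition, and dimensions are all our stand-ins, not Dynalang's actual learned architecture.

```python
# Toy future-prediction objective over interleaved image/text observations.
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_TXT, D_LAT = 16, 8, 32

W_img = rng.normal(size=(D_IMG, D_LAT)) / np.sqrt(D_IMG)  # "image encoder"
W_txt = rng.normal(size=(D_TXT, D_LAT)) / np.sqrt(D_TXT)  # "text encoder"
W_h = rng.normal(size=(D_LAT, D_LAT)) / np.sqrt(D_LAT)    # latent transition
W_pred = rng.normal(size=(D_LAT, D_IMG + D_TXT)) / np.sqrt(D_LAT)

def step(h, img, txt):
    """Fuse both modalities into the latent, then predict the next step's
    image and text features from the updated latent."""
    h = np.tanh(h @ W_h + img @ W_img + txt @ W_txt)
    return h, h @ W_pred

h = np.zeros(D_LAT)
traj = [(rng.normal(size=D_IMG), rng.normal(size=D_TXT)) for _ in range(5)]
loss = 0.0
for (img, txt), (img_next, txt_next) in zip(traj, traj[1:]):
    h, pred = step(h, img, txt)
    loss += float(np.mean((pred - np.concatenate([img_next, txt_next])) ** 2))
print(f"future-prediction loss over a toy trajectory: {loss:.3f}")
```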
- Transparency Helps Reveal When Language Models Learn Meaning
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not represent natural language semantics well.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- Mind's Eye: Grounded Language Model Reasoning through Simulation
We present Mind's Eye, a paradigm to ground language model reasoning in the physical world.
Experiments show Mind's Eye can improve reasoning ability by a large margin.
Smaller language models armed with Mind's Eye can obtain similar performance to models that are 100x larger.
arXiv Detail & Related papers (2022-10-11T11:39:23Z)
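A hedged sketch of the Mind's Eye paradigm as the entry above describes it: run a simulation relevant to the question, then inject the simulation's outcome into the language model's prompt so its reasoning is grounded in physical evidence. The one-case "simulator" and the names simulate/grounded_prompt below are hypothetical stand-ins, not the paper's system.

```python
# Ground an LM prompt in simulation output (toy stand-in for a physics engine).
def simulate(question: str) -> str:
    """Toy free-fall 'simulation': fall time from 10 m is mass-independent,
    so a heavy and a light ball land together."""
    g = 9.81
    t_heavy = (2 * 10 / g) ** 0.5
    t_light = (2 * 10 / g) ** 0.5
    same = abs(t_heavy - t_light) < 1e-9
    return ("Simulation: both balls hit the ground at the same time." if same
            else "Simulation: the balls hit the ground at different times.")

def grounded_prompt(question: str) -> str:
    """Prepend simulation evidence to the question before querying the LM."""
    return f"{simulate(question)}\nQuestion: {question}\nAnswer:"

print(grounded_prompt("If a heavy ball and a light ball are dropped from "
                      "10 m, which lands first?"))
```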
- Is neural language acquisition similar to natural? A chronological probing study
We present a chronological probing study of transformer English models such as MultiBERT and T5.
We compare the information about language that the models acquire over the course of training.
The results show that 1) linguistic information is acquired in the early stages of training, and 2) both language models capture features from various levels of language.
arXiv Detail & Related papers (2022-07-01T17:24:11Z)
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Large language models can encode a wealth of semantic knowledge about the world.
Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language.
We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex, temporally extended instructions.
arXiv Detail & Related papers (2022-04-04T17:57:11Z)
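The sketch below gives one plausible minimal reading of the combination described in the entry above: each candidate skill is scored by the language model's usefulness estimate multiplied by an affordance value reflecting whether the skill can currently succeed, and the robot executes the argmax. This scoring rule and all skill names and numbers are our assumptions for illustration, not taken verbatim from the paper.

```python
# Combine hypothetical LM scores with hypothetical affordance values.
instruction = "I spilled my drink, can you help?"

# LM's estimate that each skill is a useful next step for the instruction.
lm_score = {"find a sponge": 0.50, "pick up the sponge": 0.30,
            "go to the beach": 0.15}

# Affordance from the current state: no sponge in view yet, so picking one
# up is not yet feasible; the beach is never useful here.
affordance = {"find a sponge": 0.9, "pick up the sponge": 0.1,
              "go to the beach": 0.0}

combined = {skill: lm_score[skill] * affordance[skill] for skill in lm_score}
best = max(combined, key=combined.get)
print(f"selected skill: {best!r} (score {combined[best]:.2f})")
```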
- Uncovering Constraint-Based Behavior in Neural Models via Targeted Fine-Tuning
We show that competing linguistic processes within a language obscure underlying linguistic knowledge.
While human behavior has been found to be similar across languages, we find cross-linguistic variation in model behavior.
Our results suggest that models need to learn both the linguistic constraints in a language and their relative ranking, with mismatches in either producing non-human-like behavior.
arXiv Detail & Related papers (2021-06-02T14:52:11Z)
- Implicit Representations of Meaning in Neural Language Models
We identify contextual word representations that function as models of entities and situations as they evolve throughout a discourse.
Our results indicate that prediction in pretrained neural language models is supported, at least in part, by dynamic representations of meaning and implicit simulation of entity state.
arXiv Detail & Related papers (2021-06-01T19:23:20Z)
- A Visuospatial Dataset for Naturalistic Verb Learning
We introduce a new dataset for training and evaluating grounded language models.
Our data is collected within a virtual reality environment and is designed to emulate the quality of language data to which a pre-verbal child is likely to have access.
We use the collected data to compare several distributional semantics models for verb learning.
arXiv Detail & Related papers (2020-10-28T20:47:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.