Mind's Eye: Grounded Language Model Reasoning through Simulation
- URL: http://arxiv.org/abs/2210.05359v1
- Date: Tue, 11 Oct 2022 11:39:23 GMT
- Title: Mind's Eye: Grounded Language Model Reasoning through Simulation
- Authors: Ruibo Liu, Jason Wei, Shixiang Shane Gu, Te-Yen Wu, Soroush Vosoughi,
Claire Cui, Denny Zhou, Andrew M. Dai
- Abstract summary: We present Mind's Eye, a paradigm to ground language model reasoning in the physical world.
Experiments show Mind's Eye can improve reasoning ability by a large margin.
Smaller language models armed with Mind's Eye can obtain similar performance to models that are 100x larger.
- Score: 47.654525013443255
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Successful and effective communication between humans and AI relies on a
shared experience of the world. By training solely on written text, current
language models (LMs) miss the grounded experience of humans in the real world;
their failure to relate language to the physical world leads to misrepresented
knowledge and to obvious mistakes in their reasoning. We present Mind's
Eye, a paradigm to ground language model reasoning in the physical world. Given
a physical reasoning question, we use a computational physics engine
(DeepMind's MuJoCo) to simulate the possible outcomes, and then include the
simulation results as part of the input, which grounds the language model's
subsequent reasoning. Experiments on 39 tasks in a physics alignment benchmark
demonstrate that Mind's Eye can improve reasoning ability by a large margin
(27.9% zero-shot and 46.0% few-shot absolute accuracy improvement on average).
Smaller language models armed with Mind's Eye can obtain similar performance to
models that are 100x larger. Finally, we confirm the robustness of Mind's Eye
through ablation studies.
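The paradigm the abstract describes is simulation-augmented prompting: render the question as a physics scene, run the simulator, and inject the observed outcome into the prompt before the language model answers. Below is a minimal sketch of that loop, assuming the real MuJoCo Python bindings; question_to_scene is a hard-coded stub standing in for the text-to-code model the paper uses to write simulation programs per question, and the sketch stops at the augmented prompt rather than calling an actual LM.

```python
# Minimal, hypothetical sketch of the Mind's Eye loop:
# question -> MuJoCo simulation -> simulation results injected into the prompt.

import mujoco  # DeepMind's MuJoCo Python bindings (pip install mujoco)

def question_to_scene(question: str) -> str:
    """Stub: the paper uses a text-to-code LM to write this XML per question."""
    return """
    <mujoco>
      <worldbody>
        <geom type="plane" size="5 5 0.1"/>
        <body name="ball" pos="0 0 1">
          <freejoint/>
          <geom type="sphere" size="0.1"/>
        </body>
      </worldbody>
    </mujoco>"""

def simulate(scene_xml: str, steps: int = 1000) -> str:
    """Step the physics forward and summarize the outcome as plain text."""
    model = mujoco.MjModel.from_xml_string(scene_xml)
    data = mujoco.MjData(model)
    for _ in range(steps):
        mujoco.mj_step(model, data)
    names = [model.body(i).name for i in range(model.nbody)]
    return "; ".join(f"{n} ends at {data.body(n).xpos.round(2).tolist()}"
                     for n in names if n != "world")  # skip static world body

def minds_eye_prompt(question: str) -> str:
    """Prepend simulation evidence to the question before calling the LM."""
    observation = simulate(question_to_scene(question))
    return f"Simulation results: {observation}\nQuestion: {question}\nAnswer:"

print(minds_eye_prompt("If we drop a ball, does it fall to the ground?"))
```

The design point is that the language model never has to estimate the physics itself: the prompt already carries a simulated observation for the model to reason over.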
Related papers
- Contextual Emotion Recognition using Large Vision Language Models [0.6749750044497732] (2024-05-14)
Achieving human-level recognition of the apparent emotion of a person in real-world situations remains an unsolved task in computer vision. In this paper, we examine two major approaches enabled by recent large vision-language models, and demonstrate that a vision-language model, even when fine-tuned on a small dataset, can significantly outperform traditional baselines.
- Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models [71.93366651585275] (2024-04-04)
Large language models (LLMs) have exhibited impressive performance in language comprehension and various reasoning tasks. We propose Visualization-of-Thought (VoT) to elicit the spatial reasoning of LLMs by visualizing their reasoning traces; VoT significantly enhances the spatial reasoning abilities of LLMs (a prompt sketch follows this list).
- PhyGrasp: Generalizing Robotic Grasping with Physics-informed Large Multimodal Models [58.33913881592706] (2024-02-26)
Humans can easily apply their intuitive physics to grasp skillfully and change grasps efficiently, even for objects they have never seen before. This work delves into infusing such physical commonsense reasoning into robotic manipulation. We introduce PhyGrasp, a multimodal large model that leverages inputs from two modalities: natural language and 3D point clouds.
- Language Models Don't Learn the Physical Manifestation of Language [0.3529736140137004] (2024-02-17)
We argue that language-only models don't learn the physical manifestation of language. We present an empirical investigation of the visual-auditory properties of language through a series of tasks, termed H-Test.
- A Sentence is Worth a Thousand Pictures: Can Large Language Models Understand Hum4n L4ngu4ge and the W0rld behind W0rds? [2.7342737448775534] (2023-07-26)
Large Language Models (LLMs) have been linked to claims about human-like linguistic performance. We analyze the contribution of LLMs as theoretically informative representations of a target cognitive system, and evaluate the models' ability to see the bigger picture through top-down feedback from higher levels of processing.
- MindGames: Targeting Theory of Mind in Large Language Models with Dynamic Epistemic Modal Logic [0.6537995248511139] (2023-05-05)
Theory of Mind (ToM) is a critical component of intelligence, but its assessment remains the subject of heated debate. Here, we leverage dynamic epistemic logic to isolate a particular component of ToM and to generate controlled problems. Our findings indicate that scaling language models does not consistently yield results better than random chance.
- Imagination-Augmented Natural Language Understanding [71.51687221130925] (2022-04-18)
We introduce an Imagination-Augmented Cross-modal Encoder (iACE) to solve natural language understanding tasks. iACE enables visual imagination with external knowledge transferred from powerful generative and pre-trained vision-and-language models. Experiments on GLUE and SWAG show that iACE achieves consistent improvement over visually-supervised pre-trained models.
- PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World [86.21137454228848] (2021-06-01)
We factorize PIGLeT into a physical dynamics model and a separate language model. PIGLeT can read a sentence, simulate neurally what might happen next, and then communicate that result through a literal symbolic representation (a toy factorization sketch follows this list). It correctly forecasts "what happens next" given an English sentence over 80% of the time, outperforming a 100x larger text-to-text approach by over 10%.
- ESPRIT: Explaining Solutions to Physical Reasoning Tasks [106.77019206219984] (2020-05-02)
ESPRIT is a framework for commonsense reasoning about qualitative physics in natural language. Our framework learns to generate explanations of how a physical simulation will causally evolve, so that an agent or a human can easily reason about a solution. Human evaluations indicate that ESPRIT produces crucial fine-grained details and has high coverage of physical concepts, compared even to human annotations.
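The VoT entry above elicits spatial reasoning by having the model draw its intermediate states. The sketch below illustrates one plausible form of such a prompt; the grid task and wording are illustrative assumptions, not the paper's exact prompts.

```python
# A hypothetical Visualization-of-Thought style prompt: the model is asked to
# redraw the spatial state as a text grid after every reasoning step, so each
# move can be checked against an explicit visualization.

VOT_PROMPT = """\
Navigate from S to G on the 3x3 grid. X is a wall; moves are up/down/left/right.
After each move, visualize the whole grid, marking your current cell with C.

S . .
. X .
. . G

Move 1: right
. C .
. X .
. . G

Move 2:"""
```

The model continues the pattern, interleaving each move with a redrawn grid, which is what "visualizing the reasoning trace" means in this setting.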
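PIGLeT's factorization, as summarized above, pairs a dynamics model over symbolic object states with a separate language model that verbalizes the outcome. The toy sketch below shows only the shape of that interface; every name and the hard-coded rule are hypothetical stand-ins, since PIGLeT's actual components are learned neural models trained in a 3D environment.

```python
# Toy sketch of the neuro-symbolic factorization: a physical dynamics model
# over symbolic object states, plus a separate language model that turns the
# predicted states back into English. All names here are hypothetical.

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ObjectState:
    name: str
    temperature: str   # e.g. "cold", "hot"
    is_cooked: bool

def dynamics_model(action: str, objects: list[ObjectState]) -> list[ObjectState]:
    """Stand-in for the learned dynamics model: action + states -> next states."""
    if action == "heat":
        return [replace(o, temperature="hot", is_cooked=True) for o in objects]
    return objects

def language_model(objects: list[ObjectState]) -> str:
    """Stand-in for the separate language model: symbolic states -> English."""
    return " and ".join(
        f"the {o.name} is {o.temperature}" + (", cooked" if o.is_cooked else "")
        for o in objects)

# Read a sentence, simulate what happens next, then communicate the result.
before = [ObjectState("egg", "cold", False)]
after = dynamics_model("heat", before)   # action parsed from the sentence
print(language_model(after))             # -> "the egg is hot, cooked"
```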
This list is automatically generated from the titles and abstracts of the papers on this site.