Exploring Failure Cases in Multimodal Reasoning About Physical Dynamics
- URL: http://arxiv.org/abs/2402.15654v1
- Date: Sat, 24 Feb 2024 00:01:01 GMT
- Authors: Sadaf Ghaffari, Nikhil Krishnaswamy
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we explore LLMs' abilities to solve problems that
require physical reasoning in situated environments. We construct a simple
simulated environment and demonstrate examples of where, in a zero-shot
setting, both text and multimodal LLMs display atomic world knowledge about
various objects but fail to compose this knowledge in correct solutions for an
object manipulation and placement task. We also use BLIP, a vision-language
model trained with more sophisticated cross-modal attention, to identify cases
relevant to object physical properties that that model fails to ground.
Finally, we present a procedure for discovering the relevant properties of
objects in the environment and propose a method to distill this knowledge back
into the LLM.
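The abstract's central contrast, between knowing facts about individual objects and composing those facts into a valid plan, can be illustrated with a toy check. This is a minimal sketch under stated assumptions, not the authors' simulated environment or task: the object names, the two boolean properties (`flat_top`, `rolls`), and the stability rule are all hypothetical. Each property is an atomic fact of the kind a model might state correctly in isolation; deciding whether a bottom-to-top stacking order is stable requires combining those facts across adjacent objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectProps:
    """Hypothetical atomic facts about a single object."""
    flat_top: bool  # can another object rest on its upper surface?
    rolls: bool     # would it roll off a surface it is placed on?


# Hypothetical property table; names and values are illustrative only.
PROPS = {
    "block": ObjectProps(flat_top=True, rolls=False),
    "plate": ObjectProps(flat_top=True, rolls=False),
    "ball": ObjectProps(flat_top=False, rolls=True),
    "cone": ObjectProps(flat_top=False, rolls=False),
}


def stack_is_stable(order: list[str]) -> bool:
    """Check a bottom-to-top stacking order against the property table.

    Every object that supports another must have a flat top, and no object
    resting on another may be one that rolls. Knowing each property on its
    own is not enough: the check combines properties of adjacent pairs.
    """
    for lower, upper in zip(order, order[1:]):
        if not PROPS[lower].flat_top or PROPS[upper].rolls:
            return False
    return True
```

Here `stack_is_stable(["block", "plate", "cone"])` returns `True`, while `stack_is_stable(["block", "ball"])` returns `False` because the ball would roll off. A model could answer "does a ball roll?" correctly and still propose the second order, which is the kind of composition failure the abstract points to.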
Related papers
- LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models
We propose a new physical reasoning task and a dataset, dubbed TraySim.
Our task involves predicting the dynamics of several objects on a tray that is given an external impact.
We present LLMPhy, a zero-shot black-box optimization framework that leverages the physics knowledge and program synthesis abilities of LLMs.
Our results show that the combination of the LLM and the physics engine leads to state-of-the-art zero-shot physical reasoning performance.
Date: 2024-11-12T18:56:58Z
- Language Agents Meet Causality -- Bridging LLMs and Causal World Models
We propose a framework that integrates causal representation learning with large language models.
This framework learns a causal world model, with causal variables linked to natural language expressions.
We evaluate the framework on causal inference and planning tasks across temporal scales and environmental complexities.
Date: 2024-10-25T18:36:37Z
- Open-World Object Detection with Instance Representation Learning
We propose a method to train an object detector that can both detect novel objects and extract semantically rich features in open-world conditions.
Our method learns a robust and generalizable feature space, outperforming other OWOD-based feature extraction methods.
Date: 2024-09-24T13:13:34Z
- Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange
We introduce a novel self-supervised learning (SSL) strategy for point cloud scene understanding.
Our approach leverages both object patterns and contextual cues to produce robust features.
Our experiments demonstrate the superiority of our method over existing SSL techniques.
Date: 2024-04-11T06:39:53Z
- Cognitive Planning for Object Goal Navigation using Generative AI Models
We present a novel framework for solving the object goal navigation problem that generates efficient exploration strategies.
Our approach enables a robot to navigate unfamiliar environments by leveraging Large Language Models (LLMs) and Large Vision-Language Models (LVLMs).
Date: 2024-03-30T10:54:59Z
- Object Detectors in the Open Environment: Challenges, Solutions, and Outlook
The dynamic and intricate nature of the open environment poses novel and formidable challenges to object detectors.
This paper aims to conduct a comprehensive review and analysis of object detectors in open environments.
We propose a framework that includes four quadrants (i.e., out-of-domain, out-of-category, robust learning, and incremental learning) based on the dimensions of the data / target changes.
Date: 2024-03-24T19:32:39Z
- Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension
We study how to characterize and predict the truthfulness of texts generated by large language models (LLMs).
We suggest investigating internal activations and quantifying an LLM's truthfulness using the local intrinsic dimension (LID) of its activations.
Date: 2024-02-28T04:56:21Z
- LLM Inference Unveiled: Survey and Roofline Model Insights
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also introducing a framework based on the roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
Date: 2024-02-26T07:33:05Z
- From Understanding to Utilization: A Survey on Explainability for Large Language Models
This survey underscores the need for greater explainability in Large Language Models (LLMs).
Our focus is primarily on pre-trained Transformer-based LLMs, which pose distinctive interpretability challenges due to their scale and complexity.
When considering the utilization of explainability, we explore several compelling methods that concentrate on model editing, control generation, and model enhancement.
Date: 2024-01-23T16:09:53Z
- LLMs for Robotic Object Disambiguation
Our study reveals the LLM's aptitude for solving complex decision-making challenges.
A pivotal focus of our research is the object disambiguation capability of LLMs.
We have developed a few-shot prompt engineering system to improve the LLM's ability to pose disambiguating queries.
Date: 2024-01-07T04:46:23Z
- Discovering Objects that Can Move
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
Date: 2022-03-18T21:13:56Z
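The LLM inference survey in this list builds its framework on the roofline model. The classic roofline bound says a workload's attainable throughput is the smaller of the hardware's peak compute rate and its memory bandwidth multiplied by the workload's arithmetic intensity (FLOPs performed per byte moved). A minimal sketch of that bound follows; the accelerator figures (100 TFLOP/s, 2 TB/s) are hypothetical, and the survey's own analysis goes well beyond this single formula.

```python
def attainable_flops(peak_flops: float, bandwidth: float, intensity: float) -> float:
    """Roofline bound on throughput, in FLOP/s.

    peak_flops: peak compute rate of the device, FLOP/s
    bandwidth:  memory bandwidth, bytes/s
    intensity:  arithmetic intensity of the workload, FLOP per byte moved
    """
    return min(peak_flops, bandwidth * intensity)


def ridge_point(peak_flops: float, bandwidth: float) -> float:
    """Intensity (FLOP/byte) at which the memory roof meets the compute roof."""
    return peak_flops / bandwidth


def bound_type(peak_flops: float, bandwidth: float, intensity: float) -> str:
    """Classify a workload as memory-bound or compute-bound."""
    if intensity < ridge_point(peak_flops, bandwidth):
        return "memory-bound"
    return "compute-bound"


# Hypothetical accelerator: 100 TFLOP/s peak compute, 2 TB/s memory bandwidth.
PEAK, BW = 100e12, 2e12
decode_throughput = attainable_flops(PEAK, BW, intensity=1.0)
```

With these figures the ridge point is 50 FLOP/byte. Batch-1 decoding with 16-bit weights performs roughly 2 FLOPs per 2-byte parameter read, about 1 FLOP/byte (a rough approximation that ignores the KV cache), so it sits deep in the memory-bound region: `decode_throughput` is 2e12 FLOP/s, only 2% of peak.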