ESPRIT: Explaining Solutions to Physical Reasoning Tasks
- URL: http://arxiv.org/abs/2005.00730v2
- Date: Thu, 14 May 2020 00:24:13 GMT
- Title: ESPRIT: Explaining Solutions to Physical Reasoning Tasks
- Authors: Nazneen Fatema Rajani, Rui Zhang, Yi Chern Tan, Stephan Zheng, Jeremy
Weiss, Aadit Vyas, Abhijit Gupta, Caiming Xiong, Richard Socher, Dragomir
Radev
- Abstract summary: ESPRIT is a framework for commonsense reasoning about qualitative physics in natural language.
Our framework learns to generate explanations of how the physical simulation will causally evolve so that an agent or a human can easily reason about a solution.
Human evaluations indicate that ESPRIT produces crucial fine-grained details and has high coverage of physical concepts compared to even human annotations.
- Score: 106.77019206219984
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks lack the ability to reason about qualitative physics and so
cannot generalize to scenarios and tasks unseen during training. We propose
ESPRIT, a framework for commonsense reasoning about qualitative physics in
natural language that generates interpretable descriptions of physical events.
We use a two-step approach of first identifying the pivotal physical events in
an environment and then generating natural language descriptions of those
events using a data-to-text approach. Our framework learns to generate
explanations of how the physical simulation will causally evolve so that an
agent or a human can easily reason about a solution using those interpretable
descriptions. Human evaluations indicate that ESPRIT produces crucial
fine-grained details and has high coverage of physical concepts compared to
even human annotations. Dataset, code and documentation are available at
https://github.com/salesforce/esprit.
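To make the two-step approach concrete, the following is a minimal illustrative sketch, not the authors' released implementation: the event-detection heuristic, the "in_contact" field, and the generator callable are hypothetical stand-ins. Step 1 scans the simulated trajectory for pivotal physical events; step 2 serializes each event record and hands it to a data-to-text generator.

```python
# Hypothetical sketch of ESPRIT's two-step pipeline; names and heuristics are
# illustrative assumptions, not the code from the repository above.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    timestep: int
    objects: List[str]
    kind: str  # e.g. "collision"


def detect_pivotal_events(trajectory: List[Dict[str, dict]]) -> List[Event]:
    """Step 1: scan per-object simulation states for salient physical events.

    A hypothetical 'in_contact' flag stands in for a real collision test."""
    events = []
    for t in range(1, len(trajectory)):
        prev, curr = trajectory[t - 1], trajectory[t]
        for name, state in curr.items():
            if state.get("in_contact") and not prev.get(name, {}).get("in_contact"):
                events.append(Event(timestep=t, objects=[name], kind="collision"))
    return events


def describe_events(events: List[Event], generator: Callable[[str], str]) -> List[str]:
    """Step 2: data-to-text -- verbalize each structured event record."""
    return [generator(f"{e.kind} | t={e.timestep} | objects={', '.join(e.objects)}")
            for e in events]
```

In the paper the second step is a learned data-to-text model; here `generator` is simply any callable that maps a serialized event record to a sentence.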
Related papers
- A Multi-Task Text Classification Pipeline with Natural Language Explanations: A User-Centric Evaluation in Sentiment Analysis and Offensive Language Identification in Greek Tweets [8.846643533783205]
This work introduces an early concept for a novel pipeline that can be used in text classification tasks.
It comprises two models: a classifier that labels the text and an explanation generator that provides the explanation.
Experiments are centred around the tasks of sentiment analysis and offensive language identification in Greek tweets.
arXiv Detail & Related papers (2024-10-14T08:41:31Z)
- Compositional Physical Reasoning of Objects and Events from Videos [122.6862357340911]
This paper addresses the challenge of inferring hidden physical properties from objects' motion and interactions.
We evaluate state-of-the-art video reasoning models on ComPhy and reveal their limited ability to capture these hidden properties.
We also propose a novel neuro-symbolic framework, Physical Concept Reasoner (PCR), that learns and reasons about both visible and hidden physical properties.
arXiv Detail & Related papers (2024-08-02T15:19:55Z)
- Physical Property Understanding from Language-Embedded Feature Fields [27.151380830258603]
We present a novel approach for dense prediction of the physical properties of objects using a collection of images.
Inspired by how humans reason about physics through vision, we leverage large language models to propose candidate materials for each object.
Our method is accurate, annotation-free, and applicable to any object in the open world.
arXiv Detail & Related papers (2024-04-05T17:45:07Z)
- PhyGrasp: Generalizing Robotic Grasping with Physics-informed Large Multimodal Models [58.33913881592706]
Humans can easily apply their intuitive physics to grasp skillfully and change grasps efficiently, even for objects they have never seen before.
This work delves into infusing such physical commonsense reasoning into robotic manipulation.
We introduce PhyGrasp, a multimodal large model that leverages inputs from two modalities: natural language and 3D point clouds.
arXiv Detail & Related papers (2024-02-26T18:57:52Z)
- Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
- CLEVRER-Humans: Describing Physical and Causal Events the Human Way [55.44915246065028]
We present the CLEVRER-Humans benchmark, a video dataset for causal judgment of physical events with human labels.
We employ two techniques to improve data collection efficiency: first, a novel iterative event cloze task to elicit a new representation of events in videos, which we term Causal Event Graphs (CEGs); second, a data augmentation technique based on neural language generative models.
arXiv Detail & Related papers (2023-10-05T16:09:48Z)
- Imagination-Augmented Natural Language Understanding [71.51687221130925]
We introduce an Imagination-Augmented Cross-modal Encoder (iACE) to solve natural language understanding tasks.
iACE enables visual imagination with external knowledge transferred from the powerful generative and pre-trained vision-and-language models.
Experiments on GLUE and SWAG show that iACE achieves consistent improvement over visually-supervised pre-trained models.
arXiv Detail & Related papers (2022-04-18T19:39:36Z)
- Physion: Evaluating Physical Prediction from Vision in Humans and Machines [46.19008633309041]
We present a visual and physical prediction benchmark that precisely measures the ability to predict how physical scenes will unfold.
We compare an array of algorithms on their ability to make diverse physical predictions.
We find that graph neural networks with access to the physical state best capture human behavior.
arXiv Detail & Related papers (2021-06-15T16:13:39Z)