Compositional Law Parsing with Latent Random Functions
- URL: http://arxiv.org/abs/2209.09115v1
- Date: Thu, 15 Sep 2022 06:57:23 GMT
- Title: Compositional Law Parsing with Latent Random Functions
- Authors: Fan Shi, Bin Li, Xiangyang Xue
- Abstract summary: We propose a deep latent variable model for Compositional LAw Parsing (CLAP)
CLAP achieves human-like compositionality through an encoding-decoding architecture that represents concepts in the scene as latent variables.
Our experimental results demonstrate that CLAP outperforms the compared baseline methods in multiple visual tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human cognition has compositionality. We understand a scene by decomposing
the scene into different concepts (e.g. shape and position of an object) and
learning the respective laws of these concepts which may be either natural
(e.g. laws of motion) or man-made (e.g. laws of a game). The automatic parsing
of these laws indicates the model's ability to understand the scene, which
makes law parsing play a central role in many visual tasks. In this paper, we
propose a deep latent variable model for Compositional LAw Parsing (CLAP). CLAP
achieves human-like compositionality through an encoding-decoding
architecture that represents concepts in the scene as latent variables, and
further employs concept-specific random functions, instantiated with Neural
Processes, in the latent space to capture the law on each concept. Our
experimental results demonstrate that CLAP outperforms the compared baseline
methods in multiple visual tasks including intuitive physics, abstract visual
reasoning, and scene representation. In addition, CLAP can learn
concept-specific laws in a scene without supervision, and one can edit laws
by modifying the corresponding latent random functions, which validates its
interpretability and manipulability.
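Concretely, the concept-specific random functions build on Neural Processes, which condition on a set of context observations of a function and predict its value at new inputs. A minimal deterministic sketch of that mechanism is below; the untrained random weights and the name `neural_process_predict` are our own illustration, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_params(sizes, rng):
    """Random weights for a small MLP (untrained; for illustration only)."""
    return [(rng.normal(0, 0.5, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

# Encoder: each context pair (x_i, y_i) -> representation r_i
enc = mlp_params([2, 16, 8], rng)
# Decoder: (x_target, aggregated r) -> predicted y
dec = mlp_params([1 + 8, 16, 1], rng)

def neural_process_predict(x_ctx, y_ctx, x_tgt):
    """Condition on context points, predict at targets (deterministic NP sketch)."""
    pairs = np.stack([x_ctx, y_ctx], axis=-1)   # (n_ctx, 2)
    r = mlp(enc, pairs).mean(axis=0)            # permutation-invariant aggregate
    inp = np.concatenate(
        [x_tgt[:, None], np.tile(r, (len(x_tgt), 1))], axis=1)
    return mlp(dec, inp)[:, 0]

# A "law" here is the function relating x to y; conditioning on a few
# observations of one concept yields predictions for unseen inputs.
x_ctx = np.array([0.0, 0.5, 1.0])
y_ctx = np.sin(x_ctx)                           # observations of the law
y_pred = neural_process_predict(x_ctx, y_ctx, np.array([0.25, 0.75]))
print(y_pred.shape)                             # one prediction per target input
```

In CLAP, one such random function is attached to each concept's latent variable, which is what makes laws separable and editable per concept.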
Related papers
- Identifying Interpretable Subspaces in Image Representations
We propose a framework to explain features of image representations using Contrasting Concepts (FALCON).
For a target feature, FALCON captions its highly activating cropped images using a large captioning dataset and a pre-trained vision-language model like CLIP.
Each word among the captions is scored and ranked leading to a small number of shared, human-understandable concepts.
arXiv Detail & Related papers (2023-07-20T00:02:24Z)
- Can Language Models Understand Physical Concepts?
Language models gradually become general-purpose interfaces in the interactive and embodied world.
It is not yet clear whether LMs can understand physical concepts in the human world.
arXiv Detail & Related papers (2023-05-23T13:36:55Z)
- Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models
We introduce the PHYsical Concepts Inference NEtwork (PHYCINE), a system that infers physical concepts in different abstract levels without supervision.
We show that object representations containing the discovered physical concepts variables could help achieve better performance in causal reasoning tasks.
arXiv Detail & Related papers (2023-03-03T11:52:21Z)
- Succinct Representations for Concepts
Foundation models like ChatGPT have demonstrated remarkable performance on various tasks.
However, for many questions, they may produce false answers that look accurate.
In this paper, we introduce succinct representations of concepts based on category theory.
arXiv Detail & Related papers (2023-03-01T12:11:23Z)
- Prediction of Scene Plausibility
Plausibility can be defined both in terms of physical properties and in terms of functional and typical arrangements.
We build a dataset of synthetic images containing both plausible and implausible scenes.
We test the success of various vision models in the task of recognizing and understanding plausibility.
arXiv Detail & Related papers (2022-12-02T22:22:16Z)
- FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic Descriptions, and Conceptual Relations
We present a framework for learning new visual concepts quickly, guided by multiple naturally occurring data streams.
The learned concepts support downstream applications, such as answering questions by reasoning about unseen images.
We demonstrate the effectiveness of our model on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-03-30T19:45:00Z)
- Contextualized Scene Imagination for Generative Commonsense Reasoning
Generative commonsense reasoning skills are lacking in state-of-the-art text generation methods.
We propose an Imagine-and-Verbalize (I&V) method, which learns to imagine a relational scene knowledge graph.
Experiments demonstrate the effectiveness of I&V in improving language models on both concept-to-sentence and concept-to-story generation tasks.
arXiv Detail & Related papers (2021-12-12T20:38:08Z)
- PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning
We introduce a new large-scale diagnostic visual reasoning dataset named PTR.
PTR contains around 70k RGBD synthetic images with ground truth object and part level annotations.
We examine several state-of-the-art visual reasoning models on this dataset and observe that they still make many surprising mistakes.
arXiv Detail & Related papers (2021-12-09T18:59:34Z)
- Unsupervised Learning of Compositional Energy Concepts
We propose COMET, which discovers and represents concepts as separate energy functions.
COMET represents both global concepts and objects under a unified framework.
arXiv Detail & Related papers (2021-11-04T17:46:12Z)
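COMET's core idea, representing each concept as a separate energy function so that concepts compose by summation, can be sketched with toy quadratic energies. The functions below are our own illustration under that assumption, not COMET's learned networks:

```python
import numpy as np

# Two toy concept energies over a 2-D "scene code" z:
# one prefers a code near (1, 0), the other a code near (0, 1).
def energy_a(z):
    return np.sum((z - np.array([1.0, 0.0])) ** 2)

def energy_b(z):
    return np.sum((z - np.array([0.0, 1.0])) ** 2)

def composed_energy(z):
    # Composition by summation: a sample satisfying both concepts
    # is a minimizer of the total energy.
    return energy_a(z) + energy_b(z)

def minimize(energy, z0, lr=0.1, steps=200, eps=1e-5):
    """Gradient descent with central-difference gradients (toy optimizer)."""
    z = z0.astype(float)
    for _ in range(steps):
        grad = np.zeros_like(z)
        for i in range(len(z)):
            dz = np.zeros_like(z)
            dz[i] = eps
            grad[i] = (energy(z + dz) - energy(z - dz)) / (2 * eps)
        z -= lr * grad
    return z

z_star = minimize(composed_energy, np.zeros(2))
print(np.round(z_star, 2))  # near (0.5, 0.5), the compromise of both concepts
```

The same additive structure is what lets such models recombine concepts learned separately into novel scenes.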
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.