Compositional Law Parsing with Latent Random Functions
- URL: http://arxiv.org/abs/2209.09115v1
- Date: Thu, 15 Sep 2022 06:57:23 GMT
- Title: Compositional Law Parsing with Latent Random Functions
- Authors: Fan Shi, Bin Li, Xiangyang Xue
- Abstract summary: We propose a deep latent variable model for Compositional LAw Parsing (CLAP).
CLAP achieves human-like compositionality through an encoding-decoding architecture that represents concepts in the scene as latent variables.
Our experimental results demonstrate that CLAP outperforms baseline methods on multiple visual tasks.
- Score: 54.26307134687171
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human cognition is compositional. We understand a scene by
decomposing it into different concepts (e.g., the shape and position of an
object) and learning the respective laws of these concepts, which may be
either natural (e.g., laws of motion) or man-made (e.g., rules of a game).
Automatically parsing these laws indicates a model's ability to understand
the scene, which gives law parsing a central role in many visual tasks. In
this paper, we propose a deep latent variable model for Compositional LAw
Parsing (CLAP). CLAP achieves human-like compositionality through an
encoding-decoding architecture that represents concepts in the scene as
latent variables, and further employs concept-specific random functions,
instantiated with Neural Processes, in the latent space to capture the law
governing each concept. Our experimental results demonstrate that CLAP
outperforms the compared baseline methods on multiple visual tasks,
including intuitive physics, abstract visual reasoning, and scene
representation. In addition, CLAP can learn concept-specific laws in a scene
without supervision, and laws can be edited by modifying the corresponding
latent random functions, validating its interpretability and manipulability.
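To make the architecture concrete, here is a minimal PyTorch sketch of the core idea: encode each frame into per-concept latents, then model the law of each concept with its own Neural Process over the latent sequence. All module names, layer sizes, and the deterministic (non-stochastic) Neural Process variant are simplifying assumptions for illustration, not the authors' implementation; the image decoder is omitted.

```python
# Illustrative sketch of CLAP's structure (assumed, not the paper's code):
# per-concept latents + one Neural Process per concept as the "latent
# random function" that captures that concept's law over time.
import torch
import torch.nn as nn

class ConceptNeuralProcess(nn.Module):
    """One latent random function: aggregates (time, latent) context pairs
    into a function representation r, then predicts latents at target times.
    A deterministic NP variant is used here for brevity."""
    def __init__(self, z_dim: int, r_dim: int = 64):
        super().__init__()
        self.context_net = nn.Sequential(
            nn.Linear(1 + z_dim, 128), nn.ReLU(), nn.Linear(128, r_dim))
        self.decoder = nn.Sequential(
            nn.Linear(1 + r_dim, 128), nn.ReLU(), nn.Linear(128, z_dim))

    def forward(self, t_ctx, z_ctx, t_tgt):
        # Encode each (t, z) context pair, then mean-aggregate into one r
        # that summarizes the law governing this concept.
        r = self.context_net(torch.cat([t_ctx, z_ctx], dim=-1)).mean(dim=1)
        r = r.unsqueeze(1).expand(-1, t_tgt.size(1), -1)
        return self.decoder(torch.cat([t_tgt, r], dim=-1))  # predicted latents

class CLAPSketch(nn.Module):
    def __init__(self, img_ch=3, n_concepts=2, z_dim=8):
        super().__init__()
        # Image encoder -> concatenated concept latents (one slice per concept).
        self.encoder = nn.Sequential(
            nn.Conv2d(img_ch, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(), nn.Flatten(),
            nn.LazyLinear(n_concepts * z_dim))
        # One Neural Process per concept: laws stay separate and editable.
        self.laws = nn.ModuleList(
            [ConceptNeuralProcess(z_dim) for _ in range(n_concepts)])
        self.n_concepts, self.z_dim = n_concepts, z_dim

    def forward(self, frames, ctx_idx, t_tgt):
        # frames: (B, T, C, H, W). Encode every frame, split per concept.
        B, T = frames.shape[:2]
        z = self.encoder(frames.flatten(0, 1)).view(
            B, T, self.n_concepts, self.z_dim)
        t = torch.linspace(0, 1, T, device=frames.device).view(1, T, 1).expand(B, T, 1)
        preds = [law(t[:, ctx_idx], z[:, ctx_idx, k], t_tgt)
                 for k, law in enumerate(self.laws)]
        return torch.stack(preds, dim=2)  # (B, T_tgt, n_concepts, z_dim)

# Usage: observe the first 5 of 6 frames, predict latents for the last step.
model = CLAPSketch()
frames = torch.randn(4, 6, 3, 32, 32)
t_tgt = torch.full((4, 1, 1), 1.0)
z_hat = model(frames, ctx_idx=[0, 1, 2, 3, 4], t_tgt=t_tgt)
print(z_hat.shape)  # torch.Size([4, 1, 2, 8])
```

Because each concept has its own random function, editing one law (e.g., swapping or re-fitting a single `ConceptNeuralProcess`) leaves the others untouched, which is the manipulability property the abstract describes.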
Related papers
- A Complexity-Based Theory of Compositionality [53.025566128892066]
In AI, compositional representations can enable a powerful form of out-of-distribution generalization.
Here, we propose a formal definition of compositionality that accounts for and extends our intuitions about compositionality.
The definition is conceptually simple, quantitative, grounded in algorithmic information theory, and applicable to any representation.
arXiv Detail & Related papers (2024-10-18T18:37:27Z)
- Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models [60.80960965051388]
Adjectives and verbs are entangled with the nouns (subjects) they describe.
Lego disentangles concepts from their associated subjects using a simple yet effective Subject Separation step.
Lego-generated concepts were preferred over 70% of the time when compared to the baseline.
arXiv Detail & Related papers (2023-11-23T07:33:38Z) - Identifying Interpretable Subspaces in Image Representations [54.821222487956355]
We propose a framework to explain features of image representations using Contrasting Concepts (FALCON).
For a target feature, FALCON captions its highly activating cropped images using a large captioning dataset and a pre-trained vision-language model like CLIP.
Each word in the captions is scored and ranked, yielding a small number of shared, human-understandable concepts (a rough sketch of this scoring step appears after this list).
arXiv Detail & Related papers (2023-07-20T00:02:24Z) - Intrinsic Physical Concepts Discovery with Object-Centric Predictive
Models [86.25460882547581]
We introduce the PHYsical Concepts Inference NEtwork (PHYCINE), a system that infers physical concepts at different levels of abstraction without supervision.
We show that object representations containing the discovered physical-concept variables help achieve better performance on causal reasoning tasks.
arXiv Detail & Related papers (2023-03-03T11:52:21Z) - Succinct Representations for Concepts [12.134564449202708]
Foundation models like ChatGPT have demonstrated remarkable performance on various tasks.
However, for many questions, they may produce false answers that look accurate.
In this paper, we introduce succinct representations of concepts based on category theory.
arXiv Detail & Related papers (2023-03-01T12:11:23Z) - Prediction of Scene Plausibility [11.641785968519114]
Plausibility can be defined both in terms of physical properties and in terms of functional and typical arrangements.
We build a dataset of synthetic images containing both plausible and implausible scenes.
We test the success of various vision models in the task of recognizing and understanding plausibility.
arXiv Detail & Related papers (2022-12-02T22:22:16Z) - Contextualized Scene Imagination for Generative Commonsense Reasoning [35.03682416576795]
Generative commonsense reasoning skills are lacking in state-of-the-art text generation methods.
We propose an Imagine-and-Verbalize (I&V) method, which learns to imagine a relational scene knowledge graph.
Experiments demonstrate the effectiveness of I&V in improving language models on both concept-to-sentence and concept-to-story generation tasks.
arXiv Detail & Related papers (2021-12-12T20:38:08Z) - PTR: A Benchmark for Part-based Conceptual, Relational, and Physical
Reasoning [135.2892665079159]
We introduce a new large-scale diagnostic visual reasoning dataset named PTR.
PTR contains around 70k RGBD synthetic images with ground truth object and part level annotations.
We examine several state-of-the-art visual reasoning models on this dataset and observe that they still make many surprising mistakes.
arXiv Detail & Related papers (2021-12-09T18:59:34Z)
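As flagged in the FALCON entry above, here is a rough, illustrative sketch of that word-scoring step using OpenAI's CLIP package. The cropping, the caption source, and the scoring rule (mean CLIP similarity between each word and the activating crops) are assumptions for illustration, not the paper's exact procedure.

```python
# FALCON-style concept extraction sketch: rank caption words by how well
# they align, under CLIP, with the crops that most activate a target
# feature. Dataset, cropping, and scoring rule are illustrative assumptions.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from collections import Counter

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def falcon_style_concepts(crops, captions, top_k=5):
    """crops: PIL images that highly activate the target feature;
    captions: one caption string per crop (from a captioning dataset/model)."""
    images = torch.stack([preprocess(c) for c in crops]).to(device)
    with torch.no_grad():
        img_emb = model.encode_image(images)
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)

    # Score each unique caption word by its mean similarity to the crops.
    words = Counter(w.lower() for cap in captions for w in cap.split())
    scores = {}
    for w in words:
        tokens = clip.tokenize([f"a photo of {w}"]).to(device)
        with torch.no_grad():
            txt = model.encode_text(tokens)
            txt = txt / txt.norm(dim=-1, keepdim=True)
        scores[w] = (img_emb @ txt.T).mean().item()
    # Top-ranked words approximate the shared, human-understandable concepts.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```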
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.