Unsupervised Learning of Compositional Energy Concepts
- URL: http://arxiv.org/abs/2111.03042v1
- Date: Thu, 4 Nov 2021 17:46:12 GMT
- Title: Unsupervised Learning of Compositional Energy Concepts
- Authors: Yilun Du, Shuang Li, Yash Sharma, Joshua B. Tenenbaum, Igor Mordatch
- Abstract summary: We propose COMET, which discovers and represents concepts as separate energy functions.
COMET represents both global concepts and objects under a unified framework.
- Score: 70.11673173291426
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans are able to rapidly understand scenes by utilizing concepts extracted
from prior experience. Such concepts are diverse, and include global scene
descriptors, such as the weather or lighting, as well as local scene
descriptors, such as the color or size of a particular object. So far,
unsupervised discovery of concepts has focused on either modeling the global
scene-level or the local object-level factors of variation, but not both. In
this work, we propose COMET, which discovers and represents concepts as
separate energy functions, enabling us to represent both global concepts as
well as objects under a unified framework. COMET discovers energy functions
through recomposing the input image, which we find captures independent factors
without additional supervision. Sample generation in COMET is formulated as an
optimization process on underlying energy functions, enabling us to generate
images with permuted and composed concepts. Finally, discovered visual concepts
in COMET generalize well, enabling us to compose concepts between separate
modalities of images as well as with other concepts discovered by a separate
instance of COMET trained on a different dataset. Code and data available at
https://energy-based-model.github.io/comet/.
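The recomposition and optimization-based generation described in the abstract can be summarized in a short sketch. Below is a minimal, illustrative PyTorch sketch of the idea rather than the authors' released implementation: an encoder infers K concept latents from an image, each latent conditions a small energy network, and an image is recomposed by differentiable gradient descent on the sum of per-concept energies, trained with a reconstruction loss. The architectures, step count, and step size are assumptions made only for illustration.

```python
# Minimal sketch of the COMET idea (not the authors' code). All module sizes,
# the number of optimization steps, and the step size are illustrative guesses.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an image to K concept latents (assumed architecture)."""
    def __init__(self, num_concepts=4, latent_dim=16):
        super().__init__()
        self.num_concepts, self.latent_dim = num_concepts, latent_dim
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, num_concepts * latent_dim),
        )

    def forward(self, x):
        z = self.net(x)
        return z.view(x.shape[0], self.num_concepts, self.latent_dim)

class EnergyFunction(nn.Module):
    """Scalar energy E(x; z_k) for one concept, conditioned on its latent."""
    def __init__(self, latent_dim=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3 + latent_dim, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1),
        )

    def forward(self, x, z):
        # Broadcast the latent over the spatial grid and score the image.
        z_map = z[:, :, None, None].expand(-1, -1, x.shape[2], x.shape[3])
        return self.conv(torch.cat([x, z_map], dim=1))

def recompose(energy_fn, latents, x_init, steps=10, step_size=10.0):
    """Generate an image by gradient descent on the summed concept energies."""
    x = x_init.clone().requires_grad_(True)
    for _ in range(steps):
        energy = sum(energy_fn(x, latents[:, k]) for k in range(latents.shape[1])).sum()
        # create_graph=True keeps the optimization differentiable for training.
        grad = torch.autograd.grad(energy, x, create_graph=True)[0]
        x = x - step_size * grad
    return x

# One training step: recompose the input from its own inferred concepts and
# minimize the reconstruction error (hyperparameters are placeholders).
encoder, energy_fn = Encoder(), EnergyFunction()
opt = torch.optim.Adam(list(encoder.parameters()) + list(energy_fn.parameters()), lr=1e-4)
images = torch.rand(8, 3, 32, 32)
latents = encoder(images)
recon = recompose(energy_fn, latents, torch.rand_like(images))
loss = ((recon - images) ** 2).mean()
opt.zero_grad(); loss.backward(); opt.step()
```

Since every concept enters only as a latent-conditioned term in the energy sum, permuting or composing concepts (including latents inferred by a separately trained model) amounts to changing which latents are passed to the recomposition step.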
Related papers
- ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction [20.43411883845885]
We introduce a novel task named Unsupervised Concept Extraction (UCE) that considers an unsupervised setting without any human knowledge of the concepts.
Given an image that contains multiple concepts, the task aims to extract and recreate individual concepts solely relying on the existing knowledge from pretrained diffusion models.
We present ConceptExpress that tackles UCE by unleashing the inherent capabilities of pretrained diffusion models in two aspects.
arXiv Detail & Related papers (2024-07-09T17:50:28Z)
- Towards Compositionality in Concept Learning [20.960438848942445]
We show that existing unsupervised concept extraction methods find concepts which are not compositional.
We propose Compositional Concept Extraction (CCE), a method for finding concepts that are compositional.
CCE finds more compositional concept representations than baselines and yields better accuracy on four downstream classification tasks.
arXiv Detail & Related papers (2024-06-26T17:59:30Z)
- Visual Concept Connectome (VCC): Open World Concept Discovery and their Interlayer Connections in Deep Models [21.245185285617698]
Visual Concept Connectome (VCC) discovers human interpretable concepts and their interlayer connections in a fully unsupervised manner.
Our approach simultaneously reveals fine-grained concepts at each layer and the connection weightings across all layers, and is amenable to global analysis of network structure.
arXiv Detail & Related papers (2024-04-02T18:40:55Z)
- Visual Concept-driven Image Generation with Text-to-Image Diffusion Model [65.96212844602866]
Text-to-image (TTI) models have demonstrated impressive results in generating high-resolution images of complex scenes.
Recent approaches have extended these methods with personalization techniques that allow them to integrate user-illustrated concepts.
However, the ability to generate images with multiple interacting concepts, such as human subjects, as well as concepts that may be entangled within one or across multiple example images, remains elusive.
We propose a concept-driven TTI personalization framework that addresses these core challenges.
arXiv Detail & Related papers (2024-02-18T07:28:37Z)
- CLiC: Concept Learning in Context [54.81654147248919]
This paper builds upon recent advancements in visual concept learning.
It involves acquiring a visual concept from a source image and subsequently applying it to an object in a target image.
To localize the concept learning, we employ soft masks that cover both the concept of interest and its surrounding image area.
arXiv Detail & Related papers (2023-11-28T01:33:18Z)
- AnyDoor: Zero-shot Object-level Image Customization [63.44307304097742]
This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations.
Our model is trained only once and effortlessly generalizes to diverse object-scene combinations at the inference stage.
arXiv Detail & Related papers (2023-07-18T17:59:02Z)
- ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models [79.10890337599166]
We introduce ConceptBed, a large-scale dataset that consists of 284 unique visual concepts and 33K composite text prompts.
We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions.
Our results point to a trade-off between learning the concepts and preserving the compositionality, which existing approaches struggle to overcome.
arXiv Detail & Related papers (2023-06-07T18:00:38Z)
- SegDiscover: Visual Concept Discovery via Unsupervised Semantic Segmentation [29.809900593362844]
SegDiscover is a novel framework that discovers semantically meaningful visual concepts from imagery datasets with complex scenes without supervision.
Our method generates concept primitives from raw images, discovers concepts by clustering in the latent space of a self-supervised pretrained encoder, and refines concepts via neural network smoothing.
arXiv Detail & Related papers (2022-04-22T20:44:42Z)
- Interactive Disentanglement: Learning Concepts by Interacting with their Prototype Representations [15.284688801788912]
We show the advantages of prototype representations for understanding and revising the latent space of neural concept learners.
For this purpose, we introduce interactive Concept Swapping Networks (iCSNs), which learn to bind conceptual information to specific prototype slots by swapping the latent representations of paired images.
arXiv Detail & Related papers (2021-12-04T09:25:40Z)
- Visual Concept Reasoning Networks [93.99840807973546]
A split-transform-merge strategy has been broadly used as an architectural constraint in convolutional neural networks for visual recognition tasks.
We propose to exploit this strategy and combine it with our Visual Concept Reasoning Networks (VCRNet) to enable reasoning between high-level visual concepts.
Our proposed model, VCRNet, consistently improves performance while increasing the number of parameters by less than 1%.
arXiv Detail & Related papers (2020-08-26T20:02:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.