ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2306.04695v2
- Date: Thu, 22 Feb 2024 19:11:46 GMT
- Title: ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models
- Authors: Maitreya Patel and Tejas Gokhale and Chitta Baral and Yezhou Yang
- Abstract summary: We introduce ConceptBed, a large-scale dataset that consists of 284 unique visual concepts and 33K composite text prompts.
We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions.
- Our results point to a trade-off between learning the concepts and preserving the compositionality, which existing approaches struggle to overcome.
- Score: 79.10890337599166
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to understand visual concepts and replicate and compose these
concepts from images is a central goal for computer vision. Recent advances in
text-to-image (T2I) models have led to high-definition, realistic image
generation by learning from large databases of images and their
descriptions. However, the evaluation of T2I models has focused on photorealism
and limited qualitative measures of visual understanding. To quantify the
ability of T2I models in learning and synthesizing novel visual concepts
(a.k.a. personalized T2I), we introduce ConceptBed, a large-scale dataset that
consists of 284 unique visual concepts and 33K composite text prompts. Along
with the dataset, we propose an evaluation metric, Concept Confidence Deviation
(CCD), that uses the confidence of oracle concept classifiers to measure the
alignment between concepts generated by T2I generators and concepts contained
in target images. We evaluate visual concepts that are either objects,
attributes, or styles, and also evaluate four dimensions of compositionality:
counting, attributes, relations, and actions. Our human study shows that CCD is
highly correlated with human understanding of concepts. Our results point to a
trade-off between learning the concepts and preserving the compositionality,
which existing approaches struggle to overcome. The data, code, and interactive
demo are available at: https://conceptbed.github.io/
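For intuition, below is a minimal sketch of how a confidence-deviation metric in the spirit of CCD could be computed. The `oracle` classifier interface, the batching, and the sign convention are illustrative assumptions, not the authors' reference implementation (see the linked code for that).

```python
import torch

def concept_confidence_deviation(oracle: torch.nn.Module,
                                 real_images: torch.Tensor,
                                 generated_images: torch.Tensor,
                                 concept_id: int) -> float:
    """Sketch of a CCD-style metric: how far does an oracle concept
    classifier's confidence on generated images deviate from its
    confidence on real images of the same concept?"""
    with torch.no_grad():
        # Oracle confidence in the target concept on real reference images.
        p_real = oracle(real_images).softmax(dim=-1)[:, concept_id]
        # Oracle confidence in the same concept on T2I-generated images.
        p_gen = oracle(generated_images).softmax(dim=-1)[:, concept_id]
    # Larger deviation = generated images drift further from the concept.
    return (p_real.mean() - p_gen.mean()).item()
```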
Related papers
- Towards Compositionality in Concept Learning [20.960438848942445]
We show that existing unsupervised concept extraction methods find concepts which are not compositional.
We propose Compositional Concept Extraction (CCE) for finding concepts that satisfy the desired compositionality properties.
CCE finds more compositional concept representations than baselines and yields better accuracy on four downstream classification tasks.
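As a toy illustration (not CCE itself), one way to test whether learned concept representations compose is to check that a composed sample's embedding stays close to the sum of its concept vectors; the additive composition assumed here is one possible choice:

```python
import numpy as np

def compositionality_score(z_composed: np.ndarray,
                           concept_vectors: list[np.ndarray]) -> float:
    """Cosine similarity between a composed sample's embedding and the
    sum of its constituent concept vectors; values near 1.0 suggest the
    concepts compose additively in embedding space."""
    z_sum = np.sum(concept_vectors, axis=0)
    denom = np.linalg.norm(z_composed) * np.linalg.norm(z_sum) + 1e-8
    return float(z_composed @ z_sum / denom)
```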
arXiv Detail & Related papers (2024-06-26T17:59:30Z)
- Pre-trained Vision-Language Models Learn Discoverable Visual Concepts [33.302556000017844]
We aim to answer whether pre-trained VLMs learn discoverable visual concepts, as concepts learned "for free" would enable wide applications.
We assume that the visual concepts, if captured by pre-trained VLMs, can be extracted by their vision-language interface with text-based concept prompts.
Our proposed concept discovery and learning framework is thus designed to identify a diverse list of generic visual concepts.
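A rough sketch of this vision-language interface: score candidate concept words against an image with an off-the-shelf CLIP model and keep the best-matching ones. The prompt template and model checkpoint below are assumptions for illustration, not the paper's exact setup.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def score_concepts(image: Image.Image,
                   candidates: list[str]) -> list[tuple[str, float]]:
    """Rank candidate concept words by CLIP image-text similarity."""
    prompts = [f"a photo of a {c}" for c in candidates]  # assumed template
    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]
    return sorted(zip(candidates, probs.tolist()), key=lambda x: -x[1])
```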
arXiv Detail & Related papers (2024-04-19T06:41:32Z)
- Knowledge graphs for empirical concept retrieval [1.06378109904813]
Concept-based explainable AI is promising as a tool for improving a given user's understanding of complex models.
Here, we present a workflow for user-driven data collection in both text and image domains.
We test the retrieved concept datasets on two concept-based explainability methods, namely concept activation vectors (CAVs) and concept activation regions (CARs).
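For readers unfamiliar with CAVs: the standard recipe from the TCAV line of work fits a linear probe between activations of concept examples and random examples, and takes the normal of the decision boundary as the concept direction. A generic sketch, not this paper's exact pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts: np.ndarray,
                random_acts: np.ndarray) -> np.ndarray:
    """Fit a linear probe separating concept activations from random
    activations; the unit normal of its boundary is the CAV."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)),
                        np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    w = clf.coef_[0]
    return w / np.linalg.norm(w)

def concept_sensitivity(output_gradient: np.ndarray, cav: np.ndarray) -> float:
    """Directional derivative of a model output along the CAV; its sign
    indicates whether the concept pushes the prediction up or down."""
    return float(output_gradient @ cav)
```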
arXiv Detail & Related papers (2024-04-10T13:47:22Z)
- M2ConceptBase: A Fine-grained Aligned Multi-modal Conceptual Knowledge Base [65.20833158693705]
We propose a multi-modal conceptual knowledge base, named M2ConceptBase, to provide fine-grained alignment between images and concepts.
Specifically, M2ConceptBase models concepts as nodes, associating each with relevant images and detailed text.
A cutting-edge large language model supplements descriptions for concepts not grounded via our symbol grounding approach.
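A concept node of the kind described might look like the toy structure below; the field names and the `grounded` flag are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ConceptNode:
    """Toy node pairing a concept with its aligned images and text."""
    name: str
    description: str  # detailed text describing the concept
    image_paths: list[str] = field(default_factory=list)
    # False if the description was supplemented by an LLM rather than
    # obtained through symbol grounding.
    grounded: bool = True

knowledge_base: dict[str, ConceptNode] = {
    "alpaca": ConceptNode(
        name="alpaca",
        description="A domesticated South American camelid with long wool.",
        image_paths=["images/alpaca_001.jpg"],
    ),
}
```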
arXiv Detail & Related papers (2023-12-16T11:06:11Z)
- CLiC: Concept Learning in Context [54.81654147248919]
This paper builds upon recent advancements in visual concept learning.
It involves acquiring a visual concept from a source image and subsequently applying it to an object in a target image.
To localize the concept learning, we employ soft masks that contain both the concept within the mask and the surrounding image area.
arXiv Detail & Related papers (2023-11-28T01:33:18Z)
- Text-to-Image Generation for Abstract Concepts [76.32278151607763]
We propose a framework of Text-to-Image generation for Abstract Concepts (TIAC).
The abstract concept is clarified into a clear intent with a detailed definition to avoid ambiguity.
The concept-dependent form is retrieved from an LLM-extracted form pattern set.
arXiv Detail & Related papers (2023-09-26T02:22:39Z)
- Create Your World: Lifelong Text-to-Image Diffusion [75.14353789007902]
We propose Lifelong text-to-image Diffusion Model (L2DM) to overcome catastrophic forgetting of previously encountered concepts.
To address this forgetting, our L2DM framework devises a task-aware memory enhancement module and an elastic-concept distillation module.
Our model can generate more faithful images across a range of continual text prompts in terms of both qualitative and quantitative metrics.
arXiv Detail & Related papers (2023-09-08T16:45:56Z)
- Concept Bottleneck with Visual Concept Filtering for Explainable Medical Image Classification [16.849592713393896]
Concept Bottleneck Models (CBMs) enable interpretable image classification by utilizing human-understandable concepts as intermediate targets.
We propose a visual activation score that measures whether the concept contains visual cues or not.
Computed visual activation scores are then used to filter out the less visible concepts, thus resulting in a final concept set with visually meaningful concepts.
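One plausible realization of such a filter is to score each candidate concept by its best image-text similarity over the training images and drop low-scoring concepts; the aggregation and threshold below are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def filter_visual_concepts(image_feats: torch.Tensor,
                           text_feats: torch.Tensor,
                           concepts: list[str],
                           threshold: float = 0.25) -> list[str]:
    """Keep only concepts whose maximum cosine similarity to any training
    image exceeds a threshold, i.e. concepts with visible cues."""
    image_feats = image_feats / image_feats.norm(dim=-1, keepdim=True)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
    sims = text_feats @ image_feats.T          # (num_concepts, num_images)
    scores = sims.max(dim=1).values            # visual activation score
    return [c for c, s in zip(concepts, scores.tolist()) if s >= threshold]
```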
arXiv Detail & Related papers (2023-08-23T05:04:01Z)
- FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic descriptions, and Conceptual Relations [99.54048050189971]
We present a framework for learning new visual concepts quickly, guided by multiple naturally occurring data streams.
The learned concepts support downstream applications, such as answering questions by reasoning about unseen images.
We demonstrate the effectiveness of our model on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-03-30T19:45:00Z)
- Automatic Modeling of Social Concepts Evoked by Art Images as Multimodal Frames [1.4502611532302037]
Social concepts referring to non-physical objects are powerful tools to describe, index, and query the content of visual data.
We propose a software approach to represent social concepts as multimodal frames by integrating multisensory data.
Our method focuses on the extraction, analysis, and integration of multimodal features from visual art material tagged with the concepts of interest.
arXiv Detail & Related papers (2021-10-14T14:50:22Z)