FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic Descriptions, and Conceptual Relations
- URL: http://arxiv.org/abs/2203.16639v1
- Date: Wed, 30 Mar 2022 19:45:00 GMT
- Title: FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic Descriptions, and Conceptual Relations
- Authors: Lingjie Mei, Jiayuan Mao, Ziqi Wang, Chuang Gan, Joshua B. Tenenbaum
- Abstract summary: We present a framework for learning new visual concepts quickly, guided by multiple naturally occurring data streams.
The learned concepts support downstream applications, such as answering questions by reasoning about unseen images.
We demonstrate the effectiveness of our model on both synthetic and real-world datasets.
- Score: 99.54048050189971
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a meta-learning framework for learning new visual concepts
quickly, from just one or a few examples, guided by multiple naturally
occurring data streams: simultaneously looking at images, reading sentences
that describe the objects in the scene, and interpreting supplemental sentences
that relate the novel concept with other concepts. The learned concepts support
downstream applications, such as answering questions by reasoning about unseen
images. Our model, namely FALCON, represents individual visual concepts, such
as colors and shapes, as axis-aligned boxes in a high-dimensional space (the
"box embedding space"). Given an input image and its paired sentence, our model
first resolves the referential expression in the sentence and associates the
novel concept with particular objects in the scene. Next, our model interprets
supplemental sentences to relate the novel concept with other known concepts,
such as "X has property Y" or "X is a kind of Y". Finally, it infers an optimal
box embedding for the novel concept that jointly 1) maximizes the likelihood of
the observed instances in the image, and 2) satisfies the relationships between
the novel concepts and the known ones. We demonstrate the effectiveness of our
model on both synthetic and real-world datasets.
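To make the representation and objective above concrete, here is a minimal sketch of box embeddings and of fitting a novel concept's box. The centre/offset parameterisation, soft membership score, and containment penalty for "X is a kind of Y" are illustrative assumptions, not the paper's exact formulation:
```python
import torch

# A concept as an axis-aligned box, parameterised by a centre and a positive
# per-dimension offset (one common box-embedding parameterisation; assumed here).
class ConceptBox(torch.nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.center = torch.nn.Parameter(torch.zeros(dim))
        self.log_offset = torch.nn.Parameter(torch.zeros(dim))

    def bounds(self):
        offset = self.log_offset.exp()
        return self.center - offset, self.center + offset


def membership_logit(x, box, temperature=0.1):
    """Soft score of how well object embedding x falls inside the box."""
    lo, hi = box.bounds()
    # Per-dimension violation: zero inside the box, positive outside it.
    violation = torch.maximum(lo - x, x - hi).clamp(min=0)
    return -violation.sum() / temperature


def containment_penalty(child, parent):
    """Encourages "X is a kind of Y": the child box should lie inside the parent box."""
    c_lo, c_hi = child.bounds()
    p_lo, p_hi = parent.bounds()
    return ((p_lo - c_lo).clamp(min=0) ** 2 + (c_hi - p_hi).clamp(min=0) ** 2).sum()


def fit_novel_concept(instances, parent_box, dim=64, steps=200, rel_weight=1.0):
    """Infer a box for a novel concept: maximise membership of its observed
    instances while keeping the box contained in a known parent concept."""
    novel = ConceptBox(dim)
    opt = torch.optim.Adam(novel.parameters(), lr=1e-2)
    for _ in range(steps):
        nll = -torch.stack([membership_logit(x, novel) for x in instances]).mean()
        loss = nll + rel_weight * containment_penalty(novel, parent_box)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return novel

# e.g. fit_novel_concept([torch.randn(64) for _ in range(3)], ConceptBox(64))
```
Under this reading, a supplemental sentence such as "X has property Y" would contribute an analogous penalty tying the novel box to the box of the known attribute concept.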
Related papers
- Knowledge Transfer Across Modalities with Natural Language Supervision [8.493435472659646]
We present a way to learn novel concepts using only their textual descriptions. As in human perception, we leverage cross-modal interaction to introduce new concepts.
We show that Knowledge Transfer can successfully introduce novel concepts in multimodal models, in a very efficient manner.
arXiv Detail & Related papers (2024-11-23T17:26:50Z)
- Compositional Entailment Learning for Hyperbolic Vision-Language Models [54.41927525264365]
We show how to fully leverage the innate hierarchical nature of hyperbolic embeddings by looking beyond individual image-text pairs.
We propose Compositional Entailment Learning for hyperbolic vision-language models.
Empirical evaluation on a hyperbolic vision-language model trained with millions of image-text pairs shows that the proposed compositional learning approach outperforms conventional Euclidean CLIP learning.
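As a rough illustration of the geometry involved (not the paper's entailment-cone objective), the sketch below computes the Poincaré-ball distance used by hyperbolic embeddings and a toy partial-order penalty in which more general embeddings sit closer to the origin:
```python
import torch

def poincare_distance(u, v, eps=1e-6):
    """Geodesic distance on the unit Poincare ball (curvature -1)."""
    sq_dist = ((u - v) ** 2).sum(-1)
    nu = (u ** 2).sum(-1).clamp(max=1 - eps)
    nv = (v ** 2).sum(-1).clamp(max=1 - eps)
    return torch.acosh(1 + 2 * sq_dist / ((1 - nu) * (1 - nv)))

def hierarchy_penalty(general, specific, margin=0.1):
    """Toy partial-order term: a more general embedding (e.g. a whole caption or
    parent concept) should lie closer to the origin than a more specific one."""
    return torch.relu(general.norm(dim=-1) - specific.norm(dim=-1) + margin).mean()
```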
arXiv Detail & Related papers (2024-10-09T14:12:50Z)
- Language-Informed Visual Concept Learning [22.911347501969857]
We train a set of concept encoders to encode the information pertinent to a set of language-informed concept axes.
We then anchor the concept embeddings to a set of text embeddings obtained from a pre-trained Visual Question Answering (VQA) model.
At inference time, the model extracts concept embeddings along various axes from new test images, which can be remixed to generate images with novel compositions of visual concepts.
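A minimal sketch of this anchoring idea, assuming per-axis linear encoders, hypothetical axis names, and a plain cosine loss in place of the paper's actual objective:
```python
import torch
import torch.nn.functional as F

# Illustrative: one small encoder per language-informed concept axis
# (the axis names below are hypothetical examples).
class AxisEncoders(torch.nn.Module):
    def __init__(self, feat_dim, embed_dim, axes=("category", "color", "material")):
        super().__init__()
        self.encoders = torch.nn.ModuleDict(
            {axis: torch.nn.Linear(feat_dim, embed_dim) for axis in axes}
        )

    def forward(self, image_feat):
        return {axis: enc(image_feat) for axis, enc in self.encoders.items()}

def anchoring_loss(concept_embeds, text_embeds):
    """Pull each axis's concept embedding toward a text embedding for that axis."""
    losses = [
        1 - F.cosine_similarity(concept_embeds[a], text_embeds[a], dim=-1).mean()
        for a in concept_embeds
    ]
    return torch.stack(losses).mean()
```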
arXiv Detail & Related papers (2023-12-06T16:24:47Z)
- What do Deck Chairs and Sun Hats Have in Common? Uncovering Shared Properties in Large Concept Vocabularies [33.879307754303746]
Concepts play a central role in many applications.
Previous work has focused on distilling decontextualised concept embeddings from language models.
We propose a strategy for identifying what different concepts, from a potentially large concept vocabulary, have in common with others.
We then represent concepts in terms of the properties they share with the other concepts.
arXiv Detail & Related papers (2023-10-23T10:53:25Z)
- Create Your World: Lifelong Text-to-Image Diffusion [75.14353789007902]
We propose the Lifelong text-to-image Diffusion Model (L2DM) to overcome catastrophic forgetting of previously encountered concepts.
To counter such forgetting, our L2DM framework devises a task-aware memory enhancement module and an elastic-concept distillation module.
Our model generates more faithful images across a range of continual text prompts, as measured by both qualitative and quantitative metrics.
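As a hedged sketch of what an elastic-concept distillation term could look like in a continual diffusion setting (a generic knowledge-distillation pattern, not necessarily L2DM's exact module), assuming `student_unet` and `teacher_unet` are noise-prediction callables:
```python
import torch
import torch.nn.functional as F

def concept_distillation_loss(student_unet, teacher_unet, latents, timesteps, prompt_embeds):
    """On prompts of previously learned concepts, keep the current model's noise
    prediction close to the frozen previous model's prediction (generic pattern)."""
    with torch.no_grad():
        teacher_eps = teacher_unet(latents, timesteps, prompt_embeds)
    student_eps = student_unet(latents, timesteps, prompt_embeds)
    return F.mse_loss(student_eps, teacher_eps)
```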
arXiv Detail & Related papers (2023-09-08T16:45:56Z)
- Identifying Interpretable Subspaces in Image Representations [54.821222487956355]
We propose a framework to explain features of image representations using Contrasting Concepts (FALCON).
For a target feature, FALCON captions its highly activating cropped images using a large captioning dataset and a pre-trained vision-language model like CLIP.
Each word among the captions is scored and ranked, leading to a small number of shared, human-understandable concepts.
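A simplified reading of this scoring step, using OpenAI CLIP as the pre-trained vision-language model; the function and variable names are illustrative:
```python
import torch
import clip  # OpenAI CLIP, used here only as an off-the-shelf scorer
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def rank_caption_words(crop_paths, candidate_words):
    """Rank candidate caption words by mean CLIP similarity to the image crops
    that most strongly activate the target feature."""
    images = torch.stack([preprocess(Image.open(p)) for p in crop_paths]).to(device)
    tokens = clip.tokenize(candidate_words).to(device)
    with torch.no_grad():
        img = model.encode_image(images)
        txt = model.encode_text(tokens)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    scores = (img @ txt.T).mean(dim=0)  # average similarity over crops
    order = scores.argsort(descending=True).tolist()
    return [(candidate_words[i], scores[i].item()) for i in order]
```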
arXiv Detail & Related papers (2023-07-20T00:02:24Z)
- ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models [79.10890337599166]
We introduce ConceptBed, a large-scale dataset that consists of 284 unique visual concepts and 33K composite text prompts.
We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions.
Our results point to a trade-off between learning the concepts and preserving the compositionality which existing approaches struggle to overcome.
arXiv Detail & Related papers (2023-06-07T18:00:38Z)
- The Hidden Language of Diffusion Models [70.03691458189604]
We present Conceptor, a novel method to interpret the internal representation of a textual concept by a diffusion model.
We find surprising visual connections between concepts that transcend their textual semantics.
We additionally discover concepts that rely on mixtures of exemplars, biases, renowned artistic styles, or a simultaneous fusion of multiple meanings.
arXiv Detail & Related papers (2023-06-01T17:57:08Z)