MetaCLUE: Towards Comprehensive Visual Metaphors Research
- URL: http://arxiv.org/abs/2212.09898v3
- Date: Fri, 2 Jun 2023 04:01:00 GMT
- Title: MetaCLUE: Towards Comprehensive Visual Metaphors Research
- Authors: Arjun R. Akula, Brendan Driscoll, Pradyumna Narayana, Soravit
Changpinyo, Zhiwei Jia, Suyash Damle, Garima Pruthi, Sugato Basu, Leonidas
Guibas, William T. Freeman, Yuanzhen Li, Varun Jampani
- Abstract summary: We introduce MetaCLUE, a set of vision tasks on visual metaphor.
We perform a comprehensive analysis of state-of-the-art models in vision and language based on our annotations.
We hope this work provides a concrete step towards developing AI systems with human-like creative capabilities.
- Score: 43.604408485890275
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Creativity is an indispensable part of human cognition and also an inherent
part of how we make sense of the world. Metaphorical abstraction is fundamental
in communicating creative ideas through nuanced relationships between abstract
concepts such as feelings. While computer vision benchmarks and approaches
predominantly focus on understanding and generating literal interpretations of
images, metaphorical comprehension of images remains relatively unexplored.
Towards this goal, we introduce MetaCLUE, a set of vision tasks on visual
metaphor. We also collect high-quality and rich metaphor annotations (abstract
objects, concepts, relationships along with their corresponding object boxes)
as there do not exist any datasets that facilitate the evaluation of these
tasks. We perform a comprehensive analysis of state-of-the-art models in vision
and language based on our annotations, highlighting strengths and weaknesses of
current approaches in visual metaphor Classification, Localization,
Understanding (retrieval, question answering, captioning) and gEneration
(text-to-image synthesis) tasks. We hope this work provides a concrete step
towards developing AI systems with human-like creative capabilities.
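To make the described annotation structure concrete, below is a minimal Python sketch of how one such record (a literal primary concept, an abstract secondary concept, the relationship between them, and grounded object boxes) might be represented. All names here (MetaphorAnnotation, its fields, the example image id and boxes) are illustrative assumptions, not the released MetaCLUE schema.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class MetaphorAnnotation:
    # Hypothetical record layout; the actual MetaCLUE annotation format may differ.
    image_id: str
    primary_concept: str        # literal subject depicted in the image, e.g. "this car"
    secondary_concept: str      # abstract source concept, e.g. "a cheetah"
    relationship: str           # the implied relation, e.g. "is as fast as"
    object_boxes: List[Tuple[float, float, float, float]] = field(default_factory=list)
    # each box is (x, y, width, height) in normalized image coordinates

    def as_caption(self) -> str:
        # A literal rendering of the metaphor, usable as a target for the
        # Understanding tasks (retrieval, question answering, captioning).
        return f"{self.primary_concept} {self.relationship} {self.secondary_concept}"

# Hypothetical example: an ad image implying "this car is as fast as a cheetah".
example = MetaphorAnnotation(
    image_id="img_0001",
    primary_concept="this car",
    secondary_concept="a cheetah",
    relationship="is as fast as",
    object_boxes=[(0.10, 0.25, 0.55, 0.40)],
)
print(example.as_caption())  # -> this car is as fast as a cheetah

A record of this kind would cover all four task families named in the abstract: Classification (does the image carry a metaphor), Localization (the object boxes), Understanding (the caption rendering for retrieval, question answering, and captioning), and gEneration (using the caption as a text-to-image prompt).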
Related papers
- Compositional Entailment Learning for Hyperbolic Vision-Language Models [54.41927525264365]
We show how to fully leverage the innate hierarchical nature of hyperbolic embeddings by looking beyond individual image-text pairs.
We propose Compositional Entailment Learning for hyperbolic vision-language models.
Empirical evaluation on a hyperbolic vision-language model trained with millions of image-text pairs shows that the proposed compositional learning approach outperforms conventional Euclidean CLIP learning.
arXiv Detail & Related papers (2024-10-09T14:12:50Z) - What Makes a Maze Look Like a Maze? [92.80800000328277]
We introduce Deep Schema Grounding (DSG), a framework that leverages explicit structured representations of visual abstractions for grounding and reasoning.
At the core of DSG are schemas: dependency graph descriptions of abstract concepts that decompose them into more primitive-level symbols.
We show that DSG significantly improves the abstract visual reasoning performance of vision-language models.
arXiv Detail & Related papers (2024-09-12T16:41:47Z) - CLiC: Concept Learning in Context [54.81654147248919]
This paper builds upon recent advancements in visual concept learning.
It involves acquiring a visual concept from a source image and subsequently applying it to an object in a target image.
To localize the concept learning, we employ soft masks that contain both the concept within the mask and the surrounding image area.
arXiv Detail & Related papers (2023-11-28T01:33:18Z) - Text-to-Image Generation for Abstract Concepts [76.32278151607763]
We propose a framework of Text-to-Image generation for Abstract Concepts (TIAC).
The abstract concept is clarified into a clear intent with a detailed definition to avoid ambiguity.
The concept-dependent form is retrieved from an LLM-extracted form pattern set.
arXiv Detail & Related papers (2023-09-26T02:22:39Z) - I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create
Visual Metaphors [38.70166865926743]
We propose a new task of generating visual metaphors from linguistic metaphors.
This is a challenging task for diffusion-based text-to-image models, since it requires the ability to model implicit meaning and compositionality.
We create a high-quality dataset containing 6,476 visual metaphors for 1,540 linguistic metaphors and their associated visual elaborations.
arXiv Detail & Related papers (2023-05-24T05:01:10Z) - Semantic Composition in Visually Grounded Language Models [0.0]
We show that visually-grounded language models drastically fail to represent compositional structure.
We introduce WinogroundVQA, a new compositional visual question answering benchmark.
We discuss connections of our work to neuroscience, psycholinguistics, formal semantics, and philosophy.
arXiv Detail & Related papers (2023-05-15T03:19:42Z) - Visual resemblance and communicative context constrain the emergence of
graphical conventions [21.976382800327965]
Drawing provides a versatile medium for communicating about the visual world.
Do viewers understand drawings based solely on their ability to resemble the entities they refer to (i.e., as images)?
Or do they understand drawings based on shared but arbitrary associations with these entities (i.e., as symbols)?
arXiv Detail & Related papers (2021-09-17T23:05:36Z) - Constellation: Learning relational abstractions over objects for
compositional imagination [64.99658940906917]
We introduce Constellation, a network that learns relational abstractions of static visual scenes.
This work is a first step towards explicitly representing visual relationships and using them for complex cognitive procedures.
arXiv Detail & Related papers (2021-07-23T11:59:40Z) - ArtEmis: Affective Language for Visual Art [46.643106054408285]
We focus on the affective experience triggered by visual artworks.
We ask the annotators to indicate the dominant emotion they feel for a given image.
This leads to a rich set of signals for both the objective content and the affective impact of an image.
arXiv Detail & Related papers (2021-01-19T01:03:40Z)