MetaCLUE: Towards Comprehensive Visual Metaphors Research
- URL: http://arxiv.org/abs/2212.09898v3
- Date: Fri, 2 Jun 2023 04:01:00 GMT
- Title: MetaCLUE: Towards Comprehensive Visual Metaphors Research
- Authors: Arjun R. Akula, Brendan Driscoll, Pradyumna Narayana, Soravit
Changpinyo, Zhiwei Jia, Suyash Damle, Garima Pruthi, Sugato Basu, Leonidas
Guibas, William T. Freeman, Yuanzhen Li, Varun Jampani
- Abstract summary: We introduce MetaCLUE, a set of vision tasks on visual metaphor.
We perform a comprehensive analysis of state-of-the-art models in vision and language based on our annotations.
We hope this work provides a concrete step towards developing AI systems with human-like creative capabilities.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Creativity is an indispensable part of human cognition and also an inherent
part of how we make sense of the world. Metaphorical abstraction is fundamental
in communicating creative ideas through nuanced relationships between abstract
concepts such as feelings. While computer vision benchmarks and approaches
predominantly focus on understanding and generating literal interpretations of
images, metaphorical comprehension of images remains relatively unexplored.
Towards this goal, we introduce MetaCLUE, a set of vision tasks on visual
metaphor. We also collect high-quality and rich metaphor annotations (abstract
objects, concepts, relationships along with their corresponding object boxes)
as there do not exist any datasets that facilitate the evaluation of these
tasks. We perform a comprehensive analysis of state-of-the-art models in vision
and language based on our annotations, highlighting strengths and weaknesses of
current approaches in visual metaphor Classification, Localization,
Understanding (retrieval, question answering, captioning) and gEneration
(text-to-image synthesis) tasks. We hope this work provides a concrete step
towards developing AI systems with human-like creative capabilities.
Related papers
- CLiC: Concept Learning in Context [54.81654147248919]
This paper builds upon recent advancements in visual concept learning.
It involves acquiring a visual concept from a source image and subsequently applying it to an object in a target image.
To localize the concept learning, we employ soft masks that contain both the concept within the mask and the surrounding image area.
arXiv Detail & Related papers (2023-11-28T01:33:18Z)
- Text-to-Image Generation for Abstract Concepts [76.32278151607763]
We propose a framework of Text-to-Image generation for Abstract Concepts (TIAC)
The abstract concept is clarified into a clear intent with a detailed definition to avoid ambiguity.
The concept-dependent form is retrieved from an LLM-extracted form pattern set.
arXiv Detail & Related papers (2023-09-26T02:22:39Z)
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models that learn to bridge these modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompting capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene or manipulating the robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
- I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors [38.70166865926743]
We propose a new task of generating visual metaphors from linguistic metaphors.
This is a challenging task for diffusion-based text-to-image models, since it requires the ability to model implicit meaning and compositionality.
We create a high-quality dataset containing 6,476 visual metaphors for 1,540 linguistic metaphors and their associated visual elaborations.
arXiv Detail & Related papers (2023-05-24T05:01:10Z)
- Semantic Composition in Visually Grounded Language Models [0.0]
We show that visually-grounded language models drastically fail to represent compositional structure.
We introduce WinogroundVQA, a new compositional visual question answering benchmark.
We discuss connections of our work to neuroscience, psycholinguistics, formal semantics, and philosophy.
arXiv Detail & Related papers (2023-05-15T03:19:42Z)
- Visual resemblance and communicative context constrain the emergence of graphical conventions [21.976382800327965]
Drawing provides a versatile medium for communicating about the visual world.
Do viewers understand drawings based solely on their ability to resemble the entities they refer to (i.e., as images)?
Do they understand drawings based on shared but arbitrary associations with these entities (i.e., as symbols)?
arXiv Detail & Related papers (2021-09-17T23:05:36Z)
- Constellation: Learning relational abstractions over objects for compositional imagination [64.99658940906917]
We introduce Constellation, a network that learns relational abstractions of static visual scenes.
This work is a first step in the explicit representation of visual relationships and using them for complex cognitive procedures.
arXiv Detail & Related papers (2021-07-23T11:59:40Z)
- Metaphor Generation with Conceptual Mappings [58.61307123799594]
We aim to generate a metaphoric sentence given a literal expression by replacing relevant verbs.
We propose to control the generation process by encoding conceptual mappings between cognitive domains.
We show that the unsupervised CM-Lex model is competitive with recent deep learning metaphor generation systems.
arXiv Detail & Related papers (2021-06-02T15:27:05Z)
- ArtEmis: Affective Language for Visual Art [46.643106054408285]
We focus on the affective experience triggered by visual artworks.
We ask the annotators to indicate the dominant emotion they feel for a given image.
This leads to a rich set of signals for both the objective content and the affective impact of an image.
arXiv Detail & Related papers (2021-01-19T01:03:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.