MetaCLUE: Towards Comprehensive Visual Metaphors Research
- URL: http://arxiv.org/abs/2212.09898v3
- Date: Fri, 2 Jun 2023 04:01:00 GMT
- Title: MetaCLUE: Towards Comprehensive Visual Metaphors Research
- Authors: Arjun R. Akula, Brendan Driscoll, Pradyumna Narayana, Soravit
Changpinyo, Zhiwei Jia, Suyash Damle, Garima Pruthi, Sugato Basu, Leonidas
Guibas, William T. Freeman, Yuanzhen Li, Varun Jampani
- Abstract summary: We introduce MetaCLUE, a set of vision tasks on visual metaphor.
We perform a comprehensive analysis of state-of-the-art models in vision and language based on our annotations.
We hope this work provides a concrete step towards developing AI systems with human-like creative capabilities.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Creativity is an indispensable part of human cognition and also an inherent
part of how we make sense of the world. Metaphorical abstraction is fundamental
in communicating creative ideas through nuanced relationships between abstract
concepts such as feelings. While computer vision benchmarks and approaches
predominantly focus on understanding and generating literal interpretations of
images, metaphorical comprehension of images remains relatively unexplored.
Towards this goal, we introduce MetaCLUE, a set of vision tasks on visual
metaphor. We also collect high-quality and rich metaphor annotations (abstract
objects, concepts, relationships along with their corresponding object boxes)
as there do not exist any datasets that facilitate the evaluation of these
tasks. We perform a comprehensive analysis of state-of-the-art models in vision
and language based on our annotations, highlighting strengths and weaknesses of
current approaches in visual metaphor Classification, Localization,
Understanding (retrieval, question answering, captioning) and gEneration
(text-to-image synthesis) tasks. We hope this work provides a concrete step
towards developing AI systems with human-like creative capabilities.
Related papers
- CLiC: Concept Learning in Context [54.81654147248919]
This paper builds upon recent advancements in visual concept learning.
It involves acquiring a visual concept from a source image and subsequently applying it to an object in a target image.
To localize the concept learning, we employ soft masks that contain both the concept within the mask and the surrounding image area.
arXiv Detail & Related papers (2023-11-28T01:33:18Z)
- Text-to-Image Generation for Abstract Concepts [76.32278151607763]
We propose a framework of Text-to-Image generation for Abstract Concepts (TIAC)
The abstract concept is clarified into a clear intent with a detailed definition to avoid ambiguity.
The concept-dependent form is retrieved from an LLM-extracted form pattern set.
arXiv Detail & Related papers (2023-09-26T02:22:39Z)
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models that learn to bridge these modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompting capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene or manipulating the robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
- I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors [38.70166865926743]
We propose a new task of generating visual metaphors from linguistic metaphors.
This is a challenging task for diffusion-based text-to-image models, since it requires the ability to model implicit meaning and compositionality.
We create a high-quality dataset containing 6,476 visual metaphors for 1,540 linguistic metaphors and their associated visual elaborations.
arXiv Detail & Related papers (2023-05-24T05:01:10Z)
- Semantic Composition in Visually Grounded Language Models [0.0]
We show that visually-grounded language models drastically fail to represent compositional structure.
We introduce WinogroundVQA, a new compositional visual question answering benchmark.
We discuss connections of our work to neuroscience, psycholinguistics, formal semantics, and philosophy.
arXiv Detail & Related papers (2023-05-15T03:19:42Z)
- Visual resemblance and communicative context constrain the emergence of graphical conventions [21.976382800327965]
Drawing provides a versatile medium for communicating about the visual world.
Do viewers understand drawings based solely on their ability to resemble the entities they refer to (i.e., as images)?
Do they understand drawings based on shared but arbitrary associations with these entities (i.e., as symbols)?
arXiv Detail & Related papers (2021-09-17T23:05:36Z)
- Constellation: Learning relational abstractions over objects for compositional imagination [64.99658940906917]
We introduce Constellation, a network that learns relational abstractions of static visual scenes.
This work is a first step in the explicit representation of visual relationships and using them for complex cognitive procedures.
arXiv Detail & Related papers (2021-07-23T11:59:40Z)
- Metaphor Generation with Conceptual Mappings [58.61307123799594]
We aim to generate a metaphoric sentence given a literal expression by replacing relevant verbs.
We propose to control the generation process by encoding conceptual mappings between cognitive domains.
We show that the unsupervised CM-Lex model is competitive with recent deep learning metaphor generation systems.
arXiv Detail & Related papers (2021-06-02T15:27:05Z)
- ArtEmis: Affective Language for Visual Art [46.643106054408285]
We focus on the affective experience triggered by visual artworks.
We ask the annotators to indicate the dominant emotion they feel for a given image.
This leads to a rich set of signals for both the objective content and the affective impact of an image.
arXiv Detail & Related papers (2021-01-19T01:03:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.