Formal Analysis of Art: Proxy Learning of Visual Concepts from Style
Through Language Models
- URL: http://arxiv.org/abs/2201.01819v1
- Date: Wed, 5 Jan 2022 21:03:29 GMT
- Title: Formal Analysis of Art: Proxy Learning of Visual Concepts from Style
Through Language Models
- Authors: Diana Kim, Ahmed Elgammal, Marian Mazzone
- Abstract summary: We present a machine learning system that can quantify fine art paintings with a set of visual elements and principles of art.
We introduce a novel mechanism, called proxy learning, which learns visual concepts in paintings through their general relation to styles.
- Score: 10.854399031287393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a machine learning system that can quantify fine art paintings
with a set of visual elements and principles of art. This formal analysis is
fundamental for understanding art, but developing such a system is challenging.
Paintings have high visual complexity, and it is also difficult to collect
enough training data with direct labels. To resolve these practical
limitations, we introduce a novel mechanism, called proxy learning, which
learns visual concepts in paintings through their general relation to styles.
This framework does not require any visual annotation, but only uses style
labels and a general relationship between visual concepts and style. In this
paper, we propose a novel proxy model and reformulate four pre-existing methods
in the context of proxy learning. Through quantitative and qualitative
comparison, we evaluate these methods and compare their effectiveness in
quantifying the artistic visual concepts, where the general relationship is
estimated by language models (GloVe or BERT). Language modeling is a
practical and scalable solution requiring no labeling, but it is inevitably
imperfect. We demonstrate that the new proxy model is robust to this
imperfection, while the other models are highly sensitive to it.
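The abstract describes estimating the concept-style relationship with a language model instead of visual annotation. A minimal sketch of that idea, assuming GloVe word vectors and illustrative style/concept word lists (neither is necessarily the authors' actual setup), might look like:

```python
# Hypothetical sketch: score how strongly each style label is associated with
# each visual-concept term using cosine similarity of GloVe word vectors.
# The word lists and GloVe model below are illustrative assumptions.
import numpy as np
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-100")  # pretrained GloVe word vectors

styles = ["impressionism", "cubism", "baroque", "minimalism"]      # style labels
concepts = ["color", "texture", "symmetry", "contrast", "line"]    # visual concepts

# relationship[i, j]: similarity between style i and concept j, usable as a
# proxy supervision signal in place of direct visual labels.
relationship = np.array([[glove.similarity(s, c) for c in concepts]
                         for s in styles])

for style, row in zip(styles, relationship):
    print(style, {c: round(float(r), 3) for c, r in zip(concepts, row)})
```

In the paper's framing, a matrix of this kind (estimated by GloVe or BERT) supplies the only supervision linking style labels to visual concepts, which is why robustness to its imperfection matters.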
Related papers
- KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph [24.586916324061168]
We present KALE (Knowledge-Augmented vision-Language model for artwork Elaborations).
KALE incorporates the metadata in two ways: firstly as direct textual input, and secondly through a multimodal heterogeneous knowledge graph.
Experimental results demonstrate that KALE achieves strong performance over existing state-of-the-art work across several artwork datasets.
arXiv Detail & Related papers (2024-09-17T06:39:18Z) - Information Theoretic Text-to-Image Alignment [49.396917351264655]
We present a novel method that relies on an information-theoretic alignment measure to steer image generation.
Our method is on par with or superior to the state of the art, yet requires nothing but a pre-trained denoising network to estimate mutual information (MI).
arXiv Detail & Related papers (2024-05-31T12:20:02Z) - Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that can see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
The models learned to bridge the gap between such modalities coupled with large-scale training data facilitate contextual reasoning, generalization, and prompt capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene or manipulating the robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z) - Not Only Generative Art: Stable Diffusion for Content-Style
Disentanglement in Art Analysis [23.388338598125195]
GOYA is a method that distills the artistic knowledge captured in a recent generative model to disentangle content and style.
Experiments show that synthetically generated images sufficiently serve as a proxy of the real distribution of artworks.
arXiv Detail & Related papers (2023-04-20T13:00:46Z) - Inching Towards Automated Understanding of the Meaning of Art: An
Application to Computational Analysis of Mondrian's Artwork [0.0]
This paper attempts to identify capabilities that are related to semantic processing.
The proposed methodology identifies the missing capabilities by comparing the process of understanding Mondrian's paintings with the process of understanding electronic circuit designs.
To explain the usefulness of the methodology, the paper discusses a new, three-step computational method to distinguish Mondrian's paintings from other artwork.
arXiv Detail & Related papers (2022-12-29T23:34:19Z) - Inversion-Based Style Transfer with Diffusion Models [78.93863016223858]
Previous arbitrary example-guided artistic image generation methods often fail to control shape changes or convey elements.
We propose an inversion-based style transfer method (InST), which can efficiently and accurately learn the key information of an image.
arXiv Detail & Related papers (2022-11-23T18:44:25Z) - Language Does More Than Describe: On The Lack Of Figurative Speech in
Text-To-Image Models [63.545146807810305]
Text-to-image diffusion models can generate high-quality pictures from textual input prompts.
These models have been trained using text data collected from content-based labelling protocols.
We characterise the sentimentality, objectiveness and degree of abstraction of publicly available text data used to train current text-to-image diffusion models.
arXiv Detail & Related papers (2022-10-19T14:20:05Z) - Probing Contextual Language Models for Common Ground with Visual
Representations [76.05769268286038]
We design a probing model that evaluates how effective text-only representations are in distinguishing between matching and non-matching visual representations.
Our findings show that language representations alone provide a strong signal for retrieving image patches from the correct object categories.
Visually grounded language models slightly outperform text-only language models in instance retrieval, but greatly under-perform humans.
arXiv Detail & Related papers (2020-05-01T21:28:28Z) - Rethinking Class Relations: Absolute-relative Supervised and
Unsupervised Few-shot Learning [157.62595449130973]
We study the fundamental problem of simplistic class modeling in current few-shot learning methods.
We propose a novel Absolute-relative Learning paradigm to fully take advantage of label information to refine the image representations.
arXiv Detail & Related papers (2020-01-12T12:25:46Z)