Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation
- URL: http://arxiv.org/abs/2109.05743v1
- Date: Mon, 13 Sep 2021 07:08:46 GMT
- Title: Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation
- Authors: Zechen Bai, Yuta Nakashima, Noa Garcia
- Abstract summary: This work presents a framework to bring art closer to people by generating comprehensive descriptions of fine-art paintings.
The framework is validated through an exhaustive analysis, both quantitative and qualitative, as well as a comparative human evaluation.
- Score: 26.099306167995376
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Have you ever looked at a painting and wondered what the story behind it is?
This work presents a framework to bring art closer to people by generating
comprehensive descriptions of fine-art paintings. Generating informative
descriptions for artworks, however, is extremely challenging, as it requires
1) describing multiple aspects of the image, such as its style, content, or
composition, and 2) providing background and contextual knowledge about the
artist, their influences, or the historical period. To address these
challenges, we introduce a multi-topic and knowledgeable art description
framework, which organizes the generated sentences according to three artistic
topics and, additionally, enhances each description with external knowledge.
The framework is validated through an exhaustive analysis, both quantitative
and qualitative, as well as a comparative human evaluation, demonstrating
outstanding results in terms of both topic diversity and information veracity.
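As a concrete illustration of the pipeline the abstract outlines, below is a minimal Python sketch of a multi-topic, knowledge-enhanced description generator. The topic names, the generate and retrieve_knowledge callables, and the ArtDescription container are hypothetical placeholders rather than the paper's actual components; the sketch only mirrors the high-level structure of topic-conditioned generation followed by knowledge enrichment.

from dataclasses import dataclass
from typing import Callable, List

# Hypothetical topic set: the framework conditions generation on three
# artistic topics; the exact names used here are illustrative placeholders.
TOPICS = ["content", "form", "context"]

@dataclass
class ArtDescription:
    topic: str
    sentence: str

def describe_painting(
    image_features: object,                       # visual features of the painting
    generate: Callable[[object, str], str],       # topic-conditioned sentence generator (assumed)
    retrieve_knowledge: Callable[[str], str],     # external knowledge lookup (assumed)
) -> List[ArtDescription]:
    """Sketch of a multi-topic, knowledge-enhanced description pipeline.

    For each artistic topic, a topic-conditioned generator produces a
    sentence, which is then enriched with externally retrieved background
    knowledge. The actual models, topics, and knowledge source in the paper
    may differ.
    """
    descriptions = []
    for topic in TOPICS:
        sentence = generate(image_features, topic)    # e.g. "A woman reads in a garden..."
        background = retrieve_knowledge(sentence)     # e.g. facts about the artist or period
        descriptions.append(ArtDescription(topic, f"{sentence} {background}".strip()))
    return descriptions

In the paper the topic-conditioned generation and the knowledge-retrieval step are learned components; here they are passed in as plain callables so that only the control flow is shown.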
Related papers
- VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models [53.59400446543756]
We introduce a dual-branch and training-free method, namely VitaGlyph, to enable flexible artistic typography.
VitaGlyph treats the input character as a scene composed of a Subject and a Surrounding, which are then rendered under varying degrees of geometric transformation.
Experimental results demonstrate that VitaGlyph not only achieves better artistry and readability, but also depicts multiple customized concepts.
arXiv Detail & Related papers (2024-10-02T16:48:47Z) - KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph [24.586916324061168]
We present KALE (Knowledge-Augmented vision-Language model for artwork Elaborations).
KALE incorporates the metadata in two ways: firstly as direct textual input, and secondly through a multimodal heterogeneous knowledge graph.
Experimental results demonstrate that KALE achieves strong performance over existing state-of-the-art work across several artwork datasets.
arXiv Detail & Related papers (2024-09-17T06:39:18Z) - GalleryGPT: Analyzing Paintings with Large Multimodal Models [64.98398357569765]
Artwork analysis is an important and fundamental skill for art appreciation, which can enrich personal aesthetic sensibility and foster critical thinking.
Previous works on automatically analyzing artworks mainly focus on classification, retrieval, and other simple tasks, which are far from the goal of AI.
We introduce a superior large multimodal model for composing painting analyses, dubbed GalleryGPT, which is slightly modified and fine-tuned from the LLaVA architecture.
arXiv Detail & Related papers (2024-08-01T11:52:56Z) - Impressions: Understanding Visual Semiotics and Aesthetic Impact [66.40617566253404]
We present Impressions, a novel dataset through which to investigate the semiotics of images.
We show that existing multimodal image captioning and conditional generation models struggle to simulate plausible human responses to images.
This dataset significantly improves their ability to model impressions and aesthetic evaluations of images through fine-tuning and few-shot adaptation.
arXiv Detail & Related papers (2023-10-27T04:30:18Z) - Text-Guided Synthesis of Eulerian Cinemagraphs [81.20353774053768]
We introduce Text2Cinemagraph, a fully automated method for creating cinemagraphs from text descriptions.
We focus on cinemagraphs of fluid elements, such as flowing rivers and drifting clouds, which exhibit continuous motion and repetitive textures.
arXiv Detail & Related papers (2023-07-06T17:59:31Z) - Not Only Generative Art: Stable Diffusion for Content-Style Disentanglement in Art Analysis [23.388338598125195]
GOYA is a method that distills the artistic knowledge captured in a recent generative model to disentangle content and style.
Experiments show that synthetically generated images sufficiently serve as a proxy of the real distribution of artworks.
arXiv Detail & Related papers (2023-04-20T13:00:46Z) - Inversion-Based Style Transfer with Diffusion Models [78.93863016223858]
Previous arbitrary example-guided artistic image generation methods often fail to control shape changes or convey elements.
We propose an inversion-based style transfer method (InST), which can efficiently and accurately learn the key information of an image.
arXiv Detail & Related papers (2022-11-23T18:44:25Z) - Language Does More Than Describe: On The Lack Of Figurative Speech in Text-To-Image Models [63.545146807810305]
Text-to-image diffusion models can generate high-quality pictures from textual input prompts.
These models have been trained using text data collected from content-based labelling protocols.
We characterise the sentimentality, objectiveness and degree of abstraction of publicly available text data used to train current text-to-image diffusion models.
arXiv Detail & Related papers (2022-10-19T14:20:05Z) - Automatic analysis of artistic paintings using information-based measures [1.25456674968456]
We identify hidden patterns and relationships present in artistic paintings by analysing their complexity.
We apply Normalized Compression (NC) and the Block Decomposition Method (BDM) to a dataset of 4,266 paintings from 91 authors; a minimal sketch of the compression-based idea appears after this list.
We define a fingerprint that describes critical information regarding the artists' style, their artistic influences, and shared techniques.
arXiv Detail & Related papers (2021-02-02T21:40:30Z) - Understanding Compositional Structures in Art Historical Images using Pose and Gaze Priors [20.98603643788824]
Image compositions are useful in analyzing the interactions in an image to study artists and their artworks.
In this work, we attempt to automate this process using existing state-of-the-art machine learning techniques.
Our approach focuses on two central themes of image composition: (a) detection of action regions and action lines of the artwork; and (b) pose-based segmentation of foreground and background.
arXiv Detail & Related papers (2020-09-08T15:01:56Z)
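As referenced above for the information-based-measures entry, here is a minimal, self-contained Python sketch of the compression-based complexity idea behind Normalized Compression, assuming raw image bytes and a generic zlib compressor. It illustrates only the compression-ratio principle, not the exact estimator, compressor, or preprocessing used in that paper, and it does not cover the Block Decomposition Method; the file names in the usage comment are hypothetical.

import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size over raw size: lower values suggest more regular,
    less complex content. Illustrative only; the paper's Normalized
    Compression uses its own compressor and normalization."""
    if not data:
        return 0.0
    return len(zlib.compress(data, 9)) / len(data)

# Usage (hypothetical file names): compare two paintings via their raw
# grayscale pixel bytes.
# from PIL import Image
# monet = Image.open("monet.jpg").convert("L").tobytes()
# mondrian = Image.open("mondrian.jpg").convert("L").tobytes()
# print(compression_ratio(monet), compression_ratio(mondrian))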
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.