Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation
- URL: http://arxiv.org/abs/2109.05743v1
- Date: Mon, 13 Sep 2021 07:08:46 GMT
- Title: Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation
- Authors: Zechen Bai, Yuta Nakashima, Noa Garcia
- Abstract summary: This work presents a framework to bring art closer to people by generating comprehensive descriptions of fine-art paintings.
The framework is validated through an exhaustive analysis, both quantitative and qualitative, as well as a comparative human evaluation.
- Score: 26.099306167995376
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Have you ever looked at a painting and wondered about the story behind it?
This work presents a framework to bring art closer to people by generating
comprehensive descriptions of fine-art paintings. Generating informative
descriptions for artworks, however, is extremely challenging, as it requires
1) describing multiple aspects of the image, such as its style, content, or
composition, and 2) providing background and contextual knowledge about the
artist, their influences, or the historical period. To address these
challenges, we introduce a multi-topic and knowledgeable art description
framework, which organizes the generated sentences according to three artistic
topics and, additionally, enhances each description with external knowledge.
The framework is validated through an exhaustive analysis, both quantitative
and qualitative, as well as a comparative human evaluation, demonstrating
outstanding results in terms of both topic diversity and information veracity.
Related papers
- DOCCI: Descriptions of Connected and Contrasting Images [58.377060316967864]
Descriptions of Connected and Contrasting Images (DOCCI) is a dataset with long, human-annotated English descriptions for 15k images.
We instruct human annotators to create comprehensive descriptions for each image.
We show that DOCCI is a useful testbed for text-to-image generation.
arXiv Detail & Related papers (2024-04-30T17:56:24Z)
- Impressions: Understanding Visual Semiotics and Aesthetic Impact [66.40617566253404]
We present Impressions, a novel dataset through which to investigate the semiotics of images.
We show that existing multimodal image captioning and conditional generation models struggle to simulate plausible human responses to images.
This dataset significantly improves their ability to model impressions and aesthetic evaluations of images through fine-tuning and few-shot adaptation.
arXiv Detail & Related papers (2023-10-27T04:30:18Z)
- ARTxAI: Explainable Artificial Intelligence Curates Deep Representation Learning for Artistic Images using Fuzzy Techniques [11.286457041998569]
We show how the features obtained from different tasks in artistic image classification are suitable to solve other ones of similar nature.
We propose an explainable artificial intelligence method to map known visual traits of an image with the features used by the deep learning model.
arXiv Detail & Related papers (2023-08-29T13:15:13Z)
- Text-Guided Synthesis of Eulerian Cinemagraphs [81.20353774053768]
We introduce Text2Cinemagraph, a fully automated method for creating cinemagraphs from text descriptions.
We focus on cinemagraphs of fluid elements, such as flowing rivers and drifting clouds, which exhibit continuous motion and repetitive textures.
arXiv Detail & Related papers (2023-07-06T17:59:31Z)
- Not Only Generative Art: Stable Diffusion for Content-Style Disentanglement in Art Analysis [23.388338598125195]
GOYA is a method that distills the artistic knowledge captured in a recent generative model to disentangle content and style.
Experiments show that synthetically generated images sufficiently serve as a proxy of the real distribution of artworks.
arXiv Detail & Related papers (2023-04-20T13:00:46Z)
- Inversion-Based Style Transfer with Diffusion Models [78.93863016223858]
Previous arbitrary example-guided artistic image generation methods often fail to control shape changes or convey elements.
We propose an inversion-based style transfer method (InST), which can efficiently and accurately learn the key information of an image.
arXiv Detail & Related papers (2022-11-23T18:44:25Z)
- Language Does More Than Describe: On The Lack Of Figurative Speech in Text-To-Image Models [63.545146807810305]
Text-to-image diffusion models can generate high-quality pictures from textual input prompts.
These models have been trained using text data collected from content-based labelling protocols.
We characterise the sentimentality, objectiveness and degree of abstraction of publicly available text data used to train current text-to-image diffusion models.
arXiv Detail & Related papers (2022-10-19T14:20:05Z)
- Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition [63.6608759501803]
We propose to recognize artistic text at three levels.
Firstly, corner points are applied to guide the extraction of local features inside characters, considering the robustness of corner structures to appearance and shape.
Secondly, we design a character contrastive loss to model the character-level feature, improving the feature representation for character classification.
Thirdly, we utilize Transformer to learn the global feature on image-level and model the global relationship of the corner points.
arXiv Detail & Related papers (2022-07-31T14:11:05Z)
- Automatic analysis of artistic paintings using information-based measures [1.25456674968456]
We identify hidden patterns and relationships present in artistic paintings by analysing their complexity.
We apply Normalized Compression (NC) and the Block Decomposition Method (BDM) to a dataset of 4,266 paintings from 91 authors.
We define a fingerprint that describes critical information regarding the artists' style, their artistic influences, and shared techniques.
arXiv Detail & Related papers (2021-02-02T21:40:30Z)
- Understanding Compositional Structures in Art Historical Images using Pose and Gaze Priors [20.98603643788824]
Image compositions are useful in analyzing the interactions in an image to study artists and their artworks.
In this work, we attempt to automate this process using the existing state of the art machine learning techniques.
Our approach focuses on two central themes of image composition: (a) detection of action regions and action lines of the artwork; and (b) pose-based segmentation of foreground and background.
arXiv Detail & Related papers (2020-09-08T15:01:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences of its use.