Have Large Vision-Language Models Mastered Art History?
- URL: http://arxiv.org/abs/2409.03521v1
- Date: Thu, 5 Sep 2024 13:33:57 GMT
- Title: Have Large Vision-Language Models Mastered Art History?
- Authors: Ombretta Strafforello, Derya Soydaner, Michiel Willems, Anne-Sofie Maerten, Stefanie De Winter,
- Abstract summary: Art historians have long studied the unique aspects of artworks, with style prediction being a crucial component of their discipline.
This paper investigates whether large Vision-Language Models, which integrate visual and textual data, can effectively predict the art historical attributes of paintings.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The emergence of large Vision-Language Models (VLMs) has recently established new baselines in image classification across multiple domains. However, the performance of VLMs in the specific task of artwork classification, particularly art style classification of paintings - a domain traditionally mastered by art historians - has not been explored yet. Artworks pose a unique challenge compared to natural images due to their inherently complex and diverse structures, characterized by variable compositions and styles. Art historians have long studied the unique aspects of artworks, with style prediction being a crucial component of their discipline. This paper investigates whether large VLMs, which integrate visual and textual data, can effectively predict the art historical attributes of paintings. We conduct an in-depth analysis of four VLMs, namely CLIP, LLaVA, OpenFlamingo, and GPT-4o, focusing on zero-shot classification of art style, author and time period using two public benchmarks of artworks. Additionally, we present ArTest, a well-curated test set of artworks, including pivotal paintings studied by art historians.
Related papers
- GalleryGPT: Analyzing Paintings with Large Multimodal Models [64.98398357569765]
Artwork analysis is an important and fundamental skill for art appreciation, which could enrich personal aesthetic sensibility and foster critical thinking.
Previous works for automatically analyzing artworks mainly focus on classification, retrieval, and other simple tasks, which is far from the goal of AI.
We introduce a superior large multimodal model for composing painting analyses, dubbed GalleryGPT, which is slightly modified and fine-tuned from the LLaVA architecture.
arXiv Detail & Related papers (2024-08-01T11:52:56Z)
- Rethinking Artistic Copyright Infringements in the Era of Text-to-Image Generative Models [47.19481598385283]
ArtSavant is a tool to determine the unique style of an artist by comparing it to a reference dataset of works from WikiArt.
We then perform a large-scale empirical study to provide quantitative insight on the prevalence of artistic style copying across 3 popular text-to-image generative models.
arXiv Detail & Related papers (2024-04-11T17:59:43Z)
- Learning to Evaluate the Artness of AI-generated Images [64.48229009396186]
ArtScore is a metric designed to evaluate the degree to which an image resembles authentic artworks by artists.
We employ pre-trained models for photo and artwork generation, resulting in a series of mixed models.
This dataset is then employed to train a neural network that learns to estimate quantized artness levels of arbitrary images.
arXiv Detail & Related papers (2023-05-08T17:58:27Z)
- Towards Artistic Image Aesthetics Assessment: a Large-scale Dataset and a New Method [64.40494830113286]
We first introduce a large-scale AIAA dataset: Boldbrush Artistic Image dataset (BAID), which consists of 60,337 artistic images covering various art forms.
We then propose a new method, SAAN, which can effectively extract and utilize style-specific and generic aesthetic information to evaluate artistic images.
Experiments demonstrate that our proposed approach outperforms existing IAA methods on the proposed BAID dataset.
arXiv Detail & Related papers (2023-03-27T12:59:15Z)
- Towards mapping the contemporary art world with ArtLM: an art-specific NLP model [0.0]
We present a generic Natural Language Processing framework (called ArtLM) to discover the connections among contemporary artists based on their biographies.
With extensive experiments, we demonstrate that our ArtLM achieves 85.6% accuracy and 84.0% F1 score.
We also provide a visualisation and a qualitative analysis of the artist network built from ArtLM's outputs.
arXiv Detail & Related papers (2022-12-14T09:26:07Z)
- Inversion-Based Style Transfer with Diffusion Models [78.93863016223858]
Previous arbitrary example-guided artistic image generation methods often fail to control shape changes or convey elements.
We propose an inversion-based style transfer method (InST), which can efficiently and accurately learn the key information of an image.
arXiv Detail & Related papers (2022-11-23T18:44:25Z)
- Docent: A content-based recommendation system to discover contemporary art [0.8782885374383763]
We present a content-based recommendation system on contemporary art relying on images of artworks and contextual metadata of artists.
We gathered and annotated artworks with advanced and art-specific information to create a unique database that was used to train our models.
In an assessment by a team of art specialists, an average of 75% of the artworks were rated as meaningful.
arXiv Detail & Related papers (2022-07-12T16:26:27Z)
- Art Creation with Multi-Conditional StyleGANs [81.72047414190482]
A human artist needs a combination of unique skills, understanding, and genuine intention to create artworks that evoke deep feelings and emotions.
We introduce a multi-conditional Generative Adversarial Network (GAN) approach trained on large amounts of human paintings to synthesize realistic-looking paintings that emulate human art.
arXiv Detail & Related papers (2022-02-23T20:45:41Z)
- Demographic Influences on Contemporary Art with Unsupervised Style Embeddings [25.107166631583212]
contempArt is a collection of paintings and drawings, a detailed graph network based on social connections on Instagram and additional socio-demographic information.
We evaluate three methods suited for generating unsupervised style embeddings of images and correlate them with the remaining data.
arXiv Detail & Related papers (2020-09-30T10:13:18Z)
- Understanding Compositional Structures in Art Historical Images using Pose and Gaze Priors [20.98603643788824]
Image compositions are useful in analyzing the interactions in an image to study artists and their artworks.
In this work, we attempt to automate this process using existing state-of-the-art machine learning techniques.
Our approach focuses on two central themes of image composition: (a) detection of action regions and action lines of the artwork; and (b) pose-based segmentation of foreground and background.
arXiv Detail & Related papers (2020-09-08T15:01:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.