Related papers: Have Large Vision-Language Models Mastered Art History?

URL: http://arxiv.org/abs/2409.03521v1
Date: Thu, 5 Sep 2024 13:33:57 GMT
Title: Have Large Vision-Language Models Mastered Art History?
Authors: Ombretta Strafforello, Derya Soydaner, Michiel Willems, Anne-Sofie Maerten, Stefanie De Winter,
Abstract summary: Art historians have long studied the unique aspects of artworks, with style prediction being a crucial component of their discipline. This paper investigates whether large Vision-Language Models, which integrate visual and textual data, can effectively predict the art historical attributes of paintings.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The emergence of large Vision-Language Models (VLMs) has recently established new baselines in image classification across multiple domains. However, the performance of VLMs in the specific task of artwork classification, particularly art style classification of paintings - a domain traditionally mastered by art historians - has not been explored yet. Artworks pose a unique challenge compared to natural images due to their inherently complex and diverse structures, characterized by variable compositions and styles. Art historians have long studied the unique aspects of artworks, with style prediction being a crucial component of their discipline. This paper investigates whether large VLMs, which integrate visual and textual data, can effectively predict the art historical attributes of paintings. We conduct an in-depth analysis of four VLMs, namely CLIP, LLaVA, OpenFlamingo, and GPT-4o, focusing on zero-shot classification of art style, author and time period using two public benchmarks of artworks. Additionally, we present ArTest, a well-curated test set of artworks, including pivotal paintings studied by art historians.

Related papers

ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding [16.9945713458689]
ArtRAG is a novel framework that combines structured knowledge with retrieval-augmented generation (RAG) for multi-perspective artwork explanation. At inference time, a structured retriever selects semantically and topologically relevant subgraphs to guide generation. Experiments on the SemArt and Artpedia datasets show that ArtRAG outperforms several heavily trained baselines.
arXiv Detail & Related papers (2025-05-09T13:08:27Z)
ArtistAuditor: Auditing Artist Style Pirate in Text-to-Image Generation Models [61.55816738318699]
We propose a novel method for data-use auditing in the text-to-image generation model. ArtistAuditor employs a style extractor to obtain the multi-granularity style representations and treats artworks as samplings of an artist's style. The experimental results on six combinations of models and datasets show that ArtistAuditor can achieve high AUC values.
arXiv Detail & Related papers (2025-04-17T16:15:38Z)
Art-Free Generative Models: Art Creation Without Graphic Art Knowledge [50.60063523054282]
We propose a text-to-image generation model trained without access to art-related content. We then introduce a simple yet effective method to learn an art adapter using only a few examples of selected artistic styles.
arXiv Detail & Related papers (2024-11-29T18:59:01Z)
APDDv2: Aesthetics of Paintings and Drawings Dataset with Artist Labeled Scores and Comments [45.57709215036539]
We introduce the Aesthetics of Paintings and Drawings Dataset (APDD), the first comprehensive collection of paintings encompassing 24 distinct artistic categories and 10 aesthetic attributes. APDDv2 boasts an expanded image corpus and improved annotation quality, featuring detailed language comments. We present an updated version of the Art Assessment Network for Specific Painting Styles, denoted as ArtCLIP. Experimental validation demonstrates the superior performance of this revised model in the realm of aesthetic evaluation, surpassing its predecessor in accuracy and efficacy.
arXiv Detail & Related papers (2024-11-13T11:46:42Z)
GalleryGPT: Analyzing Paintings with Large Multimodal Models [64.98398357569765]
Artwork analysis is an important and fundamental skill for art appreciation, which can enrich personal aesthetic sensibility and facilitate critical thinking. Previous works on automatically analyzing artworks mainly focus on classification, retrieval, and other simple tasks, which is far from the goal of AI. We introduce a large multimodal model for composing painting analyses, dubbed GalleryGPT, which is slightly modified and fine-tuned from the LLaVA architecture.
arXiv Detail & Related papers (2024-08-01T11:52:56Z)
Rethinking Artistic Copyright Infringements in the Era of Text-to-Image Generative Models [47.19481598385283]
ArtSavant is a tool to determine the unique style of an artist by comparing it to a reference dataset of works from WikiArt. We then perform a large-scale empirical study to provide quantitative insight on the prevalence of artistic style copying across 3 popular text-to-image generative models.
arXiv Detail & Related papers (2024-04-11T17:59:43Z)
Learning to Evaluate the Artness of AI-generated Images [64.48229009396186]
ArtScore is a metric designed to evaluate the degree to which an image resembles authentic artworks by artists. We employ pre-trained models for photo and artwork generation, resulting in a series of mixed models. This dataset is then employed to train a neural network that learns to estimate quantized artness levels of arbitrary images.
arXiv Detail & Related papers (2023-05-08T17:58:27Z)
Towards Artistic Image Aesthetics Assessment: a Large-scale Dataset and a New Method [64.40494830113286]
We first introduce a large-scale AIAA dataset: Boldbrush Artistic Image dataset (BAID), which consists of 60,337 artistic images covering various art forms. We then propose a new method, SAAN, which can effectively extract and utilize style-specific and generic aesthetic information to evaluate artistic images. Experiments demonstrate that our proposed approach outperforms existing IAA methods on the proposed BAID dataset.
arXiv Detail & Related papers (2023-03-27T12:59:15Z)
Towards mapping the contemporary art world with ArtLM: an art-specific NLP model [0.0]
We present a generic Natural Language Processing framework (called ArtLM) to discover the connections among contemporary artists based on their biographies. With extensive experiments, we demonstrate that our ArtLM achieves 85.6% accuracy and 84.0% F1 score. We also provide a visualisation and a qualitative analysis of the artist network built from ArtLM's outputs.
arXiv Detail & Related papers (2022-12-14T09:26:07Z)
Inversion-Based Style Transfer with Diffusion Models [78.93863016223858]
Previous arbitrary example-guided artistic image generation methods often fail to control shape changes or convey elements. We propose an inversion-based style transfer method (InST), which can efficiently and accurately learn the key information of an image.
arXiv Detail & Related papers (2022-11-23T18:44:25Z)
Docent: A content-based recommendation system to discover contemporary art [0.8782885374383763]
We present a content-based recommendation system on contemporary art relying on images of artworks and contextual metadata of artists. We gathered and annotated artworks with advanced and art-specific information to create a unique database that was used to train our models. After an assessment by a team of art specialists, we get an average final rating of 75% of meaningful artworks.
arXiv Detail & Related papers (2022-07-12T16:26:27Z)
Demographic Influences on Contemporary Art with Unsupervised Style Embeddings [25.107166631583212]
contempArt is a collection of paintings and drawings, a detailed graph network based on social connections on Instagram and additional socio-demographic information. We evaluate three methods suited for generating unsupervised style embeddings of images and correlate them with the remaining data.
arXiv Detail & Related papers (2020-09-30T10:13:18Z)
Understanding Compositional Structures in Art Historical Images using Pose and Gaze Priors [20.98603643788824]
Image compositions are useful in analyzing the interactions in an image to study artists and their artworks. In this work, we attempt to automate this process using existing state-of-the-art machine learning techniques. Our approach focuses on two central themes of image composition: (a) detection of action regions and action lines of the artwork; and (b) pose-based segmentation of foreground and background.
arXiv Detail & Related papers (2020-09-08T15:01:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.