Related papers: ArtContext: Contextualizing Artworks with Open-Access Art History Articles and Wikidata Knowledge through a LoRA-Tuned CLIP Model

URL: http://arxiv.org/abs/2602.11349v1
Date: Wed, 11 Feb 2026 20:34:32 GMT
Title: ArtContext: Contextualizing Artworks with Open-Access Art History Articles and Wikidata Knowledge through a LoRA-Tuned CLIP Model
Authors: Samuel Waugh, Stuart James
Abstract summary: ArtContext is a pipeline for taking a corpus of Open-Access Art History articles and Wikidata Knowledge and annotating Artworks with this information. We show that the new model, PaintingCLIP, which is weakly supervised by the collected corpus, outperforms CLIP and provides context for a given artwork.
Score: 3.7333354131478056
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Many Art History articles discuss artworks in general as well as specific parts of works, such as layout, iconography, or material culture. However, when viewing an artwork, it is not trivial to identify what different articles have said about the piece. Therefore, we propose ArtContext, a pipeline for taking a corpus of Open-Access Art History articles and Wikidata Knowledge and annotating Artworks with this information. We do this using a novel corpus collection pipeline, then learn a bespoke CLIP model adapted using Low-Rank Adaptation (LoRA) to make it domain-specific. We show that the new model, PaintingCLIP, which is weakly supervised by the collected corpus, outperforms CLIP and provides context for a given artwork. The proposed pipeline is generalisable and can be readily applied to numerous humanities areas.

Related papers

ArtSeek: Deep artwork understanding via multimodal in-context reasoning and late interaction retrieval [8.94249680213101]
ArtSeek is a framework for art analysis that combines multimodal large language models with retrieval-augmented generation. ArtSeek integrates three key components: an intelligent multimodal retrieval module based on late interaction retrieval, a contrastive multitask classification network for predicting artist, genre, style, media, and tags, and an agentic reasoning strategy. Our framework achieves state-of-the-art results on multiple benchmarks, including a +8.4% F1 improvement in style classification over GraphCLIP and a +7.1 BLEU@1 gain in captioning on ArtPedia.
arXiv Detail & Related papers (2025-07-29T15:31:58Z)
ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding [21.469839845171652]
ArtRAG is a novel framework that combines structured knowledge with retrieval-augmented generation (RAG) for multi-perspective artwork explanation. At inference time, a structured retriever selects semantically and topologically relevant subgraphs to guide generation. Experiments on the SemArt and Artpedia datasets show that ArtRAG outperforms several heavily trained baselines.
arXiv Detail & Related papers (2025-05-09T13:08:27Z)
ArtistAuditor: Auditing Artist Style Pirate in Text-to-Image Generation Models [61.55816738318699]
We propose a novel method for data-use auditing in the text-to-image generation model. ArtistAuditor employs a style extractor to obtain the multi-granularity style representations and treats artworks as samplings of an artist's style. The experimental results on six combinations of models and datasets show that ArtistAuditor can achieve high AUC values.
arXiv Detail & Related papers (2025-04-17T16:15:38Z)
Compose Your Aesthetics: Empowering Text-to-Image Models with the Principles of Art [61.28133495240179]
We propose a novel task of aesthetics alignment which seeks to align user-specified aesthetics with the T2I generation output. Inspired by how artworks provide an invaluable perspective to approach aesthetics, we codify visual aesthetics using the compositional framework artists employ. We demonstrate that T2I DMs can effectively offer 10 compositional controls through user-specified PoA conditions.
arXiv Detail & Related papers (2025-03-15T06:58:09Z)
Recognizing Artistic Style of Archaeological Image Fragments Using Deep Style Extrapolation [2.7233796151875245]
Ancient artworks obtained in archaeological excavations usually suffer from a certain degree of fragmentation and physical degradation. In this work, we present a generalized deep-learning framework for predicting the artistic style of image fragments.
arXiv Detail & Related papers (2025-01-01T13:38:15Z)
Opt-In Art: Learning Art Styles Only from Few Examples [50.60063523054282]
We show that it is possible to adapt a model trained without paintings to an artistic style, given only a few examples. Surprisingly, our findings suggest that high-quality artistic outputs can be achieved without prior exposure to artistic data.
arXiv Detail & Related papers (2024-11-29T18:59:01Z)
Context-Infused Visual Grounding for Art [6.748153937479316]
We present CIGAr (Context-Infused GroundingDINO for Art), a visual grounding approach which utilises the artwork descriptions during training as context. In addition, we present a new dataset, Ukiyo-eVG, with manually annotated phrase-grounding annotations.
arXiv Detail & Related papers (2024-10-16T08:41:19Z)
GalleryGPT: Analyzing Paintings with Large Multimodal Models [64.98398357569765]
Artwork analysis is an important and fundamental skill for art appreciation, which can enrich personal aesthetic sensibility and facilitate critical thinking. Previous work on automatically analyzing artworks mainly focuses on classification, retrieval, and other simple tasks, which are far from the goal of AI. We introduce a large multimodal model for painting analysis composing, dubbed GalleryGPT, which is slightly modified and fine-tuned from the LLaVA architecture.
arXiv Detail & Related papers (2024-08-01T11:52:56Z)
Visually-Aware Context Modeling for News Image Captioning [54.31708859631821]
News Image Captioning aims to create captions from news articles and images. We propose a face-naming module for learning better name embeddings. We use CLIP to retrieve sentences that are semantically close to the image.
arXiv Detail & Related papers (2023-08-16T12:39:39Z)
Towards mapping the contemporary art world with ArtLM: an art-specific NLP model [0.0]
We present a generic Natural Language Processing framework (called ArtLM) to discover the connections among contemporary artists based on their biographies. With extensive experiments, we demonstrate that our ArtLM achieves 85.6% accuracy and 84.0% F1 score. We also provide a visualisation and a qualitative analysis of the artist network built from ArtLM's outputs.
arXiv Detail & Related papers (2022-12-14T09:26:07Z)
The Curious Layperson: Fine-Grained Image Recognition without Expert Labels [90.88501867321573]
We consider a new problem: fine-grained image recognition without expert annotations. We learn a model to describe the visual appearance of objects using non-expert image descriptions. We then train a fine-grained textual similarity model that matches image descriptions with documents on a sentence-level basis.
arXiv Detail & Related papers (2021-11-05T17:58:37Z)
