Context-Infused Visual Grounding for Art
- URL: http://arxiv.org/abs/2410.12369v1
- Date: Wed, 16 Oct 2024 08:41:19 GMT
- Title: Context-Infused Visual Grounding for Art
- Authors: Selina Khan, Nanne van Noord
- Abstract summary: We present CIGAr (Context-Infused GroundingDINO for Art), a visual grounding approach which utilises artwork descriptions as context during training.
In addition, we present a new dataset, Ukiyo-eVG, with manually annotated phrase-grounding annotations.
- Score: 6.748153937479316
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many artwork collections contain textual attributes that provide rich and contextualised descriptions of artworks. Visual grounding offers the potential for localising subjects within these descriptions on images; however, existing approaches are trained on natural images and generalise poorly to art. In this paper, we present CIGAr (Context-Infused GroundingDINO for Art), a visual grounding approach which utilises artwork descriptions as context during training, thereby enabling visual grounding on art. In addition, we present a new dataset, Ukiyo-eVG, with manually annotated phrase-grounding annotations, and we set a new state of the art for object detection on two artwork datasets.
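As a concrete illustration of the context-infusion idea, the sketch below conditions each grounding query on the full artwork description; the `model` wrapper, its `predict` call, and the "[SEP]" prompt format are hypothetical stand-ins, not the authors' released code.

```python
# Hypothetical sketch of context-infused phrase grounding (CIGAr-style).
# The `model` object, its `predict` API, and the "[SEP]" prompt format
# are illustrative assumptions.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def ground_with_context(model, image, description: str,
                        phrases: List[str]) -> List[Tuple[str, Box]]:
    """Localise each phrase, conditioning the text encoder on the full
    artwork description rather than the isolated phrase."""
    results = []
    for phrase in phrases:
        # Key idea: the whole description accompanies the query phrase,
        # so the phrase's text features are contextualised by art-specific
        # language instead of generic natural-image captions.
        prompt = f"{description} [SEP] {phrase}"
        box = model.predict(image=image, text=prompt)  # hypothetical API
        results.append((phrase, box))
    return results
```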
Related papers
- ArtContext: Contextualizing Artworks with Open-Access Art History Articles and Wikidata Knowledge through a LoRA-Tuned CLIP Model [3.7333354131478056]
ArtContext is a pipeline that annotates artworks with information drawn from a corpus of open-access art history articles and Wikidata knowledge.
We show that the resulting model, PaintingCLIP, which is weakly supervised by the collected corpus, outperforms CLIP and provides context for a given artwork.
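A minimal sketch of the LoRA-tuning step described above, using standard libraries; the checkpoint name, rank, and target modules are plausible assumptions rather than the paper's exact configuration.

```python
# Sketch of weakly supervised LoRA-tuning of CLIP on an art-history
# corpus. Checkpoint, rank, and target modules are assumptions, not the
# paper's exact configuration.
from transformers import CLIPModel
from peft import LoraConfig, get_peft_model

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
lora = LoraConfig(r=8, lora_alpha=16,
                  target_modules=["q_proj", "v_proj"])  # attention projections
painting_clip = get_peft_model(clip, lora)  # only the LoRA adapters train
painting_clip.print_trainable_parameters()
```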
arXiv Detail & Related papers (2026-02-11T20:34:32Z)
- Compose Your Aesthetics: Empowering Text-to-Image Models with the Principles of Art [61.28133495240179]
We propose a novel task of aesthetics alignment which seeks to align user-specified aesthetics with the T2I generation output.
Inspired by how artworks provide an invaluable perspective to approach aesthetics, we codify visual aesthetics using the compositional framework artists employ.
We demonstrate that text-to-image diffusion models (T2I DMs) can effectively offer 10 compositional controls through user-specified Principles-of-Art (PoA) conditions.
arXiv Detail & Related papers (2025-03-15T06:58:09Z)
- Art-Free Generative Models: Art Creation Without Graphic Art Knowledge [50.60063523054282]
We propose a text-to-image generation model trained without access to art-related content.
We then introduce a simple yet effective method to learn an art adapter using only a few examples of selected artistic styles.
arXiv Detail & Related papers (2024-11-29T18:59:01Z)
- KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph [24.586916324061168]
We present KALE (Knowledge-Augmented vision-Language model for artwork Elaborations).
KALE incorporates the metadata in two ways: firstly as direct textual input, and secondly through a multimodal heterogeneous knowledge graph.
Experimental results demonstrate that KALE achieves strong performance over existing state-of-the-art work across several artwork datasets.
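A small sketch of the second ingestion path described above, building a multimodal heterogeneous knowledge graph from artwork metadata; the node and edge types are assumptions for illustration, not the paper's exact schema.

```python
# Illustrative multimodal heterogeneous knowledge graph in the spirit of
# KALE; node and edge types are assumptions, not the paper's exact schema.
import networkx as nx

def build_artwork_graph(artwork_id: str, metadata: dict) -> nx.Graph:
    """Link an artwork node to typed metadata nodes."""
    g = nx.Graph()
    g.add_node(artwork_id, node_type="artwork")
    for field in ("artist", "genre", "period"):
        value = metadata.get(field)
        if value:
            g.add_node(value, node_type=field)
            g.add_edge(artwork_id, value, relation=f"has_{field}")
    return g

g = build_artwork_graph(
    "ukiyoe_0421",  # hypothetical identifier
    {"artist": "Hokusai", "genre": "landscape", "period": "Edo"},
)
```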
arXiv Detail & Related papers (2024-09-17T06:39:18Z)
- Decoupled Textual Embeddings for Customized Image Generation [62.98933630971543]
Customized text-to-image generation aims to learn user-specified concepts with a few images.
Existing methods usually suffer from overfitting issues and entangle the subject-unrelated information with the learned concept.
We propose the DETEX, a novel approach that learns the disentangled concept embedding for flexible customized text-to-image generation.
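A minimal PyTorch sketch of the disentanglement idea: separate learnable tokens for the concept and for subject-unrelated factors, with only the concept token used at generation time. Names, sizes, and initialisation are illustrative assumptions.

```python
# Minimal PyTorch sketch of decoupled textual embeddings: one learnable
# token for the concept, one for subject-unrelated factors. Names, sizes,
# and initialisation are illustrative assumptions.
import torch
import torch.nn as nn

class DecoupledTokens(nn.Module):
    def __init__(self, dim: int = 768):
        super().__init__()
        self.concept = nn.Parameter(torch.randn(dim) * 0.02)     # reusable concept
        self.irrelevant = nn.Parameter(torch.randn(dim) * 0.02)  # nuisance factors

    def training_tokens(self) -> torch.Tensor:
        # Both tokens participate in reconstructing the training images...
        return torch.stack([self.concept, self.irrelevant])

    def generation_tokens(self) -> torch.Tensor:
        # ...but only the concept token is injected at generation time,
        # which keeps the learned concept free of entangled information.
        return self.concept.unsqueeze(0)
```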
arXiv Detail & Related papers (2023-12-19T03:32:10Z)
- Text-Guided Synthesis of Eulerian Cinemagraphs [81.20353774053768]
We introduce Text2Cinemagraph, a fully automated method for creating cinemagraphs from text descriptions.
We focus on cinemagraphs of fluid elements, such as flowing rivers and drifting clouds, which exhibit continuous motion and repetitive textures.
arXiv Detail & Related papers (2023-07-06T17:59:31Z)
- Not Only Generative Art: Stable Diffusion for Content-Style Disentanglement in Art Analysis [23.388338598125195]
GOYA is a method that distills the artistic knowledge captured in a recent generative model to disentangle content and style.
Experiments show that synthetically generated images sufficiently serve as a proxy of the real distribution of artworks.
arXiv Detail & Related papers (2023-04-20T13:00:46Z)
- Towards Artistic Image Aesthetics Assessment: a Large-scale Dataset and a New Method [64.40494830113286]
We first introduce a large-scale AIAA dataset: Boldbrush Artistic Image dataset (BAID), which consists of 60,337 artistic images covering various art forms.
We then propose a new method, SAAN, which can effectively extract and utilize style-specific and generic aesthetic information to evaluate artistic images.
Experiments demonstrate that our proposed approach outperforms existing IAA methods on the proposed BAID dataset.
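The style-specific/generic split can be pictured as a two-branch network fused into a single score, as in the hedged sketch below; layer sizes and the fusion scheme are assumptions, not the paper's architecture.

```python
# Hedged two-branch sketch: generic aesthetic features and style-specific
# features fused into one score. Layer sizes and fusion are assumptions.
import torch
import torch.nn as nn

class TwoBranchAesthetics(nn.Module):
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.generic = nn.Linear(feat_dim, 128)  # generic aesthetic cues
        self.style = nn.Linear(feat_dim, 128)    # style-specific cues
        self.head = nn.Linear(256, 1)            # fused aesthetic score

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.generic(feats), self.style(feats)], dim=-1)
        return self.head(torch.relu(fused))
```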
arXiv Detail & Related papers (2023-03-27T12:59:15Z)
- Inversion-Based Style Transfer with Diffusion Models [78.93863016223858]
Previous arbitrary example-guided artistic image generation methods often fail to control shape changes or to convey the intended elements.
We propose an inversion-based style transfer method (InST), which can efficiently and accurately learn the key information of an image.
arXiv Detail & Related papers (2022-11-23T18:44:25Z)
- Language Does More Than Describe: On The Lack Of Figurative Speech in Text-To-Image Models [63.545146807810305]
Text-to-image diffusion models can generate high-quality pictures from textual input prompts.
These models have been trained using text data collected from content-based labelling protocols.
We characterise the sentimentality, objectiveness and degree of abstraction of publicly available text data used to train current text-to-image diffusion models.
arXiv Detail & Related papers (2022-10-19T14:20:05Z)
- Context-Aware Image Inpainting with Learned Semantic Priors [100.99543516733341]
We introduce pretext tasks that are semantically meaningful for estimating the missing content.
We propose a context-aware image inpainting model, which adaptively integrates global semantics and local features.
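One plausible reading of "adaptively integrates global semantics and local features" is a learned per-pixel gate between the two, sketched below; the gating design is an assumption for illustration, not the paper's architecture.

```python
# Sketch of adaptively mixing global semantics with local features via a
# learned per-pixel gate; the gating design is an illustrative assumption.
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.gate = nn.Conv2d(channels * 2, channels, kernel_size=1)

    def forward(self, local_feat: torch.Tensor,
                global_sem: torch.Tensor) -> torch.Tensor:
        # local_feat: (B, C, H, W); global_sem: (B, C) pooled semantics.
        g = global_sem[..., None, None].expand_as(local_feat)
        alpha = torch.sigmoid(self.gate(torch.cat([local_feat, g], dim=1)))
        return alpha * local_feat + (1 - alpha) * g  # per-pixel blend
```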
arXiv Detail & Related papers (2021-06-14T08:09:43Z)
- Iconographic Image Captioning for Artworks [2.3859169601259342]
This work utilizes a novel large-scale dataset of artwork images annotated with concepts from the Iconclass classification system designed for art and iconography.
The annotations are processed into clean textual descriptions to create a dataset suitable for training a deep neural network on the image captioning task.
A transformer-based vision-language pre-trained model is fine-tuned using the artwork image dataset.
The quality of the generated captions and the model's capacity to generalize to new data are explored by applying the model to a new collection of paintings and analyzing the relation between commonly generated captions and artistic genre.
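The fine-tuning step can be sketched as a standard captioning loop, assuming a HuggingFace-style vision-encoder/decoder model; `model` and `dataloader` below are placeholders, not the paper's exact setup.

```python
# Generic fine-tuning loop for a vision-language captioner on artwork
# images; `model` and `dataloader` are placeholders, and the HuggingFace-
# style `.loss` interface is an assumption about the setup.
import torch

def finetune(model, dataloader, epochs: int = 3, lr: float = 5e-5):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, caption_ids in dataloader:
            # Standard captioning objective: cross-entropy over the
            # Iconclass-derived caption tokens given the image.
            loss = model(pixel_values=images, labels=caption_ids).loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```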
arXiv Detail & Related papers (2021-02-07T23:11:33Z)
- ArtEmis: Affective Language for Visual Art [46.643106054408285]
We focus on the affective experience triggered by visual artworks.
We ask the annotators to indicate the dominant emotion they feel for a given image.
This leads to a rich set of signals for both the objective content and the affective impact of an image.
arXiv Detail & Related papers (2021-01-19T01:03:40Z)
- Understanding Compositional Structures in Art Historical Images using Pose and Gaze Priors [20.98603643788824]
Image composition is useful for analyzing the interactions within an image when studying artists and their artworks.
In this work, we attempt to automate this process using existing state-of-the-art machine learning techniques.
Our approach focuses on two central themes of image composition: (a) detection of action regions and action lines of the artwork; and (b) pose-based segmentation of foreground and background.
arXiv Detail & Related papers (2020-09-08T15:01:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.