StyleBabel: Artistic Style Tagging and Captioning
- URL: http://arxiv.org/abs/2203.05321v2
- Date: Fri, 11 Mar 2022 08:51:33 GMT
- Title: StyleBabel: Artistic Style Tagging and Captioning
- Authors: Dan Ruta, Andrew Gilbert, Pranav Aggarwal, Naveen Marri, Ajinkya Kale,
Jo Briggs, Chris Speed, Hailin Jin, Baldo Faieta, Alex Filipkowski, Zhe Lin,
John Collomosse
- Abstract summary: We present StyleBabel, a unique open access dataset of natural language captions and free-form tags describing the artistic style of over 135K digital artworks.
- Score: 38.792350870518504
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present StyleBabel, a unique open access dataset of natural language
captions and free-form tags describing the artistic style of over 135K digital
artworks, collected via a novel participatory method from experts studying at
specialist art and design schools. StyleBabel was collected via an iterative
method, inspired by 'Grounded Theory': a qualitative approach that enables
annotation while co-evolving a shared language for fine-grained artistic style
attribute description. We demonstrate several downstream tasks for StyleBabel,
adapting the recent ALADIN architecture for fine-grained style similarity, to
train cross-modal embeddings for: 1) free-form tag generation; 2) natural
language description of artistic style; 3) fine-grained text search of style.
To do so, we extend ALADIN with recent advances in Visual Transformer (ViT) and
cross-modal representation learning, achieving state-of-the-art accuracy in
fine-grained style retrieval.
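As a concrete illustration of the cross-modal training described above, the snippet below pairs a style-image encoder with a text encoder under a symmetric contrastive (CLIP-style) objective. This is a minimal sketch, not the paper's implementation: the projection dimension, temperature, and loss form are placeholder assumptions, and the actual ALADIN/ViT architecture and training details differ.
```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of (artwork, style-text) pairs.

    img_emb: (B, D) outputs of a style encoder (e.g. an ALADIN/ViT hybrid).
    txt_emb: (B, D) embeddings of the matching free-form tags or captions.
    """
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature     # (B, B) similarities
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    loss_i2t = F.cross_entropy(logits, targets)      # image -> its own text
    loss_t2i = F.cross_entropy(logits.t(), targets)  # text -> its own image
    return 0.5 * (loss_i2t + loss_t2i)

# Toy usage: any two encoders projected into a shared 256-d space would do.
loss = cross_modal_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```
A shared embedding space of this kind supports all three downstream tasks listed in the abstract: nearest-neighbour tag retrieval, captioning conditioned on the image embedding, and text-to-image style search.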
Related papers
- Bridging Text and Image for Artist Style Transfer via Contrastive Learning [21.962361974579036]
We propose Contrastive Learning for Artistic Style Transfer (CLAST) to control arbitrary style transfer.
We introduce a supervised contrastive training strategy to effectively extract style descriptions from the image-text model.
We also propose a novel and efficient adaLN-based state space model that explores style-content fusion.
arXiv Detail & Related papers (2024-10-12T15:27:57Z)
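The CLAST summary above mentions a supervised contrastive training strategy. The snippet below is a generic supervised-contrastive (SupCon-style) loss in which embeddings sharing a style label attract; it is an assumption-laden sketch, since the summary does not specify CLAST's exact formulation.
```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """SupCon-style loss: embeddings that share a style label attract.

    features: (B, D) embeddings; labels: (B,) integer style/artist ids.
    """
    feats = F.normalize(features, dim=-1)
    sim = feats @ feats.t() / temperature
    self_mask = torch.eye(len(feats), dtype=torch.bool, device=feats.device)
    sim = sim.masked_fill(self_mask, float("-inf"))   # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(1).clamp(min=1)
    loss = -log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_counts
    return loss[pos_mask.any(1)].mean()   # average over anchors with positives

# Toy usage: two samples per style label.
loss = supervised_contrastive_loss(torch.randn(6, 128),
                                   torch.tensor([0, 0, 1, 1, 2, 2]))
```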
- Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences [49.66987347397398]
Few-Shot Stylized Visual Captioning aims to generate captions in any desired style, using only a few examples as guidance during inference.
We propose a framework called FS-StyleCap for this task, which utilizes a conditional encoder-decoder language model and a visual projection module.
arXiv Detail & Related papers (2023-07-31T04:26:01Z)
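FS-StyleCap is summarised as a conditional encoder-decoder with a visual projection module, guided by a few stylised example sentences. The sketch below shows one plausible way to realise that conditioning: pooling exemplar sentence embeddings into a style vector and prepending it, together with a projected visual feature, as a decoder prefix. All dimensions and the mean-pooling choice are assumptions, not the paper's design.
```python
import torch
import torch.nn as nn

class FewShotStyleConditioner(nn.Module):
    """Builds a decoder prefix from K stylised exemplars plus an image."""

    def __init__(self, text_dim=512, vis_dim=768, dim=512):
        super().__init__()
        self.visual_proj = nn.Linear(vis_dim, dim)  # "visual projection module"
        self.style_proj = nn.Linear(text_dim, dim)

    def forward(self, exemplar_embs, visual_feat):
        # exemplar_embs: (K, text_dim) embeddings of the few style examples
        # visual_feat:   (vis_dim,) global feature of the image to caption
        style = self.style_proj(exemplar_embs.mean(dim=0))  # pooled style vector
        vis = self.visual_proj(visual_feat)
        return torch.stack([style, vis])  # (2, dim) prefix for the decoder

prefix = FewShotStyleConditioner()(torch.randn(3, 512), torch.randn(768))
```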
- ALADIN-NST: Self-supervised disentangled representation learning of artistic style through Neural Style Transfer [60.6863849241972]
We learn a representation of visual artistic style more strongly disentangled from the semantic content depicted in an image.
We show that strongly addressing the disentanglement of style and content leads to large gains in style-specific metrics.
arXiv Detail & Related papers (2023-04-12T10:33:18Z)
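ALADIN-NST's summary says disentanglement of style and content is enforced explicitly. One common way to do that with neural style transfer outputs is a triplet-style objective: the style embedding of NST(content, style) should sit near the style source and away from the content source. The code below sketches that idea under those assumptions; the paper's actual losses may differ.
```python
import torch
import torch.nn.functional as F

def style_disentangle_loss(f_stylised, f_style_src, f_content_src, margin=0.2):
    """Triplet-style loss on style embeddings of stylised images.

    f_stylised:    (B, D) style embedding of NST(content_i, style_i)
    f_style_src:   (B, D) embedding of the style source (positive)
    f_content_src: (B, D) embedding of the content source (negative):
                   shared semantics alone must not imply shared style.
    """
    a = F.normalize(f_stylised, dim=-1)
    p = F.normalize(f_style_src, dim=-1)
    n = F.normalize(f_content_src, dim=-1)
    d_pos = 1 - (a * p).sum(-1)   # cosine distance to the style source
    d_neg = 1 - (a * n).sum(-1)   # cosine distance to the content source
    return F.relu(d_pos - d_neg + margin).mean()

loss = style_disentangle_loss(torch.randn(4, 128), torch.randn(4, 128),
                              torch.randn(4, 128))
```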
- StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Model [64.26721402514957]
We propose StylerDALLE, a style transfer method that uses natural language to describe abstract art styles.
Specifically, we formulate the language-guided style transfer task as a non-autoregressive token sequence translation.
To incorporate style information, we propose a Reinforcement Learning strategy with CLIP-based language supervision.
arXiv Detail & Related papers (2023-03-16T12:44:44Z)
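StylerDALLE's summary names reinforcement learning with CLIP-based language supervision. A natural reading is that decoded candidate images are scored by their CLIP similarity to the style text, and that score serves as a reward. The sketch below computes such a reward with the Hugging Face CLIP API; the checkpoint id and reward shaping are assumptions, and the paper's RL machinery (sampling, baselines) is not shown.
```python
import torch
from transformers import CLIPModel, CLIPProcessor

# Checkpoint id is an assumption; any CLIP model with this API works.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def style_reward(images, style_prompt):
    """Reward per candidate image: CLIP similarity to a style description.

    images: list of PIL.Image candidates decoded from the token translator.
    style_prompt: free-form style text, e.g. "a watercolour painting".
    """
    inputs = processor(text=[style_prompt], images=images,
                       return_tensors="pt", padding=True)
    sims = model(**inputs).logits_per_image   # (num_images, 1)
    return sims.squeeze(-1)                   # higher = closer to the style
```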
- Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning [84.8813842101747]
Contrastive Arbitrary Style Transfer (CAST) is a new style representation learning and style transfer method based on contrastive learning.
Our framework consists of three key components, i.e., a multi-layer style projector for style code encoding, a domain enhancement module for effective learning of style distribution, and a generative network for image style transfer.
arXiv Detail & Related papers (2022-05-19T13:11:24Z)
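CAST's summary lists a multi-layer style projector among its components. The sketch below shows one simple realisation: globally pooling feature maps from several encoder layers and projecting each into a shared style-code space. The layer dimensions and the final averaging are assumptions; CAST's actual projector may be structured differently.
```python
import torch
import torch.nn as nn

class MultiLayerStyleProjector(nn.Module):
    """Pools features from several encoder layers into one style code."""

    def __init__(self, layer_dims=(64, 128, 256, 512), code_dim=128):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(d, code_dim) for d in layer_dims)

    def forward(self, feats):
        # feats: list of (B, C_i, H_i, W_i) maps, one per encoder layer
        codes = [h(f.mean(dim=(2, 3))) for h, f in zip(self.heads, feats)]
        return torch.stack(codes).mean(dim=0)   # (B, code_dim) style code

proj = MultiLayerStyleProjector()
feats = [torch.randn(2, c, s, s) for c, s in [(64, 64), (128, 32),
                                              (256, 16), (512, 8)]]
code = proj(feats)   # shape: (2, 128)
```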
- Name Your Style: An Arbitrary Artist-aware Image Style Transfer [38.41608300670523]
We propose a text-driven image style transfer (TxST) method that leverages advanced image-text encoders to control arbitrary style transfer.
We introduce a contrastive training strategy to effectively extract style descriptions from the image-text model.
We also propose a novel and efficient attention module that explores cross-attention to fuse style and content features.
arXiv Detail & Related papers (2022-02-28T06:21:38Z)
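TxST's summary highlights a cross-attention module that fuses style and content features. The snippet below is a generic residual cross-attention block in that spirit, with content tokens as queries and style tokens (text or image embeddings) as keys and values; the dimensions and single-block design are assumptions rather than the paper's exact module.
```python
import torch
import torch.nn as nn

class StyleCrossAttention(nn.Module):
    """Content features attend to style tokens and absorb their statistics."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, content_tokens, style_tokens):
        # content_tokens: (B, Nc, D); style_tokens: (B, Ns, D)
        fused, _ = self.attn(query=content_tokens,
                             key=style_tokens, value=style_tokens)
        return self.norm(content_tokens + fused)  # residual fusion

# Usage: fuse 196 content patches with 8 style tokens.
m = StyleCrossAttention()
out = m(torch.randn(2, 196, 256), torch.randn(2, 8, 256))  # (2, 196, 256)
```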
- Language-Driven Image Style Transfer [72.36790598245096]
We introduce a new task, language-driven image style transfer (LDIST), to manipulate the style of a content image, guided by a text description.
The discriminator considers the correlation between language and patches of style images or transferred results to jointly embed style instructions.
Experiments show that our CLVA is effective and achieves superb transfer results on LDIST.
arXiv Detail & Related papers (2021-06-01T01:58:50Z)
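The LDIST summary describes a discriminator that correlates the instruction text with patches of style images or transferred results. The toy module below conditions patch features on the text embedding and scores them jointly; it only illustrates that patch-text correlation idea and is not CLVA's actual discriminator.
```python
import torch
import torch.nn as nn

class PatchTextDiscriminator(nn.Module):
    """Scores how well image patches correlate with a style instruction."""

    def __init__(self, dim=256):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, patch_feats, text_feat):
        # patch_feats: (B, N, D) patches of a style image or transfer result
        # text_feat:   (B, D) embedded style instruction
        joint = patch_feats * text_feat.unsqueeze(1)  # text-conditioned patches
        return self.score(joint).mean(dim=(1, 2))     # (B,) realism logits

logits = PatchTextDiscriminator()(torch.randn(2, 196, 256),
                                  torch.randn(2, 256))
```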
- ST$^2$: Small-data Text Style Transfer via Multi-task Meta-Learning [14.271083093944753]
Text style transfer aims to paraphrase a sentence in one style into another while preserving content.
Due to the lack of parallel training data, state-of-the-art methods are unsupervised and rely on large datasets that share content.
In this work, we develop a meta-learning framework to transfer between any kind of text styles.
arXiv Detail & Related papers (2020-04-24T13:36:38Z)
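ST$^2$ frames small-data text style transfer as meta-learning across style pairs. The sketch below shows a first-order MAML-style meta-update over a batch of tasks, each with a support set for inner adaptation and a query set for the outer loss. It assumes PyTorch 2's torch.func.functional_call and a generic loss; the paper's multi-task formulation and model are not reproduced here.
```python
import torch
import torch.nn.functional as F

def maml_meta_step(model, loss_fn, tasks, inner_lr=1e-2, meta_opt=None):
    """One first-order MAML-style update over a batch of style-transfer tasks.

    tasks: iterable of ((xs, ys), (xq, yq)) support/query batches per task.
    """
    meta_opt.zero_grad()
    for (xs, ys), (xq, yq) in tasks:
        # Inner loop: adapt a functional copy of the weights on the support set.
        fast = {n: p.clone() for n, p in model.named_parameters()}
        inner = loss_fn(torch.func.functional_call(model, fast, (xs,)), ys)
        grads = torch.autograd.grad(inner, list(fast.values()))
        fast = {n: p - inner_lr * g
                for (n, p), g in zip(fast.items(), grads)}
        # Outer loop: evaluate the adapted weights on the query set and
        # backpropagate to the shared initialisation.
        loss_fn(torch.func.functional_call(model, fast, (xq,)), yq).backward()
    meta_opt.step()

# Toy usage with a linear layer standing in for a style-transfer model.
model = torch.nn.Linear(4, 4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
task = ((torch.randn(8, 4), torch.randn(8, 4)),
        (torch.randn(8, 4), torch.randn(8, 4)))
maml_meta_step(model, F.mse_loss, [task, task], meta_opt=opt)
```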
This list is automatically generated from the titles and abstracts of the papers on this site.