VILA: Learning Image Aesthetics from User Comments with Vision-Language
Pretraining
- URL: http://arxiv.org/abs/2303.14302v2
- Date: Fri, 2 Jun 2023 18:57:30 GMT
- Title: VILA: Learning Image Aesthetics from User Comments with Vision-Language
Pretraining
- Authors: Junjie Ke, Keren Ye, Jiahui Yu, Yonghui Wu, Peyman Milanfar, Feng Yang
- Abstract summary: We propose learning image aesthetics from user comments, and exploring vision-language pretraining methods to learn multimodal aesthetic representations.
Specifically, we pretrain an image-text encoder-decoder model with image-comment pairs, using contrastive and generative objectives to learn rich and generic aesthetic semantics without human labels.
Our results show that our pretrained aesthetic vision-language model outperforms prior works on image aesthetic captioning over the AVA-Captions dataset.
- Score: 53.470662123170555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Assessing the aesthetics of an image is challenging, as it is influenced by
multiple factors including composition, color, style, and high-level semantics.
Existing image aesthetic assessment (IAA) methods primarily rely on
human-labeled rating scores, which oversimplify the visual aesthetic
information that humans perceive. Conversely, user comments offer more
comprehensive information and are a more natural way to express human opinions
and preferences regarding image aesthetics. In light of this, we propose
learning image aesthetics from user comments, and exploring vision-language
pretraining methods to learn multimodal aesthetic representations.
Specifically, we pretrain an image-text encoder-decoder model with
image-comment pairs, using contrastive and generative objectives to learn rich
and generic aesthetic semantics without human labels. To efficiently adapt the
pretrained model for downstream IAA tasks, we further propose a lightweight
rank-based adapter that employs text as an anchor to learn the aesthetic
ranking concept. Our results show that our pretrained aesthetic vision-language
model outperforms prior works on image aesthetic captioning over the
AVA-Captions dataset, and it has powerful zero-shot capability for aesthetic
tasks such as zero-shot style classification and zero-shot IAA, surpassing many
supervised baselines. With only minimal finetuning parameters using the
proposed adapter module, our model achieves state-of-the-art IAA performance
over the AVA dataset.
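The contrastive objective and the text-anchored zero-shot scoring mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the symmetric InfoNCE-style loss follows the common CLIP-style formulation, and the temperature value, the function names, and the "good"/"bad" anchor embeddings are assumptions made for illustration. Random vectors stand in for real encoder outputs, and the generative (captioning) objective is not shown.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss over a batch of image-comment pairs.

    Row i of image_emb is assumed to be paired with row i of text_emb.
    The temperature of 0.07 is an assumed value, not taken from the paper.
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature          # (batch, batch) similarities
    idx = np.arange(len(logits))
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image
    return 0.5 * (loss_i2t + loss_t2i)

def zero_shot_score(image_emb, good_anchor, bad_anchor, temperature=0.07):
    """Score images by softmax over similarity to two text anchors.

    good_anchor / bad_anchor are hypothetical text embeddings of prompts
    such as "good image" and "bad image"; returns values in (0, 1).
    """
    img = l2_normalize(image_emb)
    anchors = l2_normalize(np.stack([good_anchor, bad_anchor]))
    logits = img @ anchors.T / temperature
    return np.exp(log_softmax(logits, axis=1))[:, 0]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(8, 16))   # stand-in image embeddings
    comments = images + 0.1 * rng.normal(size=(8, 16))  # stand-in comment embeddings
    good, bad = rng.normal(size=16), rng.normal(size=16)
    print("contrastive loss:", contrastive_loss(images, comments))
    print("zero-shot scores:", zero_shot_score(images, good, bad))
```

In this sketch, correctly paired embeddings yield a lower loss than mismatched ones, and an image whose embedding lies closer to the "good" anchor than to the "bad" anchor receives a score above 0.5.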
Related papers
- Unveiling The Factors of Aesthetic Preferences with Explainable AI [0.0]
In this study, we pioneer a novel perspective by utilizing several different machine learning (ML) models.
Our models process these attributes as inputs to predict the aesthetic scores of images.
Our aim is to shed light on the complex nature of aesthetic preferences in images through ML and to provide a deeper understanding of the attributes that influence aesthetic judgements.
arXiv Detail & Related papers (2023-11-24T11:06:22Z)
- Image Aesthetics Assessment via Learnable Queries [59.313054821874864]
We propose the Image Aesthetics Assessment via Learnable Queries (IAA-LQ) approach.
It adapts learnable queries to extract aesthetic features from pre-trained image features obtained from a frozen image encoder.
Experiments on real-world data demonstrate the advantages of IAA-LQ, beating the best state-of-the-art method by 2.2% and 2.1% in terms of SRCC and PLCC, respectively.
arXiv Detail & Related papers (2023-09-06T09:42:16Z)
- Towards Artistic Image Aesthetics Assessment: a Large-scale Dataset and a New Method [64.40494830113286]
We first introduce a large-scale AIAA dataset: Boldbrush Artistic Image dataset (BAID), which consists of 60,337 artistic images covering various art forms.
We then propose a new method, SAAN, which can effectively extract and utilize style-specific and generic aesthetic information to evaluate artistic images.
Experiments demonstrate that our proposed approach outperforms existing IAA methods on the proposed BAID dataset.
arXiv Detail & Related papers (2023-03-27T12:59:15Z)
- Aesthetic Attributes Assessment of Images with AMANv2 and DPC-CaptionsV2 [65.5524793975387]
We construct a novel dataset, named DPC-CaptionsV2, in a semi-automatic way.
Images in DPC-CaptionsV2 carry comments on up to 4 aesthetic attributes: composition, lighting, color, and subject.
Our method can predict comments on these 4 aesthetic attributes, and the predicted comments are closer to aesthetic topics than those produced by the previous AMAN model.
arXiv Detail & Related papers (2022-08-09T03:20:59Z)
- Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner.
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
arXiv Detail & Related papers (2022-07-25T17:58:16Z)
- Personalized Image Aesthetics Assessment with Rich Attributes [35.61053167813472]
We conduct the most comprehensive subjective study of personalized image aesthetics and introduce a new personalized image Aesthetics database with Rich Attributes (PARA).
PARA features rich annotations, including 9 image-oriented objective attributes and 4 human-oriented subjective attributes.
We also propose a conditional PIAA model that uses subject information as a conditional prior.
arXiv Detail & Related papers (2022-03-31T02:23:46Z)
- User-Guided Personalized Image Aesthetic Assessment based on Deep Reinforcement Learning [64.07820203919283]
We propose a novel user-guided personalized image aesthetic assessment framework.
It leverages user interactions to retouch and rank images for aesthetic assessment based on deep reinforcement learning (DRL).
It generates a personalized aesthetic distribution that is more in line with the aesthetic preferences of different users.
arXiv Detail & Related papers (2021-06-14T15:19:48Z)
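IAA results such as those in the IAA-LQ entry above are reported as SRCC (Spearman rank correlation) and PLCC (Pearson linear correlation) between predicted and ground-truth scores. The following is a minimal NumPy sketch of both metrics; the function names and the sample values are hypothetical and are not taken from any of the listed papers.

```python
import numpy as np

def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their ranks."""
    x = np.asarray(values, dtype=float).ravel()
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    # Average the ranks within each group of equal values.
    _, inverse = np.unique(x, return_inverse=True)
    sums = np.bincount(inverse, weights=ranks)
    counts = np.bincount(inverse)
    return (sums / counts)[inverse]

def plcc(predicted, target):
    """Pearson linear correlation coefficient."""
    p = np.asarray(predicted, dtype=float).ravel()
    t = np.asarray(target, dtype=float).ravel()
    return float(np.corrcoef(p, t)[0, 1])

def srcc(predicted, target):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return plcc(average_ranks(predicted), average_ranks(target))

if __name__ == "__main__":
    # Hypothetical predicted and ground-truth mean opinion scores.
    predicted = [5.1, 6.3, 4.2, 7.0, 5.8]
    target = [5.0, 6.0, 4.5, 7.4, 5.5]
    print("SRCC:", srcc(predicted, target))
    print("PLCC:", plcc(predicted, target))
```

SRCC depends only on the ordering of the scores, so any monotonic rescaling of the predictions leaves it unchanged, whereas PLCC also measures how linear the relationship is.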