Aesthetic Visual Question Answering of Photographs
- URL: http://arxiv.org/abs/2208.05798v1
- Date: Wed, 10 Aug 2022 07:27:57 GMT
- Title: Aesthetic Visual Question Answering of Photographs
- Authors: Xin Jin, Wu Zhou, Xinghui Zhou, Shuai Cui, Le Zhang, Jianwen Lv, Shu Zhao
- Abstract summary: We propose a new task of aesthetic language assessment: aesthetic visual question and answering (AVQA) of images.
The objective QA pairs are generated by the proposed aesthetic attributes analysis algorithms.
We build the first aesthetic visual question answering dataset, AesVQA, that contains 72,168 high-quality images and 324,756 pairs of aesthetic questions.
- Score: 15.83390933825182
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Aesthetic assessment of images can be categorized into two main forms:
numerical assessment and language assessment. Aesthetic captioning of
photographs is the only aesthetic language assessment task that has been addressed so far. In
this paper, we propose a new task of aesthetic language assessment: aesthetic
visual question and answering (AVQA) of images. Given a question about an
image's aesthetics, the model predicts the answer. We use images from
\textit{www.flickr.com}. The objective QA pairs are generated by the proposed
aesthetic attributes analysis algorithms. Moreover, we introduce subjective QA
pairs that are converted from aesthetic numerical labels and sentiment analysis
from large-scale pre-trained models. We build the first aesthetic visual question
answering dataset, AesVQA, that contains 72,168 high-quality images and 324,756
pairs of aesthetic questions. Two methods for adjusting the data distribution
have been proposed and shown to improve the accuracy of existing models. This
is the first work that both addresses the task of aesthetic VQA and introduces
subjectiveness into VQA tasks. The experimental results reveal that our methods
outperform other VQA models on this new task.
Related papers
- AID-AppEAL: Automatic Image Dataset and Algorithm for Content Appeal Enhancement and Assessment Labeling [11.996211235559866]
Image Content Appeal Assessment (ICAA) is a novel metric that quantifies the level of positive interest an image's content generates for viewers.
ICAA is different from traditional Image-Aesthetics Assessment (IAA), which judges an image's artistic quality.
arXiv Detail & Related papers (2024-07-08T01:40:32Z)
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
- Image Aesthetics Assessment via Learnable Queries [59.313054821874864]
We propose the Image Aesthetics Assessment via Learnable Queries (IAA-LQ) approach.
It adapts learnable queries to extract aesthetic features from pre-trained image features obtained from a frozen image encoder.
Experiments on real-world data demonstrate the advantages of IAA-LQ, beating the best state-of-the-art method by 2.2% and 2.1% in terms of SRCC and PLCC, respectively.
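SRCC (Spearman rank correlation) and PLCC (Pearson linear correlation) are the standard agreement metrics in aesthetics assessment. As a hypothetical illustration only (the toy scores below are not from any paper), they can be computed from ground-truth and predicted scores like this:

```python
# Illustrative sketch of the SRCC and PLCC metrics cited above.
# All data and function names are hypothetical, not from the paper.
from statistics import mean

def pearson(x, y):
    """PLCC: linear correlation between two score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def spearman(x, y):
    """SRCC: Pearson correlation computed on ranks (toy data has no ties)."""
    rank = lambda v: [sorted(v).index(e) + 1 for e in v]
    return pearson(rank(x), rank(y))

gt   = [5.2, 6.8, 4.1, 7.3, 5.9, 3.5]   # toy ground-truth aesthetic scores
pred = [5.0, 6.5, 4.4, 7.0, 6.1, 3.8]   # toy model predictions
print(f"SRCC={spearman(gt, pred):.3f}, PLCC={pearson(gt, pred):.3f}")
```

SRCC rewards predicting the correct ordering of images; PLCC additionally rewards a linear fit of the predicted values themselves, which is why papers typically report both.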
arXiv Detail & Related papers (2023-09-06T09:42:16Z)
- Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation [96.74302670358145]
We introduce an automated method for Visual Concept Evaluation (ViCE) to assess consistency between a generated/edited image and the corresponding prompt/instructions.
ViCE combines the strengths of Large Language Models (LLMs) and Visual Question Answering (VQA) into a unified pipeline, aiming to replicate the human cognitive process in quality assessment.
arXiv Detail & Related papers (2023-07-18T16:33:30Z)
- VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining [53.470662123170555]
We propose learning image aesthetics from user comments, and exploring vision-language pretraining methods to learn multimodal aesthetic representations.
Specifically, we pretrain an image-text encoder-decoder model with image-comment pairs, using contrastive and generative objectives to learn rich and generic aesthetic semantics without human labels.
Our results show that our pretrained aesthetic vision-language model outperforms prior works on image aesthetic captioning over the AVA-Captions dataset.
arXiv Detail & Related papers (2023-03-24T23:57:28Z)
- Understanding Aesthetics with Language: A Photo Critique Dataset for Aesthetic Assessment [6.201485014848172]
We propose the Critique Photo Reddit dataset (RPCD), which contains 74K images and 220K comments.
We exploit the polarity of the sentiment of criticism as an indicator of aesthetic judgment.
arXiv Detail & Related papers (2022-06-17T08:16:20Z)
- Confusing Image Quality Assessment: Towards Better Augmented Reality Experience [96.29124666702566]
We consider AR technology as the superimposition of virtual scenes and real scenes, and introduce visual confusion as its basic theory.
A ConFusing Image Quality Assessment (CFIQA) database is established, which includes 600 reference images and 300 distorted images generated by mixing reference images in pairs.
An objective metric termed CFIQA is also proposed to better evaluate the confusing image quality.
arXiv Detail & Related papers (2022-04-11T07:03:06Z)
- COIN: Counterfactual Image Generation for VQA Interpretation [5.994412766684842]
We introduce an interpretability approach for VQA models by generating counterfactual images.
Beyond interpreting VQA model results on single images, the obtained results and discussion provide an extensive explanation of VQA models' behaviour.
arXiv Detail & Related papers (2022-01-10T13:51:35Z)
- User-Guided Personalized Image Aesthetic Assessment based on Deep Reinforcement Learning [64.07820203919283]
We propose a novel user-guided personalized image aesthetic assessment framework.
It leverages user interactions to retouch and rank images for aesthetic assessment based on deep reinforcement learning (DRL).
It generates personalized aesthetic distribution that is more in line with the aesthetic preferences of different users.
arXiv Detail & Related papers (2021-06-14T15:19:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.