Aesthetically Relevant Image Captioning
- URL: http://arxiv.org/abs/2211.15378v1
- Date: Fri, 25 Nov 2022 14:28:10 GMT
- Title: Aesthetically Relevant Image Captioning
- Authors: Zhipeng Zhong, Fei Zhou and Guoping Qiu
- Abstract summary: We study image AQA and IAC together and present a new IAC method termed Aesthetically Relevant Image Captioning (ARIC).
ARIC includes an ARS weighted IAC loss function and an ARS based diverse aesthetic caption selector (DACS).
We show that texts with higher ARS values can predict aesthetic ratings more accurately and that the new ARIC model can generate more accurate, aesthetically more relevant and more diverse image captions.
- Score: 17.081262827258943
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Image aesthetic quality assessment (AQA) aims to assign numerical aesthetic
ratings to images whilst image aesthetic captioning (IAC) aims to generate
textual descriptions of the aesthetic aspects of images. In this paper, we
study image AQA and IAC together and present a new IAC method termed
Aesthetically Relevant Image Captioning (ARIC). Based on the observation that
most textual comments of an image are about objects and their interactions
rather than aspects of aesthetics, we first introduce the concept of Aesthetic
Relevance Score (ARS) of a sentence and have developed a model to automatically
label a sentence with its ARS. We then use the ARS to design the ARIC model
which includes an ARS weighted IAC loss function and an ARS based diverse
aesthetic caption selector (DACS). We present extensive experimental results to
show the soundness of the ARS concept and the effectiveness of the ARIC model
by demonstrating that texts with higher ARS values can predict the aesthetic ratings
more accurately and that the new ARIC model can generate more accurate,
aesthetically more relevant and more diverse image captions. Furthermore, a
large new research database containing 510K images with over 5 million comments
and 350K aesthetic scores, and code for implementing ARIC are available at
https://github.com/PengZai/ARIC.
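The ARS-weighted loss mentioned in the abstract can be illustrated with a short sketch. The function below is a hypothetical rendering, not the authors' implementation: it assumes each training caption comes with a scalar ARS, and uses it to re-weight per-caption negative log-likelihoods so aesthetically relevant comments dominate training.

```python
import numpy as np

def ars_weighted_caption_loss(token_logprobs, ars_scores):
    """Sketch of an ARS-weighted captioning loss (illustrative only).

    token_logprobs: list of 1-D arrays, the log-probabilities a caption
        model assigns to the ground-truth tokens of each caption.
    ars_scores: per-caption Aesthetic Relevance Scores; higher means the
        caption is more about aesthetics than about objects.
    """
    losses = np.array([-lp.mean() for lp in token_logprobs])  # per-caption NLL
    weights = np.asarray(ars_scores, dtype=float)
    weights = weights / weights.sum()       # normalise the ARS weights
    return float((weights * losses).sum())  # high-ARS captions dominate
```

The intended effect, per the abstract's observation, is that object-centric comments (low ARS) contribute little to the gradient, while genuinely aesthetic comments (high ARS) drive the captioner.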
Related papers
- AID-AppEAL: Automatic Image Dataset and Algorithm for Content Appeal Enhancement and Assessment Labeling [11.996211235559866]
Image Content Appeal Assessment (ICAA) is a novel metric that quantifies the level of positive interest an image's content generates for viewers.
ICAA is different from traditional Image-Aesthetics Assessment (IAA), which judges an image's artistic quality.
arXiv Detail & Related papers (2024-07-08T01:40:32Z)
- Image Aesthetics Assessment via Learnable Queries [59.313054821874864]
We propose the Image Aesthetics Assessment via Learnable Queries (IAA-LQ) approach.
It adapts learnable queries to extract aesthetic features from pre-trained image features obtained from a frozen image encoder.
Experiments on real-world data demonstrate the advantages of IAA-LQ, beating the best state-of-the-art method by 2.2% and 2.1% in terms of SRCC and PLCC, respectively.
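SRCC and PLCC, the two figures of merit quoted above, are the standard agreement metrics in aesthetics assessment: Spearman's rank correlation and Pearson's linear correlation between predicted and ground-truth scores. They can be computed directly with SciPy:

```python
from scipy.stats import pearsonr, spearmanr

def iaa_correlations(predicted, ground_truth):
    """Compute the two standard IAA agreement metrics:
    SRCC (monotonic rank agreement) and PLCC (linear agreement)."""
    srcc, _ = spearmanr(predicted, ground_truth)
    plcc, _ = pearsonr(predicted, ground_truth)
    return srcc, plcc
```

Both lie in [-1, 1]; reported IAA gains such as the 2.2%/2.1% above are improvements in these correlations against human mean opinion scores.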
arXiv Detail & Related papers (2023-09-06T09:42:16Z)
- Towards Artistic Image Aesthetics Assessment: a Large-scale Dataset and a New Method [64.40494830113286]
We first introduce a large-scale AIAA dataset: Boldbrush Artistic Image dataset (BAID), which consists of 60,337 artistic images covering various art forms.
We then propose a new method, SAAN, which can effectively extract and utilize style-specific and generic aesthetic information to evaluate artistic images.
Experiments demonstrate that our proposed approach outperforms existing IAA methods on the proposed BAID dataset.
arXiv Detail & Related papers (2023-03-27T12:59:15Z)
- VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining [53.470662123170555]
We propose learning image aesthetics from user comments, and exploring vision-language pretraining methods to learn multimodal aesthetic representations.
Specifically, we pretrain an image-text encoder-decoder model with image-comment pairs, using contrastive and generative objectives to learn rich and generic aesthetic semantics without human labels.
Our results show that our pretrained aesthetic vision-language model outperforms prior works on image aesthetic captioning over the AVA-Captions dataset.
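The contrastive part of such image-comment pretraining is typically a symmetric InfoNCE objective over a batch of matched pairs. The NumPy sketch below is a generic illustration of that objective, not VILA's exact formulation; the temperature value and batch construction are assumptions.

```python
import numpy as np

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over matched image-comment pairs.
    Rows at the same index in the two arrays are positive pairs."""
    # Normalise embeddings, then score every image against every comment
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature

    def xent(l):
        # Cross-entropy with the diagonal (the matched pair) as target
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logprob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.diag(logprob).mean()

    # Average the image-to-text and text-to-image directions
    return float((xent(logits) + xent(logits.T)) / 2)
```

Training pulls each image toward its own comments and away from the rest of the batch, which is how aesthetic semantics can be learned without human labels.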
arXiv Detail & Related papers (2023-03-24T23:57:28Z)
- Aesthetic Attributes Assessment of Images with AMANv2 and DPC-CaptionsV2 [65.5524793975387]
We construct a novel dataset, named DPC-CaptionsV2, by a semi-automatic way.
Images of DPC-CaptionsV2 contain comments up to 4 aesthetic attributes: composition, lighting, color, and subject.
Our method can predict the comments on 4 aesthetic attributes, which are closer to aesthetic topics than those produced by the previous AMAN model.
arXiv Detail & Related papers (2022-08-09T03:20:59Z)
- Distilling Knowledge from Object Classification to Aesthetics Assessment [68.317720070755]
The major dilemma of image aesthetics assessment (IAA) comes from the abstract nature of aesthetic labels.
We propose to distill knowledge on semantic patterns for a vast variety of image contents to an IAA model.
By supervising an end-to-end single-backbone IAA model with the distilled knowledge, the performance of the IAA model is significantly improved.
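Distillation of this kind is commonly realised as a temperature-softened KL term between teacher (classification) and student (IAA) logits. The sketch below shows that generic distillation term; it is an assumption for illustration, not the paper's exact objective.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between temperature-softened teacher and student
    distributions, averaged over the batch (generic distillation term)."""
    p = softmax(teacher_logits / temperature)  # soft teacher targets
    q = softmax(student_logits / temperature)  # student predictions
    return float((p * (np.log(p) - np.log(q))).sum(axis=1).mean())
```

The loss is zero when the student reproduces the teacher's softened distribution, so the single-backbone IAA model is pushed to absorb the teacher's semantic patterns while it learns aesthetic scores.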
arXiv Detail & Related papers (2022-06-02T00:39:01Z)
- Confusing Image Quality Assessment: Towards Better Augmented Reality Experience [96.29124666702566]
We consider AR technology as the superimposition of virtual scenes and real scenes, and introduce visual confusion as its basic theory.
A ConFusing Image Quality Assessment (CFIQA) database is established, which includes 600 reference images and 300 distorted images generated by mixing reference images in pairs.
An objective metric termed CFIQA is also proposed to better evaluate the confusing image quality.
arXiv Detail & Related papers (2022-04-11T07:03:06Z)
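The "mixing reference images in pairs" that builds the CFIQA distorted set amounts to pixel-wise superimposition. A minimal sketch, assuming simple alpha blending (the database's exact mixing procedure may differ):

```python
import numpy as np

def superimpose(scene_a, scene_b, alpha=0.5):
    """Blend two same-shaped images pixel-wise, mimicking the
    superimposition of a virtual scene over a real one."""
    mixed = alpha * scene_a.astype(float) + (1 - alpha) * scene_b.astype(float)
    return mixed.astype(scene_a.dtype)  # keep the original pixel type
```

Each mixed image confuses two source scenes, which is exactly the "visual confusion" the CFIQA metric is designed to evaluate.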
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all of the above) and is not responsible for any consequences arising from its use.