Exploring CNN-based models for image's aesthetic score prediction with
using ensemble
- URL: http://arxiv.org/abs/2210.05119v1
- Date: Tue, 11 Oct 2022 03:23:07 GMT
- Title: Exploring CNN-based models for image's aesthetic score prediction with
using ensemble
- Authors: Ying Dai
- Abstract summary: We propose a framework for constructing two types of automatic image aesthetics assessment models with different CNN architectures.
The attention regions of the models are extracted to analyze their consistency with the subjects in the images.
It is found that the AS classification models trained on the XiheAA dataset seem to learn latent photography principles, although they cannot be said to learn aesthetic sense.
- Score: 3.8073142980733
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a framework for constructing two types of
automatic image aesthetics assessment models with different CNN architectures
and for improving the performance of aesthetic score (AS) prediction through
ensembling. Moreover, the attention regions of the models are extracted to
analyze their consistency with the subjects in the images. The experimental
results verify that the proposed method is effective for improving AS
prediction. Moreover, it is found that the AS classification models trained on
the XiheAA dataset seem to learn latent photography principles, although they
cannot be said to learn aesthetic sense.
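The ensembling step for AS prediction can be illustrated with a minimal sketch that averages per-image score predictions from models with different architectures; the scores and the plain-averaging rule below are hypothetical, not taken from the paper:

```python
import numpy as np

# Hypothetical per-image aesthetic scores (1-10 scale) predicted by two
# CNN models with different architectures; the values are illustrative only.
scores_model_a = np.array([6.2, 4.8, 7.5, 5.1])
scores_model_b = np.array([5.8, 5.4, 7.1, 4.7])

# A simple ensemble: average the two models' score predictions per image.
# The paper may use a more elaborate combination rule.
ensemble_scores = (scores_model_a + scores_model_b) / 2
print(ensemble_scores)
```

Averaging tends to reduce the variance of individual model errors, which is one common motivation for ensembling score regressors.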
Related papers
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- Foveation in the Era of Deep Learning [6.602118206533142]
We introduce an end-to-end differentiable foveated active vision architecture that leverages a graph convolutional network to process foveated images.
Our model learns to iteratively attend to regions of the image relevant for classification.
We find that our model outperforms a state-of-the-art CNN and foveated vision architectures of comparable parameters under a given pixel or computation budget.
arXiv Detail & Related papers (2023-12-03T16:48:09Z)
- Evaluating the Reliability of CNN Models on Classifying Traffic and Road Signs using LIME [1.188383832081829]
The study focuses on evaluating the accuracy of these models' predictions as well as their ability to employ appropriate features for image categorization.
To gain insights into the strengths and limitations of the model's predictions, the study employs the local interpretable model-agnostic explanations (LIME) framework.
arXiv Detail & Related papers (2023-09-11T18:11:38Z)
- Image Aesthetics Assessment via Learnable Queries [59.313054821874864]
We propose the Image Aesthetics Assessment via Learnable Queries (IAA-LQ) approach.
It adapts learnable queries to extract aesthetic features from pre-trained image features obtained from a frozen image encoder.
Experiments on real-world data demonstrate the advantages of IAA-LQ, beating the best state-of-the-art method by 2.2% and 2.1% in terms of SRCC and PLCC, respectively.
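SRCC and PLCC, the two metrics cited above, measure rank-order and linear agreement between predicted and ground-truth scores. A minimal numpy sketch with hypothetical score data (this illustrates the metrics only, not the IAA-LQ method; the simple double-argsort ranking below does not apply tie corrections):

```python
import numpy as np

def plcc(x, y):
    """Pearson linear correlation coefficient between two score lists."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def srcc(x, y):
    """Spearman rank correlation: PLCC computed on the ranks.

    Ranks via double argsort; ties are broken arbitrarily (no correction).
    """
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return plcc(rank(x), rank(y))

# Hypothetical ground-truth mean opinion scores and model predictions.
mos = [5.2, 6.8, 4.1, 7.9, 5.5]
pred = [5.0, 6.5, 4.4, 7.6, 5.9]
# Here the rank order matches exactly, so SRCC is 1.0 even though the
# raw values differ; PLCC is slightly below 1.0.
print(round(srcc(mos, pred), 3), round(plcc(mos, pred), 3))
```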
arXiv Detail & Related papers (2023-09-06T09:42:16Z)
- VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining [53.470662123170555]
We propose learning image aesthetics from user comments, and exploring vision-language pretraining methods to learn multimodal aesthetic representations.
Specifically, we pretrain an image-text encoder-decoder model with image-comment pairs, using contrastive and generative objectives to learn rich and generic aesthetic semantics without human labels.
Our results show that our pretrained aesthetic vision-language model outperforms prior works on image aesthetic captioning over the AVA-Captions dataset.
arXiv Detail & Related papers (2023-03-24T23:57:28Z)
- Image Aesthetics Assessment Using Graph Attention Network [17.277954886018353]
We present a two-stage framework based on graph neural networks for image aesthetics assessment.
First, we propose a feature-graph representation in which the input image is modelled as a graph, maintaining its original aspect ratio and resolution.
Second, we propose a graph neural network architecture that takes this feature-graph and captures the semantic relationship between the different regions of the input image using visual attention.
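The feature-graph idea of modelling an image as a graph over its regions can be illustrated roughly. A minimal sketch that connects 4-neighbouring cells of a feature grid into an adjacency matrix (the grid size, feature dimension, and neighbourhood rule are hypothetical, not the paper's actual construction):

```python
import numpy as np

# Hypothetical 3x3 grid of region feature vectors (e.g. cells of a CNN
# feature map); shapes and values are illustrative only.
h, w, d = 3, 3, 4
feats = np.random.default_rng(0).normal(size=(h * w, d))

# Build a symmetric adjacency matrix connecting 4-neighbouring regions,
# so the image is modelled as a graph over its spatial regions.
adj = np.zeros((h * w, h * w), dtype=int)
for r in range(h):
    for c in range(w):
        i = r * w + c
        for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours
            rr, cc = r + dr, c + dc
            if rr < h and cc < w:
                j = rr * w + cc
                adj[i, j] = adj[j, i] = 1

# A 3x3 grid has 12 undirected edges (6 horizontal + 6 vertical).
print(adj.sum() // 2)
```

A graph neural network would then propagate the node features `feats` along the edges in `adj`, letting attention weight the relationships between regions.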
arXiv Detail & Related papers (2022-06-26T12:52:46Z)
- Prompt-based Learning for Unpaired Image Captioning [86.44188293709307]
Unpaired Image Captioning (UIC) has been developed to learn image descriptions from unaligned vision-language sample pairs.
Recent successes of Vision-Language Pre-Trained Models (VL-PTMs) have triggered the development of prompt-based learning.
We present in this paper a novel prompt-based scheme for training the UIC model, making the best use of the powerful generalization ability of VL-PTMs.
arXiv Detail & Related papers (2022-05-26T03:13:43Z)
- Composition and Style Attributes Guided Image Aesthetic Assessment [66.60253358722538]
We propose a method for the automatic prediction of the aesthetics of an image.
The proposed network includes: a pre-trained network for semantic features extraction (the Backbone); a Multi Layer Perceptron (MLP) network that relies on the Backbone features for the prediction of image attributes (the AttributeNet)
Given an image, the proposed multi-network is able to predict: style and composition attributes, and aesthetic score distribution.
arXiv Detail & Related papers (2021-11-08T17:16:38Z)
- Exploring to establish an appropriate model for image aesthetic assessment via CNN-based RSRL: An empirical study [3.8073142980733]
A D-measure which reflects the disentanglement degree of the final layer FC nodes of CNN is introduced.
An algorithm of determining the optimal model from the multiple photo score prediction models is proposed.
arXiv Detail & Related papers (2021-06-07T03:20:00Z)
- A Deep Drift-Diffusion Model for Image Aesthetic Score Distribution Prediction [68.76594695163386]
We propose a Deep Drift-Diffusion model inspired by psychologists to predict aesthetic score distribution from images.
The DDD model can describe the psychological process of aesthetic perception instead of traditional modeling of the results of assessment.
Our novel DDD model is simple but efficient, which outperforms the state-of-the-art methods in aesthetic score distribution prediction.
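A drift-diffusion process in general (not the paper's specific DDD model) can be sketched as noisy evidence accumulation toward a decision threshold; all parameter values below are hypothetical:

```python
import random

def simulate_trial(drift=0.15, noise=1.0, threshold=3.0, dt=0.01, rng=None):
    """One drift-diffusion trial: evidence x drifts with noise until it
    crosses +threshold ("like") or -threshold ("dislike").
    Returns (choice, decision_time). Parameters are illustrative only."""
    rng = rng or random.Random()
    x, t = 0.0, 0.0
    while abs(x) < threshold:
        # Euler step: deterministic drift plus Gaussian diffusion noise.
        x += drift * dt + noise * (dt ** 0.5) * rng.gauss(0, 1)
        t += dt
    return (1 if x > 0 else 0), t

rng = random.Random(42)
choices = [simulate_trial(rng=rng)[0] for _ in range(200)]
# With a positive drift, the upper ("like") boundary is hit more often
# than not, so the mean choice rate exceeds 0.5.
print(sum(choices) / len(choices))
```

Running many such trials over a population of hypothetical raters would yield a distribution of outcomes, which is loosely the sense in which a drift-diffusion account models the process of assessment rather than only its result.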
arXiv Detail & Related papers (2020-10-15T11:01:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.