Exploring CNN-based models for image's aesthetic score prediction with
using ensemble
- URL: http://arxiv.org/abs/2210.05119v1
- Date: Tue, 11 Oct 2022 03:23:07 GMT
- Title: Exploring CNN-based models for image's aesthetic score prediction with
using ensemble
- Authors: Ying Dai
- Abstract summary: We propose a framework for constructing two types of automatic image aesthetics assessment models with different CNN architectures.
The attention regions of the models are extracted to analyze their consistency with the subjects in the images.
It is found that the AS classification models trained on the XiheAA dataset seem to learn latent photography principles, although they cannot be said to learn aesthetic sense.
- Score: 3.8073142980733
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a framework for constructing two types of
automatic image aesthetics assessment models with different CNN architectures
and for improving the performance of aesthetic score (AS) prediction through
ensembling. Moreover, the attention regions of the models are extracted to
analyze their consistency with the subjects in the images. The experimental
results verify that the proposed method is effective for improving AS
prediction. Moreover, it is found that the AS classification models trained on
the XiheAA dataset seem to learn latent photography principles, although they
cannot be said to learn aesthetic sense.
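The ensembling step for AS prediction can be illustrated with a minimal sketch that averages per-image score predictions from models with different architectures; the scores and the plain-averaging rule below are hypothetical, not taken from the paper:

```python
import numpy as np

# Hypothetical per-image aesthetic scores (1-10 scale) predicted by two
# CNN models with different architectures; the values are illustrative only.
scores_model_a = np.array([6.2, 4.8, 7.5, 5.1])
scores_model_b = np.array([5.8, 5.4, 7.1, 4.7])

# A simple ensemble: average the two models' score predictions per image.
# The paper may use a more elaborate combination rule.
ensemble_scores = (scores_model_a + scores_model_b) / 2
print(ensemble_scores)
```

Averaging tends to reduce the variance of individual model errors, which is one common motivation for ensembling score regressors.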
Related papers
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- Foveation in the Era of Deep Learning [6.602118206533142]
We introduce an end-to-end differentiable foveated active vision architecture that leverages a graph convolutional network to process foveated images.
Our model learns to iteratively attend to regions of the image relevant for classification.
We find that our model outperforms a state-of-the-art CNN and foveated vision architectures of comparable parameters under a given pixel or computation budget.
arXiv Detail & Related papers (2023-12-03T16:48:09Z)
- Evaluating the Reliability of CNN Models on Classifying Traffic and Road Signs using LIME [1.188383832081829]
The study focuses on evaluating the accuracy of these models' predictions as well as their ability to employ appropriate features for image categorization.
To gain insights into the strengths and limitations of the model's predictions, the study employs the local interpretable model-agnostic explanations (LIME) framework.
arXiv Detail & Related papers (2023-09-11T18:11:38Z)
- Image Aesthetics Assessment via Learnable Queries [59.313054821874864]
We propose the Image Aesthetics Assessment via Learnable Queries (IAA-LQ) approach.
It adapts learnable queries to extract aesthetic features from pre-trained image features obtained from a frozen image encoder.
Experiments on real-world data demonstrate the advantages of IAA-LQ, beating the best state-of-the-art method by 2.2% and 2.1% in terms of SRCC and PLCC, respectively.
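SRCC and PLCC, the two metrics cited above, measure rank-order and linear agreement between predicted and ground-truth scores. A minimal numpy sketch with hypothetical score data (this illustrates the metrics only, not the IAA-LQ method; the simple double-argsort ranking below does not apply tie corrections):

```python
import numpy as np

def plcc(x, y):
    """Pearson linear correlation coefficient between two score lists."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def srcc(x, y):
    """Spearman rank correlation: PLCC computed on the ranks.

    Ranks via double argsort; ties are broken arbitrarily (no correction).
    """
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return plcc(rank(x), rank(y))

# Hypothetical ground-truth mean opinion scores and model predictions.
mos = [5.2, 6.8, 4.1, 7.9, 5.5]
pred = [5.0, 6.5, 4.4, 7.6, 5.9]
# Here the rank order matches exactly, so SRCC is 1.0 even though the
# raw values differ; PLCC is slightly below 1.0.
print(round(srcc(mos, pred), 3), round(plcc(mos, pred), 3))
```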
arXiv Detail & Related papers (2023-09-06T09:42:16Z)
- VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining [53.470662123170555]
We propose learning image aesthetics from user comments, and exploring vision-language pretraining methods to learn multimodal aesthetic representations.
Specifically, we pretrain an image-text encoder-decoder model with image-comment pairs, using contrastive and generative objectives to learn rich and generic aesthetic semantics without human labels.
Our results show that our pretrained aesthetic vision-language model outperforms prior works on image aesthetic captioning over the AVA-Captions dataset.
arXiv Detail & Related papers (2023-03-24T23:57:28Z)
- Image Aesthetics Assessment Using Graph Attention Network [17.277954886018353]
We present a two-stage framework based on graph neural networks for image aesthetics assessment.
First, we propose a feature-graph representation in which the input image is modelled as a graph, maintaining its original aspect ratio and resolution.
Second, we propose a graph neural network architecture that takes this feature-graph and captures the semantic relationship between the different regions of the input image using visual attention.
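The feature-graph idea of modelling an image as a graph over its regions can be illustrated roughly. A minimal sketch that connects 4-neighbouring cells of a feature grid into an adjacency matrix (the grid size, feature dimension, and neighbourhood rule are hypothetical, not the paper's actual construction):

```python
import numpy as np

# Hypothetical 3x3 grid of region feature vectors (e.g. cells of a CNN
# feature map); shapes and values are illustrative only.
h, w, d = 3, 3, 4
feats = np.random.default_rng(0).normal(size=(h * w, d))

# Build a symmetric adjacency matrix connecting 4-neighbouring regions,
# so the image is modelled as a graph over its spatial regions.
adj = np.zeros((h * w, h * w), dtype=int)
for r in range(h):
    for c in range(w):
        i = r * w + c
        for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours
            rr, cc = r + dr, c + dc
            if rr < h and cc < w:
                j = rr * w + cc
                adj[i, j] = adj[j, i] = 1

# A 3x3 grid has 12 undirected edges (6 horizontal + 6 vertical).
print(adj.sum() // 2)
```

A graph neural network would then propagate the node features `feats` along the edges in `adj`, letting attention weight the relationships between regions.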
arXiv Detail & Related papers (2022-06-26T12:52:46Z)
- Prompt-based Learning for Unpaired Image Captioning [86.44188293709307]
Unpaired Image Captioning (UIC) has been developed to learn image descriptions from unaligned vision-language sample pairs.
Recent successes of Vision-Language Pre-Trained Models (VL-PTMs) have triggered the development of prompt-based learning.
We present in this paper a novel prompt-based scheme for training the UIC model, making the best use of the powerful generalization ability of VL-PTMs.
arXiv Detail & Related papers (2022-05-26T03:13:43Z)
- Composition and Style Attributes Guided Image Aesthetic Assessment [66.60253358722538]
We propose a method for the automatic prediction of the aesthetics of an image.
The proposed network includes: a pre-trained network for semantic features extraction (the Backbone); a Multi Layer Perceptron (MLP) network that relies on the Backbone features for the prediction of image attributes (the AttributeNet)
Given an image, the proposed multi-network is able to predict: style and composition attributes, and aesthetic score distribution.
arXiv Detail & Related papers (2021-11-08T17:16:38Z)
- Exploring to establish an appropriate model for image aesthetic assessment via CNN-based RSRL: An empirical study [3.8073142980733]
A D-measure which reflects the disentanglement degree of the final layer FC nodes of CNN is introduced.
An algorithm of determining the optimal model from the multiple photo score prediction models is proposed.
arXiv Detail & Related papers (2021-06-07T03:20:00Z)
- A Deep Drift-Diffusion Model for Image Aesthetic Score Distribution Prediction [68.76594695163386]
We propose a Deep Drift-Diffusion model inspired by psychologists to predict aesthetic score distribution from images.
The DDD model can describe the psychological process of aesthetic perception instead of traditional modeling of the results of assessment.
Our novel DDD model is simple but efficient, which outperforms the state-of-the-art methods in aesthetic score distribution prediction.
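A drift-diffusion process in general (not the paper's specific DDD model) can be sketched as noisy evidence accumulation toward a decision threshold; all parameter values below are hypothetical:

```python
import random

def simulate_trial(drift=0.15, noise=1.0, threshold=3.0, dt=0.01, rng=None):
    """One drift-diffusion trial: evidence x drifts with noise until it
    crosses +threshold ("like") or -threshold ("dislike").
    Returns (choice, decision_time). Parameters are illustrative only."""
    rng = rng or random.Random()
    x, t = 0.0, 0.0
    while abs(x) < threshold:
        # Euler step: deterministic drift plus Gaussian diffusion noise.
        x += drift * dt + noise * (dt ** 0.5) * rng.gauss(0, 1)
        t += dt
    return (1 if x > 0 else 0), t

rng = random.Random(42)
choices = [simulate_trial(rng=rng)[0] for _ in range(200)]
# With a positive drift, the upper ("like") boundary is hit more often
# than not, so the mean choice rate exceeds 0.5.
print(sum(choices) / len(choices))
```

Running many such trials over a population of hypothetical raters would yield a distribution of outcomes, which is loosely the sense in which a drift-diffusion account models the process of assessment rather than only its result.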
arXiv Detail & Related papers (2020-10-15T11:01:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.