Fine-grained Image Aesthetic Assessment: Learning Discriminative Scores from Relative Ranks
- URL: http://arxiv.org/abs/2603.03907v1
- Date: Wed, 04 Mar 2026 10:13:27 GMT
- Title: Fine-grained Image Aesthetic Assessment: Learning Discriminative Scores from Relative Ranks
- Authors: Zhichao Yang, Jianjie Wang, Zhixianhe Zhang, Pangu Xie, Xiangfei Sheng, Pengfei Chen, Leida Li
- Abstract summary: Image aesthetic assessment (IAA) has extensive applications in content creation, album management, and recommendation systems. State-of-the-art IAA models are typically designed for coarse-grained evaluation. We propose FGAesQ, a novel IAA framework that learns discriminative aesthetic scores from relative ranks.
- Score: 26.53088863857899
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image aesthetic assessment (IAA) has extensive applications in content creation, album management, and recommendation systems. In such applications, it is commonly needed to pick out the most aesthetically pleasing image from a series of images with subtle aesthetic variations, a topic we refer to as fine-grained IAA. Unfortunately, state-of-the-art IAA models are typically designed for coarse-grained evaluation, where images with notable aesthetic differences are evaluated independently on an absolute scale. These models are inherently limited in discriminating fine-grained aesthetic differences. To address the dilemma, we contribute FGAesthetics, a fine-grained IAA database with 32,217 images organized into 10,028 series, which are sourced from diverse categories including Natural, AIGC, and Cropping. Annotations are collected via pairwise comparisons within each series. We also devise Series Refinement and Rank Calibration to ensure the reliability of data and labels. Based on FGAesthetics, we further propose FGAesQ, a novel IAA framework that learns discriminative aesthetic scores from relative ranks through Difference-preserved Tokenization (DiffToken), Comparative Text-assisted Alignment (CTAlign), and Rank-aware Regression (RankReg). FGAesQ enables accurate aesthetic assessment in fine-grained scenarios while still maintaining competitive performance in coarse-grained evaluation. Extensive experiments and comparisons demonstrate the superiority of the proposed method.
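The annotations described above come from pairwise comparisons within each series. As a rough illustration of how such comparisons can be calibrated into per-image scores, the following sketch fits a Bradley-Terry model to a hypothetical win matrix. This is a generic rank-aggregation technique, not the paper's actual Series Refinement or Rank Calibration procedure:

```python
# Illustrative only: turn pairwise "image i preferred over image j" counts
# into per-image scores with a Bradley-Terry MM (minorize-maximize) fit.
import numpy as np

def bradley_terry(wins, iters=100):
    """wins[i, j] = number of times image i was preferred over image j."""
    n = wins.shape[0]
    p = np.ones(n)  # initial uniform scores
    for _ in range(iters):
        total_wins = wins.sum(axis=1)
        denom = np.zeros(n)
        for i in range(n):
            for j in range(n):
                if i != j:
                    denom[i] += (wins[i, j] + wins[j, i]) / (p[i] + p[j])
        p = total_wins / denom
        p /= p.sum()  # normalize: scores are only defined up to scale
    return p

# Hypothetical series of three images: 0 is mostly preferred over 1 and 2,
# and 1 is mostly preferred over 2.
wins = np.array([[0, 3, 4],
                 [1, 0, 3],
                 [0, 1, 0]], dtype=float)
scores = bradley_terry(wins)
```

The fitted scores recover the ranking implied by the comparisons (image 0 highest, image 2 lowest), which is the basic idea behind converting relative ranks into a discriminative scalar score.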
Related papers
- On the Role of Individual Differences in Current Approaches to Computational Image Aesthetics [38.85583529536269]
Image aesthetic assessment (IAA) evaluates image aesthetics, a task complicated by image diversity and user subjectivity. Current approaches address this in two stages: Generic IAA (GIAA) models estimate mean aesthetic scores, while Personal IAA (PIAA) models adapt GIAA using transfer learning to incorporate user subjectivity. This work establishes a theoretical foundation for IAA, proposing a unified model that encodes individual characteristics in a distributional format for both individual and group assessments.
arXiv Detail & Related papers (2025-02-27T21:01:19Z)
- Image Aesthetics Assessment via Learnable Queries [59.313054821874864]
We propose the Image Aesthetics Assessment via Learnable Queries (IAA-LQ) approach.
It adapts learnable queries to extract aesthetic features from pre-trained image features obtained from a frozen image encoder.
Experiments on real-world data demonstrate the advantages of IAA-LQ, beating the best state-of-the-art method by 2.2% and 2.1% in terms of SRCC and PLCC, respectively.
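SRCC and PLCC, cited above, are the standard rank and linear correlation metrics in IAA evaluation. A minimal sketch of both, computed with NumPy on hypothetical predicted vs. ground-truth scores (Spearman correlation is simply Pearson correlation on the ranks; this toy data has no ties):

```python
# Illustrative SRCC/PLCC computation on hypothetical scores.
import numpy as np

def plcc(x, y):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])

def srcc(x, y):
    """Spearman rank correlation: Pearson on ranks (no ties assumed)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return plcc(rx, ry)

ground_truth = [3.1, 4.5, 2.2, 5.0, 3.8]
predicted    = [3.0, 4.6, 2.5, 4.8, 3.9]
```

Here the predictions preserve the ground-truth ordering exactly, so SRCC is 1.0 even though the raw values differ, which is why SRCC is the natural metric for rank-oriented (fine-grained) evaluation.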
arXiv Detail & Related papers (2023-09-06T09:42:16Z)
- Towards Artistic Image Aesthetics Assessment: a Large-scale Dataset and a New Method [64.40494830113286]
We first introduce a large-scale AIAA dataset: Boldbrush Artistic Image dataset (BAID), which consists of 60,337 artistic images covering various art forms.
We then propose a new method, SAAN, which can effectively extract and utilize style-specific and generic aesthetic information to evaluate artistic images.
Experiments demonstrate that our proposed approach outperforms existing IAA methods on the proposed BAID dataset.
arXiv Detail & Related papers (2023-03-27T12:59:15Z)
- VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining [53.470662123170555]
We propose learning image aesthetics from user comments, and exploring vision-language pretraining methods to learn multimodal aesthetic representations.
Specifically, we pretrain an image-text encoder-decoder model with image-comment pairs, using contrastive and generative objectives to learn rich and generic aesthetic semantics without human labels.
Our results show that our pretrained aesthetic vision-language model outperforms prior works on image aesthetic captioning over the AVA-Captions dataset.
arXiv Detail & Related papers (2023-03-24T23:57:28Z)
- Distilling Knowledge from Object Classification to Aesthetics Assessment [68.317720070755]
The major dilemma of image aesthetics assessment (IAA) comes from the abstract nature of aesthetic labels.
We propose to distill knowledge on semantic patterns for a vast variety of image contents to an IAA model.
By supervising an end-to-end single-backbone IAA model with the distilled knowledge, the performance of the IAA model is significantly improved.
arXiv Detail & Related papers (2022-06-02T00:39:01Z)
- A Compositional Feature Embedding and Similarity Metric for Ultra-Fine-Grained Visual Categorization [16.843126268445726]
Fine-grained visual categorization (FGVC) aims at classifying objects with small inter-class variances.
This paper proposes a novel compositional feature embedding and similarity metric (CECS) for ultra-fine-grained visual categorization.
Experimental results on two ultra-FGVC datasets and one FGVC dataset with recent benchmark methods consistently demonstrate that the proposed CECS method achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-09-25T15:05:25Z)
- Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment [157.1292674649519]
We propose a practical solution named degraded-reference IQA (DR-IQA).
DR-IQA exploits the inputs of IR models, degraded images, as references.
Our results can even be close to the performance of full-reference settings.
arXiv Detail & Related papers (2021-08-18T02:35:08Z)
- I Am Going MAD: Maximum Discrepancy Competition for Comparing Classifiers Adaptively [135.7695909882746]
We introduce the MAximum Discrepancy (MAD) competition.
We adaptively sample a small test set from an arbitrarily large corpus of unlabeled images.
Human labeling on the resulting model-dependent image sets reveals the relative performance of the competing classifiers.
arXiv Detail & Related papers (2020-02-25T03:32:29Z)
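The adaptive sampling idea behind the MAD competition can be sketched in a few lines: from a large unlabeled pool, select the samples on which two competing models disagree most, and send only those for human labeling. Random numbers stand in for real model outputs here, so this is a toy illustration of the selection step rather than the paper's full protocol:

```python
# Toy MAD-style selection: rank an unlabeled pool by the disagreement
# between two hypothetical models and keep the top-10 for human labeling.
import numpy as np

rng = np.random.default_rng(0)
pool = rng.random((1000, 2))  # column k = model k's predicted probability

# Discrepancy between the two models on each unlabeled sample.
discrepancy = np.abs(pool[:, 0] - pool[:, 1])

# Indices of the 10 samples where the models disagree most.
test_set = np.argsort(discrepancy)[-10:]
```

Because labeling effort concentrates on maximal-disagreement samples, a small human-annotated set is enough to reveal the relative performance of the competing models.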
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.