AesTest: Measuring Aesthetic Intelligence from Perception to Production
- URL: http://arxiv.org/abs/2511.06360v1
- Date: Sun, 09 Nov 2025 12:44:10 GMT
- Title: AesTest: Measuring Aesthetic Intelligence from Perception to Production
- Authors: Guolong Wang, Heng Huang, Zhiqiang Zhang, Wentian Li, Feilong Ma, Xin Jin,
- Abstract summary: AesTest is a benchmark for multimodal aesthetic perception and production.<n>It consists of curated multiple-choice questions spanning ten tasks, covering perception, appreciation, creation, and photography.<n>It integrates data from diverse sources, including professional editing, photographic composition tutorials, and crowdsourced preferences.<n>We evaluate both instruction-tuned IAA MLLMs and general MLLMs on AesTest, revealing significant challenges in building aesthetic intelligence.
- Score: 48.70942686586114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Perceiving and producing aesthetic judgments is a fundamental yet underexplored capability for multimodal large language models (MLLMs). However, existing benchmarks for image aesthetic assessment (IAA) are narrow in perception scope or lack the diversity needed to evaluate systematic aesthetic production. To address this gap, we introduce AesTest, a comprehensive benchmark for multimodal aesthetic perception and production, distinguished by the following features: 1) It consists of curated multiple-choice questions spanning ten tasks, covering perception, appreciation, creation, and photography. These tasks are grounded in psychological theories of generative learning. 2) It integrates data from diverse sources, including professional editing workflows, photographic composition tutorials, and crowdsourced preferences. It ensures coverage of both expert-level principles and real-world variation. 3) It supports various aesthetic query types, such as attribute-based analysis, emotional resonance, compositional choice, and stylistic reasoning. We evaluate both instruction-tuned IAA MLLMs and general MLLMs on AesTest, revealing significant challenges in building aesthetic intelligence. We will publicly release AesTest to support future research in this area.
Related papers
- Reasoning Like Experts: Leveraging Multimodal Large Language Models for Drawing-based Psychoanalysis [38.98188484491387]
PICK is a multi-step framework designed for Psychoanalytical Image through hierarchical analysis and knowledge injection.<n>It focuses on the House-Tree-Person (HTP) Test, a widely used psychological assessment in clinical practice.<n>Our approach bridges the gap between MLLMs and specialized expert domains, offering a structured and interpretable framework for understanding human mental states through visual expression.
arXiv Detail & Related papers (2025-10-22T10:29:14Z) - Advancing Comprehensive Aesthetic Insight with Multi-Scale Text-Guided Self-Supervised Learning [14.405750888492735]
Image Aesthetic Assessment (IAA) is a vital and intricate task that entails analyzing and assessing an image's aesthetic values.<n>Traditional methods of IAA often concentrate on a single aesthetic task and suffer from inadequate labeled datasets.<n>We propose a comprehensive aesthetic MLLM capable of nuanced aesthetic insight.
arXiv Detail & Related papers (2024-12-16T16:35:35Z) - I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing [67.05794909694649]
We propose I2EBench, a comprehensive benchmark to evaluate the quality of edited images produced by IIE models.
I2EBench consists of 2,000+ images for editing, along with 4,000+ corresponding original and diverse instructions.
We will open-source I2EBench, including all instructions, input images, human annotations, edited images from all evaluated methods, and a simple script for evaluating the results from new IIE models.
arXiv Detail & Related papers (2024-08-26T11:08:44Z) - AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception [74.11069437400398]
We develop a corpus-rich aesthetic critique database with 21,904 diverse-sourced images and 88K human natural language feedbacks.
We fine-tune the open-sourced general foundation models, achieving multi-modality Aesthetic Expert models, dubbed AesExpert.
Experiments demonstrate that the proposed AesExpert models deliver significantly better aesthetic perception performances than the state-of-the-art MLLMs.
arXiv Detail & Related papers (2024-04-15T09:56:20Z) - AesBench: An Expert Benchmark for Multimodal Large Language Models on
Image Aesthetics Perception [64.25808552299905]
AesBench is an expert benchmark aiming to comprehensively evaluate the aesthetic perception capacities of MLLMs.
We construct an Expert-labeled Aesthetics Perception Database (EAPD), which features diversified image contents and high-quality annotations provided by professional aesthetic experts.
We propose a set of integrative criteria to measure the aesthetic perception abilities of MLLMs from four perspectives, including Perception (AesP), Empathy (AesE), Assessment (AesA) and Interpretation (AesI)
arXiv Detail & Related papers (2024-01-16T10:58:07Z) - Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined
Levels [95.44077384918725]
We propose to teach large multi-modality models (LMMs) with text-defined rating levels instead of scores.
The proposed Q-Align achieves state-of-the-art performance on image quality assessment (IQA), image aesthetic assessment (IAA) and video quality assessment (VQA) tasks.
arXiv Detail & Related papers (2023-12-28T16:10:25Z) - Unveiling The Factors of Aesthetic Preferences with Explainable AI [0.0]
In this study, we pioneer a novel perspective by utilizing several different machine learning (ML) models.
Our models process these attributes as inputs to predict the aesthetic scores of images.
Our aim is to shed light on the complex nature of aesthetic preferences in images through ML and to provide a deeper understanding of the attributes that influence aesthetic judgements.
arXiv Detail & Related papers (2023-11-24T11:06:22Z) - Image Aesthetics Assessment via Learnable Queries [59.313054821874864]
We propose the Image Aesthetics Assessment via Learnable Queries (IAA-LQ) approach.
It adapts learnable queries to extract aesthetic features from pre-trained image features obtained from a frozen image encoder.
Experiments on real-world data demonstrate the advantages of IAA-LQ, beating the best state-of-the-art method by 2.2% and 2.1% in terms of SRCC and PLCC, respectively.
arXiv Detail & Related papers (2023-09-06T09:42:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.