ArtiMuse: Fine-Grained Image Aesthetics Assessment with Joint Scoring and Expert-Level Understanding
- URL: http://arxiv.org/abs/2507.14533v2
- Date: Mon, 11 Aug 2025 03:10:24 GMT
- Title: ArtiMuse: Fine-Grained Image Aesthetics Assessment with Joint Scoring and Expert-Level Understanding
- Authors: Shuo Cao, Nan Ma, Jiayang Li, Xiaohui Li, Lihao Shao, Kaiwen Zhu, Yu Zhou, Yuandong Pu, Jiarui Wu, Jiaquan Wang, Bo Qu, Wenhai Wang, Yu Qiao, Dajuin Yao, Yihao Liu
- Abstract summary: ArtiMuse is an innovative MLLM-based IAA model with Joint Scoring and Expert-Level Understanding capabilities. ArtiMuse-10K is the first expert-curated image aesthetic dataset comprising 10,000 images spanning 5 main categories and 15 subcategories.
- Score: 32.55711618391249
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid advancement of educational applications, artistic creation, and AI-generated content (AIGC) technologies has substantially increased practical requirements for comprehensive Image Aesthetics Assessment (IAA), particularly demanding methods capable of delivering both quantitative scoring and professional understanding. Multimodal Large Language Model (MLLM)-based IAA methods demonstrate stronger perceptual and generalization capabilities compared to traditional approaches, yet they suffer from modality bias (score-only or text-only) and lack fine-grained attribute decomposition, thereby failing to support further aesthetic assessment. In this paper, we present: (1) ArtiMuse, an innovative MLLM-based IAA model with Joint Scoring and Expert-Level Understanding capabilities; (2) ArtiMuse-10K, the first expert-curated image aesthetic dataset comprising 10,000 images spanning 5 main categories and 15 subcategories, each annotated by professional experts with 8-dimensional attributes analysis and a holistic score. Both the model and dataset will be made public to advance the field.
Related papers
- Bridging Cognitive Gap: Hierarchical Description Learning for Artistic Image Aesthetics Assessment [51.40989269202702]
The aesthetic quality assessment task is crucial for developing a human-aligned quantitative evaluation system for AIGC. We propose ArtQuant, an aesthetics assessment framework for artistic images which couples isolated aesthetic dimensions through description generation. Our approach achieves state-of-the-art performance on several datasets while requiring only 33% of conventional training epochs.
arXiv Detail & Related papers (2025-12-29T12:18:26Z)
- KidsArtBench: Multi-Dimensional Children's Art Evaluation with Attribute-Aware MLLMs [13.1845557800464]
We introduce KidsArtBench, a new benchmark of over 1k children's artworks (ages 5-15) annotated by 12 expert educators across 9 rubric-aligned dimensions. KidsArtBench targets children's artwork and pairs multi-dimensional annotations with comment supervision to enable both ordinal assessment and formative feedback.
arXiv Detail & Related papers (2025-12-14T00:24:48Z)
- MIRAGE: Multimodal foundation model and benchmark for comprehensive retinal OCT image analysis [1.8230765666532822]
MIRAGE is a novel foundation model (FM) for the analysis of OCT and scanning laser ophthalmoscopy (SLO) images. We propose a new evaluation benchmark with OCT/SLO classification and segmentation tasks. The comparison with general and specialized FMs and segmentation methods shows the superiority of MIRAGE in both types of tasks.
arXiv Detail & Related papers (2025-06-10T15:25:55Z)
- Towards Explainable Partial-AIGC Image Quality Assessment [51.42831861127991]
Despite extensive research on image quality assessment (IQA) for AI-generated images (AGIs), most studies focus on fully AI-generated outputs. We construct the first large-scale PAI dataset towards explainable partial-AIGC image quality assessment (EPAIQA). Our work represents a pioneering effort in the perceptual IQA field for comprehensive PAI quality assessment.
arXiv Detail & Related papers (2025-04-12T17:27:50Z)
- FakeScope: Large Multimodal Expert Model for Transparent AI-Generated Image Forensics [66.14786900470158]
We propose FakeScope, an expert large multimodal model (LMM) tailored for AI-generated image forensics. FakeScope identifies AI-synthetic images with high accuracy and provides rich, interpretable, and query-driven forensic insights. FakeScope achieves state-of-the-art performance in both closed-ended and open-ended forensic scenarios.
arXiv Detail & Related papers (2025-03-31T16:12:48Z)
- HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment [11.253286640424811]
HumanBeauty is the first dataset purpose-built for Human Image Aesthetic Assessment (HIAA). HumanAesExpert is a powerful Vision Language Model for aesthetic evaluation of human images.
arXiv Detail & Related papers (2025-03-31T09:58:11Z)
- Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model [19.2881640541533]
Multimodal large language models (MLLMs) have shown great potential in image quality assessment (IQA) and image aesthetic assessment (IAA). Here, we introduce a novel dataset, named Realistic image Quality and Aesthetic (RealQA). The dataset's annotated attributes span three levels: low level (e.g., image clarity), middle level (e.g., subject integrity), and high level (e.g., composition). Surprisingly, by predicting just two extra significant digits, the next token paradigm can achieve SOTA performance.
arXiv Detail & Related papers (2025-03-08T09:49:10Z)
- AI-generated Image Quality Assessment in Visual Communication [72.11144790293086]
AIGI-VC is a quality assessment database for AI-generated images in visual communication. The dataset consists of 2,500 images spanning 14 advertisement topics and 8 emotion types. It provides coarse-grained human preference annotations and fine-grained preference descriptions, benchmarking the abilities of IQA methods in preference prediction, interpretation, and reasoning.
arXiv Detail & Related papers (2024-12-20T08:47:07Z)
- Advancing Comprehensive Aesthetic Insight with Multi-Scale Text-Guided Self-Supervised Learning [14.405750888492735]
Image Aesthetic Assessment (IAA) is a vital and intricate task that entails analyzing and assessing an image's aesthetic values. Traditional methods of IAA often concentrate on a single aesthetic task and suffer from inadequate labeled datasets. We propose a comprehensive aesthetic MLLM capable of nuanced aesthetic insight.
arXiv Detail & Related papers (2024-12-16T16:35:35Z)
- AiSciVision: A Framework for Specializing Large Multimodal Models in Scientific Image Classification [2.4515373478215343]
We introduce AiSciVision, a framework that specializes Large Multimodal Models (LMMs) into interactive research partners.
Our framework uses two key components: Visual Retrieval-Augmented Generation (VisRAG) and domain-specific tools utilized in an agentic workflow.
We evaluate AiSciVision on three real-world scientific image classification datasets: detecting the presence of aquaculture ponds, eelgrass, and solar panels.
arXiv Detail & Related papers (2024-10-28T19:35:47Z)
- A Survey on All-in-One Image Restoration: Taxonomy, Evaluation and Future Trends [67.43992456058541]
Image restoration (IR) aims to recover high-quality images from inputs degraded by various factors such as noise, blur, compression, and adverse weather. Traditional IR methods typically focus on specific types of degradation, which limits their effectiveness in real-world scenarios with complex distortions. The all-in-one image restoration paradigm has recently emerged, offering a unified framework that adeptly addresses multiple degradation types.
arXiv Detail & Related papers (2024-10-19T11:11:09Z)
- Multi-modal Learnable Queries for Image Aesthetics Assessment [55.28571422062623]
We propose MMLQ, which utilizes multi-modal learnable queries to extract aesthetics-related features from multi-modal pre-trained features.
MMLQ achieves new state-of-the-art performance on multi-modal IAA, beating previous methods by 7.7% and 8.3% in terms of SRCC and PLCC, respectively.
arXiv Detail & Related papers (2024-05-02T14:31:47Z)
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
- Towards Artistic Image Aesthetics Assessment: a Large-scale Dataset and a New Method [64.40494830113286]
We first introduce a large-scale artistic image aesthetics assessment (AIAA) dataset: the Boldbrush Artistic Image Dataset (BAID), which consists of 60,337 artistic images covering various art forms.
We then propose a new method, SAAN, which can effectively extract and utilize style-specific and generic aesthetic information to evaluate artistic images.
Experiments demonstrate that our proposed approach outperforms existing IAA methods on the proposed BAID dataset.
arXiv Detail & Related papers (2023-03-27T12:59:15Z)
- A Perceptual Quality Assessment Exploration for AIGC Images [39.72512063793346]
In this paper, we discuss the major evaluation aspects, such as technical issues, AI artifacts, unnaturalness, discrepancy, and aesthetics, for AI-generated image (AGI) quality assessment.
We present the first perceptual AGI quality assessment database, AGIQA-1K, which consists of 1,080 AGIs generated from diffusion models.
arXiv Detail & Related papers (2023-03-22T14:59:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.