M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment
- URL: http://arxiv.org/abs/2502.15167v2
- Date: Wed, 04 Jun 2025 08:30:11 GMT
- Title: M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment
- Authors: Chuan Cui, Kejiang Chen, Zhihua Wei, Wen Shen, Weiming Zhang, Nenghai Yu
- Abstract summary: M3-AGIQA is a comprehensive framework that enables more human-aligned, holistic evaluation of AI-generated images. By aligning model outputs more closely with human judgment, M3-AGIQA delivers robust and interpretable quality scores.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid advancement of AI-generated image (AIGI) models presents new challenges for evaluating image quality, particularly across three aspects: perceptual quality, prompt correspondence, and authenticity. To address these challenges, we introduce M3-AGIQA, a comprehensive framework that leverages Multimodal Large Language Models (MLLMs) to enable more human-aligned, holistic evaluation of AI-generated images across both visual and textual domains. In addition, our framework features a structured multi-round evaluation process that generates and analyzes intermediate image descriptions to provide deeper insight into these three aspects. By aligning model outputs more closely with human judgment, M3-AGIQA delivers robust and interpretable quality scores. Extensive experiments on multiple benchmarks demonstrate that our method achieves state-of-the-art performance on the tested datasets and aspects, and exhibits strong generalizability in most cross-dataset settings. Code is available at https://github.com/strawhatboy/M3-AGIQA.
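As a rough illustration of the multi-round, multi-aspect evaluation loop the abstract describes, here is a minimal Python sketch. It assumes a generic chat-style MLLM client injected as `ask`; the function names, prompt wording, and 1-5 rating scale are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a multi-round, multi-aspect MLLM evaluation loop.
# `ask` stands in for any chat-style MLLM client that takes a running
# conversation (list of role/content dicts) and returns the reply text.

from typing import Callable

ASPECTS = ["perceptual quality", "prompt correspondence", "authenticity"]

def evaluate_image(image_ref: str, prompt: str,
                   ask: Callable[[list[dict]], str]) -> dict[str, float]:
    """Run one description round, then one scoring round per aspect."""
    conversation = [{
        "role": "user",
        "content": f"Here is an AI-generated image ({image_ref}) created "
                   f"from the prompt: '{prompt}'. Describe it in detail.",
    }]
    # Round 1: elicit an intermediate description that later rounds analyze.
    description = ask(conversation)
    conversation.append({"role": "assistant", "content": description})

    scores: dict[str, float] = {}
    for aspect in ASPECTS:
        # One follow-up round per aspect, grounded in the shared description.
        conversation.append({
            "role": "user",
            "content": f"Based on the image and your description, rate its "
                       f"{aspect} on a 1-5 scale. Reply with a number only.",
        })
        reply = ask(conversation)
        conversation.append({"role": "assistant", "content": reply})
        # Assumes the model complies with the number-only instruction.
        scores[aspect] = float(reply.strip())
    return scores
```

In the AGIQA setting, per-aspect scores like these would then be compared against human mean opinion scores, typically via rank correlations such as SRCC and PLCC; the sketch above only illustrates the conversational structure, not the paper's training or alignment procedure.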
Related papers
- Text-Visual Semantic Constrained AI-Generated Image Quality Assessment [47.575342788480505]
We propose a unified framework to enhance the comprehensive evaluation of both text-image consistency and perceptual distortion in AI-generated images. Our approach integrates key capabilities from multiple models and tackles these challenges by introducing two core modules. Tests conducted on multiple benchmark datasets demonstrate that SC-AGIQA outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2025-07-14T16:21:05Z)
- Towards Explainable Partial-AIGC Image Quality Assessment [51.42831861127991]
Despite extensive research on image quality assessment (IQA) for AI-generated images (AGIs), most studies focus on fully AI-generated outputs. We construct the first large-scale partial-AIGC image (PAI) dataset towards explainable partial-AIGC image quality assessment (EPAIQA). Our work represents a pioneering effort in the perceptual IQA field for comprehensive PAI quality assessment.
arXiv Detail & Related papers (2025-04-12T17:27:50Z)
- CoLLM: A Large Language Model for Composed Image Retrieval [76.29725148964368]
Composed Image Retrieval (CIR) is a complex task that aims to retrieve images based on a multimodal query.
We present CoLLM, a one-stop framework that generates triplets on-the-fly from image-caption pairs.
We leverage Large Language Models (LLMs) to generate joint embeddings of reference images and modification texts.
arXiv Detail & Related papers (2025-03-25T17:59:50Z)
- VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning [63.0285363282581]
Multimodal Large Language Models (MLLMs) have become a powerful tool for integrating visual and textual information.
We introduce VOILA, a benchmark designed to evaluate MLLMs' perceptual understanding and abstract relational reasoning.
We reveal that current MLLMs struggle to comprehend inter-image relationships and exhibit limited capabilities in high-level relational reasoning.
arXiv Detail & Related papers (2025-02-25T23:36:19Z)
- MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct [148.39859547619156]
We propose MMEvol, a novel multimodal instruction data evolution framework. MMEvol iteratively improves data quality through a refined combination of fine-grained perception, cognitive reasoning, and interaction evolution. Our approach reaches state-of-the-art (SOTA) performance in nine tasks using significantly less data than existing SOTA models.
arXiv Detail & Related papers (2024-09-09T17:44:00Z)
- Detecting the Undetectable: Combining Kolmogorov-Arnold Networks and MLP for AI-Generated Image Detection [0.0]
This paper presents a novel detection framework adept at robustly identifying images produced by cutting-edge generative AI models.
We propose a classification system that integrates semantic image embeddings with a traditional Multilayer Perceptron (MLP).
arXiv Detail & Related papers (2024-08-18T06:00:36Z)
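The MLP half of the detector described above is straightforward to sketch. The PyTorch snippet below is an illustrative stand-in, assuming semantic embeddings (e.g., from a CLIP-style encoder) have already been extracted offline; the embedding width and layer sizes are assumptions, and the paper's Kolmogorov-Arnold Network component is not reproduced here.

```python
# Illustrative sketch: a binary real-vs-generated classifier head over
# precomputed semantic image embeddings. Embedding extraction is assumed
# to happen elsewhere; 512 is an assumed embedding width.
import torch
import torch.nn as nn

class EmbeddingMLP(nn.Module):
    def __init__(self, embed_dim: int = 512, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim, 1),  # single logit: generated vs. real
        )

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        return self.net(embeddings).squeeze(-1)

model = EmbeddingMLP()
criterion = nn.BCEWithLogitsLoss()
# Dummy batch standing in for real embeddings and binary labels.
emb = torch.randn(8, 512)
labels = torch.randint(0, 2, (8,)).float()
loss = criterion(model(emb), labels)
loss.backward()
```

The Kolmogorov-Arnold Network named in the title would complement such an MLP head with spline-based learnable activations; that component is omitted from this sketch.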
- Q-Ground: Image Quality Grounding with Large Multi-modality Models [61.72022069880346]
We introduce Q-Ground, the first framework aimed at tackling fine-scale visual quality grounding.
Q-Ground combines large multi-modality models with detailed visual quality analysis.
Central to our contribution is the introduction of the QGround-100K dataset.
arXiv Detail & Related papers (2024-07-24T06:42:46Z)
- A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment [46.55045595936298]
Multimodal Large Language Models (MLLMs) have experienced significant advancement in visual understanding and reasoning.
Their potential to serve as powerful, flexible, interpretable, and text-driven models for Image Quality Assessment (IQA) remains largely unexplored.
arXiv Detail & Related papers (2024-03-16T08:30:45Z)
- Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models [28.194638379354252]
We introduce a Depicted image Quality Assessment method (DepictQA), overcoming the constraints of traditional score-based methods.
DepictQA allows for detailed, language-based, human-like evaluation of image quality by leveraging Multi-modal Large Language Models.
These results showcase the research potential of multi-modal IQA methods.
arXiv Detail & Related papers (2023-12-14T14:10:02Z)
- Assessor360: Multi-sequence Network for Blind Omnidirectional Image Quality Assessment [50.82681686110528]
Blind Omnidirectional Image Quality Assessment (BOIQA) aims to objectively assess the human perceptual quality of omnidirectional images (ODIs).
The quality assessment of ODIs is severely hampered by the fact that the existing BOIQA pipeline lacks the modeling of the observer's browsing process.
We propose a novel multi-sequence network for BOIQA called Assessor360, which is derived from the realistic multi-assessor ODI quality assessment procedure.
arXiv Detail & Related papers (2023-05-18T13:55:28Z)