IQAGPT: Image Quality Assessment with Vision-language and ChatGPT Models
- URL: http://arxiv.org/abs/2312.15663v1
- Date: Mon, 25 Dec 2023 09:13:18 GMT
- Title: IQAGPT: Image Quality Assessment with Vision-language and ChatGPT Models
- Authors: Zhihao Chen, Bin Hu, Chuang Niu, Tao Chen, Yuxin Li, Hongming Shan, Ge Wang
- Abstract summary: This paper introduces IQAGPT, an innovative image quality assessment system integrating an image quality captioning VLM with ChatGPT.
We build a CT-IQA dataset for training and evaluation, comprising 1,000 CT slices with diverse quality levels professionally annotated.
To better leverage the capabilities of LLMs, we convert annotated quality scores into semantically rich text descriptions using a prompt template.
- Score: 23.99102775778499
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs), such as ChatGPT, have demonstrated impressive
capabilities in various tasks and attracted an increasing interest as a natural
language interface across many domains. Recently, large vision-language models
(VLMs) like BLIP-2 and GPT-4 have been intensively investigated, which learn
rich vision-language correlation from image-text pairs. However, despite these
developments, the application of LLMs and VLMs in image quality assessment
(IQA), particularly in medical imaging, remains largely unexplored, despite its
value for objective performance evaluation and its potential to supplement, or
even replace, radiologists' opinions. To this end, this paper introduces
IQAGPT, an innovative image quality assessment system integrating an image
quality captioning VLM with ChatGPT for generating quality scores and textual
reports. First, we build a CT-IQA dataset for training and evaluation,
comprising 1,000 CT slices with diverse quality levels professionally
annotated. To better leverage the capabilities of LLMs, we convert annotated
quality scores into semantically rich text descriptions using a prompt
template. Second, we fine-tune the image quality captioning VLM on the CT-IQA
dataset to generate quality descriptions. The captioning model fuses the image
and text features through cross-modal attention. Third, based on the quality
descriptions, users can talk with ChatGPT to rate image quality scores or
produce a radiological quality report. Our preliminary results demonstrate the
feasibility of assessing image quality with large models. Remarkably, our
IQAGPT outperforms GPT-4 and CLIP-IQA, as well as the multi-task classification
and regression models that rely solely on images.
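
The first stage of the pipeline converts each annotated quality score into a richer text description via a prompt template. A minimal sketch of such a conversion is shown below; the five-level scale, label wording, and template are illustrative assumptions, not the actual CT-IQA annotation scheme or the authors' template.

```python
# Hypothetical sketch of the score-to-text conversion described in the abstract.
# The five-level scale and label wording are assumptions for illustration,
# not the actual CT-IQA annotations or the paper's prompt template.

QUALITY_LABELS = {
    1: "unacceptable, with severe noise and artifacts obscuring anatomy",
    2: "poor, with heavy noise that limits diagnostic confidence",
    3: "acceptable, with moderate noise but preserved structures",
    4: "good, with mild noise and clear anatomical detail",
    5: "excellent, nearly noise-free with sharp structural detail",
}

def score_to_description(score: int) -> str:
    """Convert an annotated quality score into a richer text description."""
    return (
        f"The image quality of this CT slice is rated {score} out of 5: "
        f"{QUALITY_LABELS[score]}."
    )

print(score_to_description(4))
```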
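
The second stage fine-tunes the captioning VLM, which fuses image and text features through cross-modal attention. The PyTorch sketch below illustrates a generic cross-attention fusion block of this kind; the feature dimensions and layer layout are assumptions rather than the model's actual architecture.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Generic cross-attention block: text tokens attend to image patch features.

    An illustrative stand-in for the fusion described in the abstract,
    not the actual architecture of the IQAGPT captioning model.
    """

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_feats: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        # Queries come from text tokens; keys and values from image patches.
        attended, _ = self.cross_attn(text_feats, image_feats, image_feats)
        return self.norm(text_feats + attended)

# Toy shapes: batch of 2, 32 text tokens, 196 image patches, 768-dim features.
fused = CrossModalFusion()(torch.randn(2, 32, 768), torch.randn(2, 196, 768))
print(fused.shape)  # torch.Size([2, 32, 768])
```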
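
In the third stage, the generated quality description is handed to ChatGPT to produce a numeric score or a report. The sketch below, assuming the OpenAI Python client, shows how such a rating query could look; the prompt wording and model name are illustrative, not the paper's actual prompts.

```python
# Illustrative sketch of the ChatGPT-based rating step, assuming the OpenAI
# Python client; the prompt wording and model choice are not from the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def description_to_score(quality_description: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You rate CT image quality from 1 (worst) to 5 (best)."},
            {"role": "user",
             "content": f"Quality description: {quality_description}\n"
                        "Reply with a single integer score."},
        ],
    )
    return response.choices[0].message.content.strip()
```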
Related papers
- Q-Ground: Image Quality Grounding with Large Multi-modality Models [61.72022069880346]
We introduce Q-Ground, the first framework aimed at tackling fine-scale visual quality grounding.
Q-Ground combines large multi-modality models with detailed visual quality analysis.
Central to our contribution is the introduction of the QGround-100K dataset.
arXiv Detail & Related papers (2024-07-24T06:42:46Z)
- Vision-Language Consistency Guided Multi-modal Prompt Learning for Blind AI Generated Image Quality Assessment [57.07360640784803]
We propose vision-language consistency guided multi-modal prompt learning for blind AI-generated image quality assessment (AGIQA).
Specifically, we introduce learnable textual and visual prompts in the language and vision branches of Contrastive Language-Image Pre-training (CLIP) models.
We design a text-to-image alignment quality prediction task, whose learned vision-language consistency knowledge is used to guide the optimization of the above multi-modal prompts.
arXiv Detail & Related papers (2024-06-24T13:45:31Z)
- Vision Language Modeling of Content, Distortion and Appearance for Image Quality Assessment [20.851102845794244]
Distilling high-level knowledge about quality-bearing attributes is crucial for developing objective Image Quality Assessment (IQA).
We present a new blind IQA (BIQA) model termed Self-supervision and Vision-Language supervision Image QUality Evaluator (SLIQUE).
SLIQUE features a joint vision-language and visual contrastive representation learning framework for acquiring high-level knowledge about the images' semantic contents, distortion characteristics, and appearance properties for IQA.
arXiv Detail & Related papers (2024-06-14T09:18:28Z)
- Descriptive Image Quality Assessment in the Wild [25.503311093471076]
VLM-based Image Quality Assessment (IQA) seeks to describe image quality linguistically to align with human expression.
We introduce Depicted image Quality Assessment in the Wild (DepictQA-Wild).
Our method includes a multi-functional IQA task paradigm that encompasses both assessment and comparison tasks, brief and detailed responses, and full-reference and non-reference scenarios.
arXiv Detail & Related papers (2024-05-29T07:49:15Z)
- Dual-Branch Network for Portrait Image Quality Assessment [76.27716058987251]
We introduce a dual-branch network for portrait image quality assessment (PIQA).
We utilize two backbone networks (i.e., Swin Transformer-B) to extract the quality-aware features from the entire portrait image and the facial image cropped from it.
We leverage LIQE, an image scene classification and quality assessment model, to capture the quality-aware and scene-specific features as the auxiliary features.
arXiv Detail & Related papers (2024-05-14T12:43:43Z)
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
- FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction [66.98008357232428]
We propose FineMatch, a new aspect-based fine-grained text and image matching benchmark.
FineMatch focuses on text and image mismatch detection and correction.
We show that models trained on FineMatch demonstrate enhanced proficiency in detecting fine-grained text and image mismatches.
arXiv Detail & Related papers (2024-04-23T03:42:14Z)
- Holistic Evaluation of Text-To-Image Models [153.47415461488097]
We introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM).
We identify 12 aspects, including text-image alignment, image quality, aesthetics, originality, reasoning, knowledge, bias, toxicity, fairness, robustness, multilinguality, and efficiency.
Our results reveal that no single model excels in all aspects, with different models demonstrating different strengths.
arXiv Detail & Related papers (2023-11-07T19:00:56Z)
- X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models [17.67105465600566]
This paper introduces a novel explainable image quality evaluation approach called X-IQE.
X-IQE uses visual large language models (LLMs) to evaluate text-to-image generation methods by generating textual explanations.
It offers several advantages, including the ability to distinguish between real and generated images, evaluate text-image alignment, and assess image aesthetics without requiring model training or fine-tuning.
arXiv Detail & Related papers (2023-05-18T09:56:44Z)