Q-Boost: On Visual Quality Assessment Ability of Low-level
Multi-Modality Foundation Models
- URL: http://arxiv.org/abs/2312.15300v1
- Date: Sat, 23 Dec 2023 17:02:25 GMT
- Title: Q-Boost: On Visual Quality Assessment Ability of Low-level
Multi-Modality Foundation Models
- Authors: Zicheng Zhang, Haoning Wu, Zhongpeng Ji, Chunyi Li, Erli Zhang, Wei
Sun, Xiaohong Liu, Xiongkuo Min, Fengyu Sun, Shangling Jui, Weisi Lin,
Guangtao Zhai
- Abstract summary: We introduce Q-Boost, a strategy designed to enhance low-level MLLMs in image quality assessment (IQA) and video quality assessment (VQA) tasks.
Q-Boost innovates by incorporating a `middle ground' approach through $neutral$ prompts, allowing for a more balanced and detailed assessment.
The experimental results show that low-level MLLMs equipped with the Q-Boost strategy exhibit outstanding zero-shot performance on IQA/VQA tasks.
- Score: 80.79438689784958
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in Multi-modality Large Language Models (MLLMs) have
demonstrated remarkable capabilities in complex high-level vision tasks.
However, the exploration of MLLM potential in visual quality assessment, a
vital aspect of low-level vision, remains limited. To address this gap, we
introduce Q-Boost, a novel strategy designed to enhance low-level MLLMs in
image quality assessment (IQA) and video quality assessment (VQA) tasks, which
is structured around two pivotal components: 1) Triadic-Tone Integration:
Ordinary prompt design simply oscillates between the binary extremes of
$positive$ and $negative$. Q-Boost innovates by incorporating a `middle ground'
approach through $neutral$ prompts, allowing for a more balanced and detailed
assessment. 2) Multi-Prompt Ensemble: Multiple quality-centric prompts are used
to mitigate bias and acquire more accurate evaluations. The experimental results
show that low-level MLLMs equipped with the Q-Boost strategy exhibit outstanding
zero-shot performance on IQA/VQA tasks.
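Read together, the two components amount to a single scoring rule: softmax the model's logits over the positive/neutral/negative answer tokens into probabilities, map those onto anchor values, and average the resulting score across several quality-centric prompts. Below is a minimal sketch of that rule; the anchor values (1.0/0.5/0.0), the example prompts and logits, and all function names are illustrative assumptions, not the paper's exact configuration.

```python
import torch

# Assumed anchor values for the (positive, neutral, negative) tones;
# the paper's exact mapping may differ.
TONE_ANCHORS = torch.tensor([1.0, 0.5, 0.0])

def triadic_score(tone_logits: torch.Tensor) -> torch.Tensor:
    """Triadic-Tone Integration: softmax over the logits of the
    positive/neutral/negative answer tokens, then take the expected
    anchor value as the quality score."""
    probs = torch.softmax(tone_logits, dim=-1)
    return probs @ TONE_ANCHORS

def q_boost_score(per_prompt_logits: torch.Tensor) -> torch.Tensor:
    """Multi-Prompt Ensemble: average the triadic score over several
    quality-centric prompts; per_prompt_logits has shape [num_prompts, 3]."""
    return torch.stack([triadic_score(l) for l in per_prompt_logits]).mean()

# Hypothetical logits for one image under two prompts, e.g. "Rate the
# quality of this image." answered with {good, average, poor} tokens.
logits = torch.tensor([[2.1, 0.4, -1.3],
                       [1.8, 0.9, -0.7]])
print(float(q_boost_score(logits)))  # zero-shot quality score in [0, 1]
```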
Related papers
- VQA$^2$: Visual Question Answering for Video Quality Assessment [76.81110038738699]
Video Quality Assessment (VQA) is a classic field in low-level visual perception.
Recent studies in the image domain have demonstrated that Visual Question Answering (VQA) can markedly enhance low-level visual quality evaluation.
We introduce the VQA2 Instruction dataset - the first visual question answering instruction dataset that focuses on video quality assessment.
The VQA2 series models interleave visual and motion tokens to enhance the perception of spatial-temporal quality details in videos.
arXiv Detail & Related papers (2024-11-06T09:39:52Z)
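The one concrete mechanism in the VQA$^2$ abstract is the interleaving of visual and motion tokens. Below is a minimal sketch of such frame-wise interleaving; the 1:1 pattern, the shapes, and the function name are all illustrative assumptions, not the VQA$^2$ architecture.

```python
import torch

# A sketch of interleaving per-frame visual tokens with motion tokens
# along the sequence dimension; shapes and the 1:1 frame-wise pattern
# are placeholders for illustration only.
def interleave_tokens(visual: torch.Tensor, motion: torch.Tensor) -> torch.Tensor:
    """visual, motion: [num_frames, tokens_per_frame, dim] ->
    one sequence alternating each frame's visual and motion tokens."""
    chunks = []
    for v, m in zip(visual, motion):
        chunks.extend([v, m])        # frame's visual tokens, then its motion tokens
    return torch.cat(chunks, dim=0)  # [num_frames * (visual + motion tokens), dim]

seq = interleave_tokens(torch.randn(8, 16, 768), torch.randn(8, 4, 768))
print(seq.shape)  # torch.Size([160, 768])
```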
- LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models [53.64461404882853]
Video quality assessment (VQA) algorithms are needed to monitor and optimize the quality of streaming videos.
Here, we propose the first Large Multi-Modal Video Quality Assessment (LMM-VQA) model, which introduces a novel visual modeling strategy for quality-aware feature extraction.
arXiv Detail & Related papers (2024-08-26T04:29:52Z)
- Enhancing Blind Video Quality Assessment with Rich Quality-aware Features [79.18772373737724]
We present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos.
We explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQA models as auxiliary features.
Experimental results demonstrate that the proposed model achieves the best performance on three public social media VQA datasets.
arXiv Detail & Related papers (2024-05-14T16:32:11Z)
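As a rough illustration of the auxiliary-feature idea above, the sketch below fuses frozen BIQA/BVQA embeddings with the base model's features by simple concatenation before a small regression head; the dimensions, the fusion-by-concatenation choice, and all names are assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class FusedBVQAHead(nn.Module):
    """Concatenate base BVQA features with auxiliary quality-aware
    embeddings from pre-trained BIQA/BVQA models, then regress a score.
    All dimensions below are placeholders."""
    def __init__(self, base_dim=768, biqa_dim=512, bvqa_dim=512):
        super().__init__()
        self.regressor = nn.Sequential(
            nn.Linear(base_dim + biqa_dim + bvqa_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),  # predicted quality (e.g. mean opinion score)
        )

    def forward(self, base_feat, biqa_feat, bvqa_feat):
        fused = torch.cat([base_feat, biqa_feat, bvqa_feat], dim=-1)
        return self.regressor(fused)

head = FusedBVQAHead()
score = head(torch.randn(4, 768), torch.randn(4, 512), torch.randn(4, 512))
print(score.shape)  # torch.Size([4, 1]) -- one score per video in the batch
```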
- 2AFC Prompting of Large Multimodal Models for Image Quality Assessment [38.86162365208038]
Two-alternative forced choice (2AFC) prompting is widely regarded as the most reliable way of collecting human opinions of visual quality.
The global quality score of each image estimated by a particular LMM can be efficiently aggregated using maximum a posteriori estimation.
arXiv Detail & Related papers (2024-02-02T06:05:18Z)
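One standard reading of that aggregation step is a Thurstone Case V model with a Gaussian prior on the latent quality scores, whose MAP estimate can be found numerically. The sketch below follows that reading; the model family, the prior, and the example counts are assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def map_scores(wins: np.ndarray, prior_var: float = 1.0) -> np.ndarray:
    """MAP estimate of global quality scores from 2AFC outcomes under a
    Thurstone Case V likelihood with a zero-mean Gaussian prior.
    wins[i, j] = number of times image i was preferred over image j."""
    n = wins.shape[0]

    def neg_log_posterior(q):
        p = norm.cdf(q[:, None] - q[None, :])          # P(i preferred over j)
        log_lik = np.sum(wins * np.log(np.clip(p, 1e-12, 1.0)))
        return -log_lik + np.sum(q ** 2) / (2 * prior_var)

    res = minimize(neg_log_posterior, np.zeros(n), method="L-BFGS-B")
    return res.x - res.x.mean()                        # scores are shift-invariant

# Example: pairwise preference counts over 3 images collected from an LMM.
wins = np.array([[0, 8, 9],
                 [2, 0, 6],
                 [1, 4, 0]])
print(map_scores(wins))  # higher = better perceived quality
```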
- Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision [85.6008224440157]
Multi-modality Large Language Models (MLLMs) have catalyzed a shift in computer vision from specialized models to general-purpose foundation models.
We present Q-Bench, a holistic benchmark crafted to evaluate potential abilities of MLLMs on three realms: low-level visual perception, low-level visual description, and overall visual quality assessment.
arXiv Detail & Related papers (2023-09-25T14:43:43Z)
- Blind Multimodal Quality Assessment: A Brief Survey and A Case Study of Low-light Images [73.27643795557778]
Blind image quality assessment (BIQA) aims at automatically and accurately forecasting objective scores for visual signals.
Recent developments in this field are dominated by unimodal solutions that are inconsistent with human subjective rating patterns.
We present a unique blind multimodal quality assessment (BMQA) of low-light images from subjective evaluation to objective score.
arXiv Detail & Related papers (2023-03-18T09:04:55Z)