Multi-modal Learnable Queries for Image Aesthetics Assessment
- URL: http://arxiv.org/abs/2405.01326v1
- Date: Thu, 2 May 2024 14:31:47 GMT
- Title: Multi-modal Learnable Queries for Image Aesthetics Assessment
- Authors: Zhiwei Xiong, Yunfan Zhang, Zhiqi Shen, Peiran Ren, Han Yu
- Abstract summary: We propose MMLQ, which utilizes multi-modal learnable queries to extract aesthetics-related features from multi-modal pre-trained features.
MMLQ achieves new state-of-the-art performance on multi-modal IAA, beating previous methods by 7.7% and 8.3% in terms of SRCC and PLCC, respectively.
- Score: 55.28571422062623
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image aesthetics assessment (IAA) is attracting wide interest with the prevalence of social media. The problem is challenging due to its subjective and ambiguous nature. Instead of directly extracting aesthetic features solely from the image, user comments associated with an image could potentially provide complementary knowledge that is useful for IAA. With existing large-scale pre-trained models demonstrating strong capabilities in extracting high-quality transferable visual and textual features, learnable queries are shown to be effective in extracting useful features from the pre-trained visual features. Therefore, in this paper, we propose MMLQ, which utilizes multi-modal learnable queries to extract aesthetics-related features from multi-modal pre-trained features. Extensive experimental results demonstrate that MMLQ achieves new state-of-the-art performance on multi-modal IAA, beating previous methods by 7.7% and 8.3% in terms of SRCC and PLCC, respectively.
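The SRCC and PLCC figures reported above are the standard Spearman rank and Pearson linear correlation coefficients between predicted and ground-truth aesthetic scores. A minimal plain-Python sketch of both metrics (the scores below are toy values, not the paper's data):

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation coefficient (SRCC):
    Pearson correlation computed on the ranks of the data."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        i = 0
        while i < len(order):
            j = i
            # average the rank over any run of tied values
            while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r
    return pearson(ranks(x), ranks(y))

# Toy example: predicted vs. ground-truth mean aesthetic scores
pred = [5.2, 6.1, 4.8, 7.3, 5.9]
gt   = [5.0, 6.5, 4.2, 7.8, 6.0]
print(f"SRCC = {spearman(pred, gt):.3f}, PLCC = {pearson(pred, gt):.3f}")
```

Because the toy predictions rank the images in exactly the ground-truth order, SRCC is 1.0 here even though the raw values differ; PLCC additionally penalizes deviations from a linear fit.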
Related papers
- UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment [23.48816491333345]
Image Quality Assessment (IQA) and Image Aesthetic Assessment (IAA) aim to simulate human subjective perception of image visual quality and aesthetic appeal.
Existing methods typically address these tasks independently due to distinct learning objectives.
We propose Unified vision-language pre-training of Quality and Aesthetics (UniQA) to learn general perceptions of two tasks, thereby benefiting them simultaneously.
arXiv Detail & Related papers (2024-06-03T07:40:10Z)
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension [99.9389737339175]
We introduce Self-Training on Image Comprehension (STIC), which emphasizes a self-training approach specifically for image comprehension.
First, the model self-constructs a preference dataset for image descriptions using unlabeled images.
To further self-improve reasoning on the extracted visual information, we let the model reuse a small portion of existing instruction-tuning data.
arXiv Detail & Related papers (2024-05-30T05:53:49Z)
- Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics [54.08757792080732]
We propose integrating deep features from pre-trained visual models with a statistical analysis model to achieve opinion-unaware BIQA (OU-BIQA).
Our proposed model exhibits superior consistency with human visual perception compared to state-of-the-art BIQA models.
arXiv Detail & Related papers (2024-05-29T06:09:34Z)
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address the mismatch between CLIP's pre-training objective and the IQA task using prompting techniques, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
- Unveiling The Factors of Aesthetic Preferences with Explainable AI [0.0]
In this study, we take a novel perspective by utilizing several different machine learning (ML) models that take image attributes as inputs to predict the aesthetic scores of images.
Our aim is to shed light on the complex nature of aesthetic preferences in images through ML and to provide a deeper understanding of the attributes that influence aesthetic judgements.
arXiv Detail & Related papers (2023-11-24T11:06:22Z)
- Image Aesthetics Assessment via Learnable Queries [59.313054821874864]
We propose the Image Aesthetics Assessment via Learnable Queries (IAA-LQ) approach.
It adapts learnable queries to extract aesthetic features from pre-trained image features obtained from a frozen image encoder.
Experiments on real-world data demonstrate the advantages of IAA-LQ, beating the best state-of-the-art method by 2.2% and 2.1% in terms of SRCC and PLCC, respectively.
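The learnable-query mechanism shared by IAA-LQ and MMLQ can be viewed as cross-attention pooling: a small set of trainable query vectors attends over frozen pre-trained features and condenses them into a fixed number of aesthetics-related vectors. A minimal plain-Python sketch with hypothetical sizes (the actual models use learned key/value projections and multi-head attention, omitted here for brevity):

```python
import math
import random

def softmax(row):
    """Numerically stable softmax over one attention-score row."""
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def learnable_query_pool(queries, features):
    """Cross-attention pooling: each learnable query attends over the
    frozen pre-trained features and yields one pooled vector.
    queries:  n_q trainable d-dim vectors
    features: n_f frozen d-dim encoder outputs (image patches or text tokens)
    """
    d = len(queries[0])
    pooled = []
    for q in queries:
        # scaled dot-product attention scores against every feature
        scores = [sum(a * b for a, b in zip(q, f)) / math.sqrt(d) for f in features]
        weights = softmax(scores)
        # weighted average of the features, one attention head, no projections
        pooled.append([sum(w * f[k] for w, f in zip(weights, features))
                       for k in range(d)])
    return pooled

random.seed(0)
d, n_q, n_f = 8, 4, 16   # hypothetical: 4 queries over 16 patch features
queries  = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_q)]
features = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_f)]
pooled = learnable_query_pool(queries, features)
print(len(pooled), len(pooled[0]))   # 4 pooled d-dim vectors
```

During training only the queries (and any downstream regression head) receive gradients, while the pre-trained encoder stays frozen, which is what makes the approach lightweight compared with full fine-tuning.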
arXiv Detail & Related papers (2023-09-06T09:42:16Z)
- Distilling Knowledge from Object Classification to Aesthetics Assessment [68.317720070755]
The major dilemma of image aesthetics assessment (IAA) comes from the abstract nature of aesthetic labels.
We propose to distill knowledge on semantic patterns for a vast variety of image contents to an IAA model.
By supervising an end-to-end single-backbone IAA model with the distilled knowledge, the performance of the IAA model is significantly improved.
arXiv Detail & Related papers (2022-06-02T00:39:01Z)
- Training and challenging models for text-guided fashion image retrieval [1.4266272677701561]
We introduce a new evaluation dataset, Challenging Fashion Queries (CFQ).
CFQ complements existing benchmarks by including relative captions with positive and negative labels of caption accuracy and conditional image similarity.
We demonstrate the importance of multimodal pretraining for the task and show that domain-specific weak supervision based on attribute labels can augment generic large-scale pretraining.
arXiv Detail & Related papers (2022-04-23T06:24:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.