QUASAR: QUality and Aesthetics Scoring with Advanced Representations
- URL: http://arxiv.org/abs/2403.06866v3
- Date: Wed, 20 Mar 2024 08:19:52 GMT
- Title: QUASAR: QUality and Aesthetics Scoring with Advanced Representations
- Authors: Sergey Kastryulin, Denis Prokopenko, Artem Babenko, Dmitry V. Dylov,
- Abstract summary: This paper introduces a new data-driven, non-parametric method for image quality and aesthetics assessment.
We eliminate the need for expressive textual embeddings by proposing efficient image anchors in the data.
- Score: 20.194917729936357
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper introduces a new data-driven, non-parametric method for image quality and aesthetics assessment, surpassing existing approaches and requiring no prompt engineering or fine-tuning. We eliminate the need for expressive textual embeddings by proposing efficient image anchors in the data. Through extensive evaluations of 7 state-of-the-art self-supervised models, our method demonstrates superior performance and robustness across various datasets and benchmarks. Notably, it achieves high agreement with human assessments even with limited data and shows high robustness to the nature of data and their pre-processing pipeline. Our contributions offer a streamlined solution for assessment of images while providing insights into the perception of visual information.
Related papers
- KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities [93.74881034001312]
We conduct a systematic study on the fidelity of entities in text-to-image generation models.
We focus on their ability to generate a wide range of real-world visual entities, such as landmark buildings, aircraft, plants, and animals.
Our findings reveal that even the most advanced text-to-image models often fail to generate entities with accurate visual details.
arXiv Detail & Related papers (2024-10-15T17:50:37Z) - Enhancing Large Vision Language Models with Self-Training on Image Comprehension [99.9389737339175]
We introduce Self-Training on Image (STIC), which emphasizes a self-training approach specifically for image comprehension.
First, the model self-constructs a preference for image descriptions using unlabeled images.
To further self-improve reasoning on the extracted visual information, we let the model reuse a small portion of existing instruction-tuning data.
arXiv Detail & Related papers (2024-05-30T05:53:49Z) - Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z) - A Survey on Quality Metrics for Text-to-Image Models [9.753473063305503]
We provide an overview of existing text-to-image quality metrics addressing their nuances and the need for alignment with human preferences.
We propose a new taxonomy for categorizing these metrics, which is grounded in the assumption that there are two main quality criteria, namely compositionality and generality.
We derive guidelines for practitioners conducting text-to-image evaluation, discuss open challenges of evaluation mechanisms, and surface limitations of current metrics.
arXiv Detail & Related papers (2024-03-18T14:24:20Z) - Towards Unified Deep Image Deraining: A Survey and A New Benchmark [72.53380760079396]
We provide a comprehensive review of existing image deraining method and provide a unify evaluation setting to evaluate the performance of image deraining methods.
We construct a new high-quality benchmark named HQ-RAIN to further conduct extensive evaluation, consisting of 5,000 paired high-resolution synthetic images with higher harmony and realism.
arXiv Detail & Related papers (2023-10-05T13:35:00Z) - Fill-Up: Balancing Long-Tailed Data with Generative Models [11.91669614267993]
This paper proposes a new image synthesis pipeline for long-tailed situations using Textual Inversion.
We show that generated images from textual-inverted text tokens effectively aligns with the real domain.
We also show that real-world data imbalance scenarios can be successfully mitigated by filling up the imbalanced data with synthetic images.
arXiv Detail & Related papers (2023-06-12T16:01:20Z) - Ambiguous Images With Human Judgments for Robust Visual Event
Classification [34.62731821199598]
We create datasets of ambiguous images and use them to produce SQUID-E ("Squidy"), a collection of noisy images extracted from videos.
All images are annotated with ground truth values and a test set is annotated with human uncertainty judgments.
We use this dataset to characterize human uncertainty in vision tasks and evaluate existing visual event classification models.
arXiv Detail & Related papers (2022-10-06T17:52:20Z) - Imposing Consistency for Optical Flow Estimation [73.53204596544472]
Imposing consistency through proxy tasks has been shown to enhance data-driven learning.
This paper introduces novel and effective consistency strategies for optical flow estimation.
arXiv Detail & Related papers (2022-04-14T22:58:30Z) - Information-Theoretic Odometry Learning [83.36195426897768]
We propose a unified information theoretic framework for learning-motivated methods aimed at odometry estimation.
The proposed framework provides an elegant tool for performance evaluation and understanding in information-theoretic language.
arXiv Detail & Related papers (2022-03-11T02:37:35Z) - No-Reference Image Quality Assessment via Feature Fusion and Multi-Task
Learning [29.19484863898778]
Blind or no-reference image quality assessment (NR-IQA) is a fundamental, unsolved, and yet challenging problem.
We propose a simple and yet effective general-purpose no-reference (NR) image quality assessment framework based on multi-task learning.
Our model employs distortion types as well as subjective human scores to predict image quality.
arXiv Detail & Related papers (2020-06-06T05:04:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.