Gamma: Toward Generic Image Assessment with Mixture of Assessment Experts
- URL: http://arxiv.org/abs/2503.06678v1
- Date: Sun, 09 Mar 2025 16:07:58 GMT
- Title: Gamma: Toward Generic Image Assessment with Mixture of Assessment Experts
- Authors: Hantao Zhou, Rui Yang, Longxiang Tang, Guanyi Qin, Yan Zhang, Runze Hu, Xiu Li,
- Abstract summary: Gamma, a Generic imAge assessMent model, can effectively assess images from diverse scenes through mixed-dataset training. Our Gamma model is trained and evaluated on 12 datasets spanning 6 image assessment scenarios.
- Score: 23.48816491333345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image assessment aims to evaluate the quality and aesthetics of images and has been applied across various scenarios, such as natural and AIGC scenes. Existing methods mostly address these sub-tasks or scenes individually. While some works attempt to develop unified image assessment models, they have struggled to achieve satisfactory performance or cover a broad spectrum of assessment scenarios. In this paper, we present \textbf{Gamma}, a \textbf{G}eneric im\textbf{A}ge assess\textbf{M}ent model using \textbf{M}ixture of \textbf{A}ssessment Experts, which can effectively assess images from diverse scenes through mixed-dataset training. Achieving unified training in image assessment presents significant challenges due to annotation biases across different datasets. To address this issue, we first propose a Mixture of Assessment Experts (MoAE) module, which employs shared and adaptive experts to dynamically learn common and specific knowledge for different datasets, respectively. In addition, we introduce a Scene-based Differential Prompt (SDP) strategy, which uses scene-specific prompts to provide prior knowledge and guidance during the learning process, further boosting adaptation for various scenes. Our Gamma model is trained and evaluated on 12 datasets spanning 6 image assessment scenarios. Extensive experiments show that our unified Gamma outperforms other state-of-the-art mixed-training methods by significant margins while covering more scenes. Code: https://github.com/zht8506/Gamma.
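The abstract describes two components: a MoAE layer that combines an always-active shared expert with softly routed adaptive experts, and scene-specific prompts that supply prior knowledge during mixed-dataset training. Below is a minimal, hypothetical PyTorch sketch of such a shared-plus-adaptive expert layer, inferred from the abstract only; the class name, dimensions, expert count, and routing scheme are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Minimal, hypothetical sketch of a Mixture of Assessment Experts (MoAE) layer,
# inferred from the abstract: a shared expert learns dataset-agnostic knowledge,
# while adaptive experts, softly weighted by a learned router, absorb
# dataset-specific annotation biases. All details here are illustrative
# assumptions, not the authors' implementation.
import torch
import torch.nn as nn


class MoAELayer(nn.Module):
    def __init__(self, dim: int = 512, num_adaptive_experts: int = 4, hidden: int = 256):
        super().__init__()
        # Shared expert: always active, captures knowledge common to all datasets.
        self.shared_expert = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )
        # Adaptive experts: softly selected per sample to model dataset-specific knowledge.
        self.adaptive_experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
             for _ in range(num_adaptive_experts)]
        )
        # Router producing per-sample mixing weights over the adaptive experts.
        self.router = nn.Linear(dim, num_adaptive_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim) image features from a vision backbone.
        weights = torch.softmax(self.router(x), dim=-1)                    # (batch, E)
        adaptive = torch.stack([e(x) for e in self.adaptive_experts], 1)   # (batch, E, dim)
        adaptive_out = (weights.unsqueeze(-1) * adaptive).sum(dim=1)       # (batch, dim)
        return x + self.shared_expert(x) + adaptive_out                    # residual fusion


if __name__ == "__main__":
    feats = torch.randn(8, 512)              # e.g. CLIP-style image features
    print(MoAELayer(dim=512)(feats).shape)   # torch.Size([8, 512])
```

Under the same reading, the Scene-based Differential Prompt strategy would pair this layer with a scene-specific text prompt (e.g. distinguishing natural photos from AIGC images) to guide the experts during mixed-dataset training; the exact prompt design is not specified in the abstract.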
Related papers
- Towards More Accurate Personalized Image Generation: Addressing Overfitting and Evaluation Bias [52.590072198551944]
The aim of image personalization is to create images based on a user-provided subject.
Current methods face challenges in ensuring fidelity to the text prompt.
We introduce a novel training pipeline that incorporates an attractor to filter out distractions in training images.
arXiv Detail & Related papers (2025-03-09T14:14:02Z)
- Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents [62.616106562146776]
We propose a Visual-Centric Selection approach via Agents Collaboration (ViSA).
Our approach consists of 1) an image information quantification method via visual agents collaboration to select images with rich visual information, and 2) a visual-centric instruction quality assessment method to select high-quality instruction data related to high-quality images.
arXiv Detail & Related papers (2025-02-27T09:37:30Z)
- What Makes a Scene? Scene Graph-based Evaluation and Feedback for Controllable Generation [29.42202665594218]
We introduce Scene-Bench, a comprehensive benchmark designed to evaluate and enhance the factual consistency in generating natural scenes.
Scene-Bench comprises MegaSG, a large-scale dataset of one million images annotated with scene graphs, and SGScore, a novel evaluation metric.
We develop a scene graph feedback pipeline that iteratively refines generated images by identifying and correcting discrepancies between the scene graph and the image.
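For illustration, here is a hypothetical sketch of such a scene-graph feedback loop, assuming (not taken from the paper) placeholder functions for image generation, scene-graph parsing, an SGScore-style comparison, and prompt revision; all names and the stopping rule are assumptions.

```python
# Hypothetical sketch of a scene-graph feedback loop: generate, parse the image
# into a scene graph, compare it with the target graph, and regenerate with a
# revised prompt until the graphs agree. Every callable below is a placeholder,
# not an API from the paper.
from typing import Any, Callable


def refine_with_scene_graph(
    prompt: str,
    target_graph: Any,
    generate_image: Callable[[str], Any],
    parse_scene_graph: Callable[[Any], Any],
    sg_score: Callable[[Any, Any], float],
    revise_prompt: Callable[[str, Any, Any], str],
    max_rounds: int = 3,
    threshold: float = 0.9,
) -> Any:
    """Iteratively regenerate an image until its parsed scene graph matches the target."""
    image = generate_image(prompt)
    for _ in range(max_rounds):
        predicted_graph = parse_scene_graph(image)
        if sg_score(target_graph, predicted_graph) >= threshold:
            break  # factual consistency is judged good enough
        # Feed detected discrepancies back into the prompt and regenerate.
        prompt = revise_prompt(prompt, target_graph, predicted_graph)
        image = generate_image(prompt)
    return image
```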
arXiv Detail & Related papers (2024-11-23T03:40:25Z) - FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction [66.98008357232428]
We propose FineMatch, a new aspect-based fine-grained text and image matching benchmark.
FineMatch focuses on text and image mismatch detection and correction.
We show that models trained on FineMatch demonstrate enhanced proficiency in detecting fine-grained text and image mismatches.
arXiv Detail & Related papers (2024-04-23T03:42:14Z)
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
- Diffusion Model-Based Image Editing: A Survey [46.244266782108234]
Denoising diffusion models have emerged as a powerful tool for various image generation and editing tasks.
We provide an exhaustive overview of existing methods using diffusion models for image editing.
To further evaluate the performance of text-guided image editing algorithms, we propose a systematic benchmark, EditEval.
arXiv Detail & Related papers (2024-02-27T14:07:09Z)
- Domain Generalization for Mammographic Image Analysis with Contrastive Learning [62.25104935889111]
Training an effective deep learning model requires large amounts of data with diverse styles and qualities.
A novel contrastive learning scheme is developed to equip deep learning models with better style generalization capability.
The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets.
arXiv Detail & Related papers (2023-04-20T11:40:21Z)
- D-LEMA: Deep Learning Ensembles from Multiple Annotations -- Application to Skin Lesion Segmentation [14.266037264648533]
Leveraging a collection of annotators' opinions for an image is an interesting way of estimating a gold standard.
We propose an approach to handle annotators' disagreements when training a deep model.
arXiv Detail & Related papers (2020-12-14T01:51:22Z)
- Bridging Composite and Real: Towards End-to-end Deep Image Matting [88.79857806542006]
We study the roles of semantics and details for image matting.
We propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders.
Comprehensive empirical studies have demonstrated that GFM outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-10-30T10:57:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences arising from its use.