Related papers: CrossScore: Towards Multi-View Image Evaluation and Scoring

URL: http://arxiv.org/abs/2404.14409v4
Date: Tue, 23 Jul 2024 07:47:35 GMT
Title: CrossScore: Towards Multi-View Image Evaluation and Scoring
Authors: Zirui Wang, Wenjing Bian, Victor Adrian Prisacariu
Abstract summary: A cross-reference image quality assessment method fills a gap in the image assessment landscape, enabling accurate image quality assessment without requiring ground-truth references.
Score: 24.853612457257697
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: We introduce a novel cross-reference image quality assessment method that effectively fills the gap in the image assessment landscape, complementing the array of established evaluation schemes -- ranging from full-reference metrics like SSIM, no-reference metrics such as NIQE, to general-reference metrics including FID, and multi-modal-reference metrics, e.g., CLIPScore. Utilising a neural network with a cross-attention mechanism and a unique data collection pipeline built on novel view synthesis (NVS) optimisation, our method enables accurate image quality assessment without requiring ground-truth references. By comparing a query image against multiple views of the same scene, our method addresses the limitations of existing metrics in NVS and similar tasks where direct reference images are unavailable. Experimental results show that our method is closely correlated with the full-reference metric SSIM, while not requiring ground-truth references.
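The abstract says the query image is compared against multiple views of the same scene through cross-attention. The sketch below shows only that generic step: scaled dot-product cross-attention in which patch tokens of the query image attend over tokens pooled from several reference views. It is not CrossScore's architecture. The feature extractor, token counts, dimensions, number of views, and the random projection matrices (learned in a real model) are all assumptions. A learned head that turns the attended features into a quality score is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_tokens, ref_tokens, w_q, w_k, w_v):
    """Scaled dot-product cross-attention.

    query_tokens: (n_q, d) patch features of the image being assessed.
    ref_tokens:   (n_r, d) patch features pooled from all reference views.
    w_q, w_k, w_v: (d, d_k) projections (random here, learned in practice).
    """
    q = query_tokens @ w_q
    k = ref_tokens @ w_k
    v = ref_tokens @ w_v
    # Every query patch is compared with every reference patch.
    logits = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(logits, axis=-1)  # (n_q, n_r); each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d, d_k = 32, 16
query = rng.standard_normal((64, d))    # hypothetical: 64 patch tokens of the query image
refs = rng.standard_normal((3, 64, d))  # hypothetical: 3 reference views, 64 tokens each
ref_tokens = refs.reshape(-1, d)        # concatenate all views: (192, d)
w_q, w_k, w_v = (rng.standard_normal((d, d_k)) for _ in range(3))

attended, weights = cross_attention(query, ref_tokens, w_q, w_k, w_v)
print(attended.shape, weights.shape)  # (64, 16) (64, 192)
```

Concatenating the views into one key/value set lets each query patch draw on whichever reference view happens to observe the same scene region.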

Related papers

Evaluating Image Caption via Cycle-consistent Text-to-Image Generation [24.455344211552692]
We propose CAMScore, a reference-free automatic evaluation metric for image captioning models. To circumvent the modality gap between images and text, CAMScore uses a text-to-image model to generate images from captions and then evaluates these generated images against the original images. Experimental results show that CAMScore achieves superior correlation with human judgments compared to existing reference-based and reference-free metrics.
arXiv Detail & Related papers (2025-01-07T06:35:34Z)
Attention Down-Sampling Transformer, Relative Ranking and Self-Consistency for Blind Image Quality Assessment [17.04649536069553]
No-reference image quality assessment (NR-IQA) is a challenging domain that aims to estimate image quality without access to the original reference. We introduce an improved mechanism to extract local and non-local information from images via different transformer encoders and CNNs. A self-consistency approach to self-supervision is presented, explicitly addressing the degradation of NR-IQA models.
arXiv Detail & Related papers (2024-09-11T09:08:43Z)
HICEScore: A Hierarchical Metric for Image Captioning Evaluation [10.88292081473071]
We propose a novel reference-free metric for image captioning evaluation, dubbed Hierarchical Image Captioning Evaluation Score (HICE-S). By detecting local visual regions and textual phrases, HICE-S builds an interpretable hierarchical scoring mechanism. Our proposed metric achieves state-of-the-art (SOTA) performance on several benchmarks, outperforming existing reference-free metrics.
arXiv Detail & Related papers (2024-07-26T08:24:30Z)
MB-RACS: Measurement-Bounds-based Rate-Adaptive Image Compressed Sensing Network [65.1004435124796]
We propose a Measurement-Bounds-based Rate-Adaptive Image Compressed Sensing Network (MB-RACS) framework. Our experiments demonstrate that the proposed MB-RACS method surpasses current leading methods.
arXiv Detail & Related papers (2024-01-19T04:40:20Z)
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing. Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery. We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation [47.40949434032489]
We propose a new contrastive-based evaluation metric for image captioning, namely Positive-Augmented Contrastive learning Score (PAC-S). PAC-S unifies the learning of a contrastive visual-semantic space with the addition of generated images and text on curated data. Experiments spanning several datasets demonstrate that our new metric achieves the highest correlation with human judgments on both images and videos.
arXiv Detail & Related papers (2023-03-21T18:03:14Z)
Introspective Deep Metric Learning for Image Retrieval [80.29866561553483]
We argue that a good similarity model should consider the semantic discrepancies with caution to better deal with ambiguous images for more robust training. We propose to represent an image using not only a semantic embedding but also an accompanying uncertainty embedding, which describes the semantic characteristics and ambiguity of an image, respectively. The proposed IDML framework improves the performance of deep metric learning through uncertainty modeling and attains state-of-the-art results on the widely used CUB-200-2011, Cars196, and Stanford Online Products datasets.
arXiv Detail & Related papers (2022-05-09T17:51:44Z)
Assessing a Single Image in Reference-Guided Image Synthesis [14.936460594115953]
We propose a learning-based framework, Reference-guided Image Synthesis Assessment (RISA), to quantitatively evaluate the quality of a single generated image. As the available annotation is too coarse to serve as a supervision signal, we introduce two techniques: 1) a pixel-wise scheme to refine the coarse labels, and 2) multiple binary classifiers to replace a naïve regressor. RISA is highly consistent with human preference and transfers well across models.
arXiv Detail & Related papers (2021-12-08T08:22:14Z)
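The summary does not say how RISA arranges its binary classifiers. One common, generic way to replace a single regressor with binary classifiers is ordinal decomposition: a quality level y in {0, ..., K-1} becomes K-1 binary targets of the form "y > k", and the predicted level is the number of classifiers that fire. The sketch below shows only this target encoding and decoding; it is an illustrative assumption, not RISA's implementation, and the function names are hypothetical.

```python
import numpy as np

def encode_ordinal(label, num_levels):
    # Level y in {0, ..., K-1} -> K-1 binary targets [y > 0, y > 1, ..., y > K-2].
    return (label > np.arange(num_levels - 1)).astype(np.float32)

def decode_ordinal(probs, threshold=0.5):
    # Predicted level = number of "greater than k" classifiers whose probability exceeds the threshold.
    return int((np.asarray(probs) > threshold).sum())

print(encode_ordinal(2, 5))                  # [1. 1. 0. 0.]
print(decode_ordinal([0.9, 0.7, 0.2, 0.1]))  # 2
```

Compared with regressing one scalar, each classifier only answers a coarse yes/no question, which can be easier to learn from noisy or coarse labels.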
Learning Transformer Features for Image Quality Assessment [53.51379676690971]
We propose a unified IQA framework that uses a CNN backbone and a transformer encoder to extract features. The proposed framework is compatible with both full-reference (FR) and no-reference (NR) modes and allows for a joint training scheme.
arXiv Detail & Related papers (2021-12-01T13:23:00Z)
CLIPScore: A Reference-free Evaluation Metric for Image Captioning [44.14502257230038]
We show that CLIP, a cross-modal model pretrained on 400M image+caption pairs from the web, can be used for robust automatic evaluation of image captioning without the need for references. Experiments spanning several corpora demonstrate that our new reference-free metric, CLIPScore, achieves the highest correlation with human judgements. We also present a reference-augmented version, RefCLIPScore, which achieves even higher correlation.
arXiv Detail & Related papers (2021-04-18T05:00:29Z)
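For concreteness, the CLIPScore paper defines CLIP-S(c, v) = w * max(cos(c, v), 0) with w = 2.5, where c is the CLIP text embedding of the candidate caption and v is the CLIP image embedding. RefCLIPScore is the harmonic mean of CLIP-S and max(max over references r of cos(c, r), 0). The sketch below implements only these formulas on precomputed embeddings; the toy 2-D vectors are placeholders for real CLIP outputs, and no CLIP model is loaded.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def clip_score(caption_emb, image_emb, w=2.5):
    # CLIP-S = w * max(cos(c, v), 0); negative similarities are clipped to 0.
    return w * max(cosine(caption_emb, image_emb), 0.0)

def ref_clip_score(caption_emb, image_emb, ref_embs, w=2.5):
    # Harmonic mean of CLIP-S and the best (clipped) caption-to-reference similarity.
    clip_s = clip_score(caption_emb, image_emb, w)
    ref_s = max(max(cosine(caption_emb, r) for r in ref_embs), 0.0)
    if clip_s + ref_s == 0.0:
        return 0.0
    return 2.0 * clip_s * ref_s / (clip_s + ref_s)

# Toy embeddings standing in for CLIP text/image features.
c = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])
refs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(round(clip_score(c, v), 3))            # 1.768
print(round(ref_clip_score(c, v, refs), 3))  # 1.277
```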
Intrinsic Image Captioning Evaluation [53.51379676690971]
We propose a learning-based metric for image captioning, which we call Intrinsic Image Captioning Evaluation (I2CE). Experimental results show that the proposed method maintains robust performance and gives more flexible scores to candidate captions that use semantically similar expressions or have less aligned semantics.
arXiv Detail & Related papers (2020-12-14T08:36:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.