Related papers: Structured Uncertainty Similarity Score (SUSS): Learning a Probabilistic, Interpretable, Perceptual Metric Between Images

Structured Uncertainty Similarity Score (SUSS): Learning a Probabilistic, Interpretable, Perceptual Metric Between Images

URL: http://arxiv.org/abs/2512.03701v1
Date: Wed, 03 Dec 2025 11:48:59 GMT
Title: Structured Uncertainty Similarity Score (SUSS): Learning a Probabilistic, Interpretable, Perceptual Metric Between Images
Authors: Paula Seidler, Neill D. F. Campbell, Ivor J A Simpson,
Abstract summary: Perceptual similarity scores that align with human vision are critical for both training and evaluating computer vision models.<n>We introduce the Structured Uncertainty Similarity Score (SUSS); it models each image through a set of perceptual components.<n>The final score is a weighted sum of component log-probabilities with weights learned from human perceptual datasets.
Score: 3.1296300934639327
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Perceptual similarity scores that align with human vision are critical for both training and evaluating computer vision models. Deep perceptual losses, such as LPIPS, achieve good alignment but rely on complex, highly non-linear discriminative features with unknown invariances, while hand-crafted measures like SSIM are interpretable but miss key perceptual properties. We introduce the Structured Uncertainty Similarity Score (SUSS); it models each image through a set of perceptual components, each represented by a structured multivariate Normal distribution. These are trained in a generative, self-supervised manner to assign high likelihood to human-imperceptible augmentations. The final score is a weighted sum of component log-probabilities with weights learned from human perceptual datasets. Unlike feature-based methods, SUSS learns image-specific linear transformations of residuals in pixel space, enabling transparent inspection through decorrelated residuals and sampling. SUSS aligns closely with human perceptual judgments, shows strong perceptual calibration across diverse distortion types, and provides localized, interpretable explanations of its similarity assessments. We further demonstrate stable optimization behavior and competitive performance when using SUSS as a perceptual loss for downstream imaging tasks.

Related papers

Context-Enriched Contrastive Loss: Enhancing Presentation of Inherent Sample Connections in Contrastive Learning Framework [5.906578607951289]
In contrastive learning, the contrastive loss function plays a pivotal role in discerning similarities between samples through techniques such as rotation or cropping.<n>This paper proposes a context-enriched contrastive loss function that concurrently improves learning effectiveness and addresses the information distortion by encompassing two convergence targets.
arXiv Detail & Related papers (2025-12-01T19:26:19Z)
Hybrid Image Resolution Quality Metric (HIRQM):A Comprehensive Perceptual Image Quality Assessment Framework [0.0]
We propose the Hybrid Image Resolution Quality Metric (HIRQM) to integrate statistical, multi-scale, and deep learning methods for a comprehensive quality evaluation.<n>A dynamic weighting mechanism adapts component contributions based on image characteristics like brightness and variance, enhancing flexibility across distortion types.<n> evaluated on TID2013 and LIVE datasets, HIRQM Pearson and Spearman correlations of 0.92 and 0.90, outperforming traditional metrics.
arXiv Detail & Related papers (2025-05-04T06:14:10Z)
Unsupervised Representation Learning by Balanced Self Attention Matching [2.3020018305241337]
We present a self-supervised method for embedding image features called BAM. We obtain rich representations and avoid feature collapse by minimizing a loss that matches these distributions to their globally balanced and entropy regularized version. We show competitive performance with leading methods on both semi-supervised and transfer-learning benchmarks.
arXiv Detail & Related papers (2024-08-04T12:52:44Z)
The Unreasonable Effectiveness of Linear Prediction as a Perceptual Metric [6.1693649058046764]
We show how perceptual embeddings of the visual system can be constructed at inference-time with no training data or deep neural network features. Our perceptual embeddings are solutions to a weighted least squares (WLS) problem, defined at the pixel-level, and solved at inference-time.
arXiv Detail & Related papers (2023-10-06T19:02:00Z)
Self-similarity Driven Scale-invariant Learning for Weakly Supervised Person Search [66.95134080902717]
We propose a novel one-step framework, named Self-similarity driven Scale-invariant Learning (SSL) We introduce a Multi-scale Exemplar Branch to guide the network in concentrating on the foreground and learning scale-invariant features. Experiments on PRW and CUHK-SYSU databases demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2023-02-25T04:48:11Z)
Causal Transportability for Visual Recognition [70.13627281087325]
We show that standard classifiers fail because the association between images and labels is not transportable across settings. We then show that the causal effect, which severs all sources of confounding, remains invariant across domains. This motivates us to develop an algorithm to estimate the causal effect for image classification.
arXiv Detail & Related papers (2022-04-26T15:02:11Z)
Intriguing Properties of Vision Transformers [114.28522466830374]
Vision transformers (ViT) have demonstrated impressive performance across various machine vision problems. We systematically study this question via an extensive set of experiments and comparisons with a high-performing convolutional neural network (CNN) We show effective features of ViTs are due to flexible receptive and dynamic fields possible via the self-attention mechanism.
arXiv Detail & Related papers (2021-05-21T17:59:18Z)
Projected Distribution Loss for Image Enhancement [15.297569497776374]
We show that aggregating 1D-Wasserstein distances between CNN activations is more reliable than the existing approaches. In imaging applications such as denoising, super-resolution, demosaicing, deblurring and JPEG artifact removal, the proposed learning loss outperforms the current state-of-the-art on reference-based perceptual losses.
arXiv Detail & Related papers (2020-12-16T22:13:03Z)
Uncertainty-Aware Few-Shot Image Classification [118.72423376789062]
Few-shot image classification learns to recognize new categories from limited labelled data. We propose Uncertainty-Aware Few-Shot framework for image classification.
arXiv Detail & Related papers (2020-10-09T12:26:27Z)
Adversarial Semantic Data Augmentation for Human Pose Estimation [96.75411357541438]
We propose Semantic Data Augmentation (SDA), a method that augments images by pasting segmented body parts with various semantic granularity. We also propose Adversarial Semantic Data Augmentation (ASDA), which exploits a generative network to dynamiclly predict tailored pasting configuration. State-of-the-art results are achieved on challenging benchmarks.
arXiv Detail & Related papers (2020-08-03T07:56:04Z)
Out-of-distribution Generalization via Partial Feature Decorrelation [72.96261704851683]
We present a novel Partial Feature Decorrelation Learning (PFDL) algorithm, which jointly optimize a feature decomposition network and the target image classification model. The experiments on real-world datasets demonstrate that our method can improve the backbone model's accuracy on OOD image classification datasets.
arXiv Detail & Related papers (2020-07-30T05:48:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.