Investigation into using stochastic embedding representations for evaluating the trustworthiness of the Fréchet Inception Distance
- URL: http://arxiv.org/abs/2601.21979v1
- Date: Thu, 29 Jan 2026 16:56:01 GMT
- Authors: Ciaran Bench, Vivek Desai, Carlijn Roozemond, Ruben van Engen, Spencer A. Thomas
- Abstract summary: We use Monte Carlo dropout to compute the predictive variance in the Fréchet Inception Distance (FID). We show that the magnitudes of the predictive variances considered exhibit varying degrees of correlation with the extent to which test inputs are out-of-distribution relative to the embedding model's training data.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Feature embeddings acquired from pretrained models are widely used in medical applications of deep learning to assess the characteristics of datasets, e.g. to determine the quality of synthetic, generated medical images. The Fréchet Inception Distance (FID) is one popular synthetic image quality metric; it relies on the assumption that the characteristic features of the data can be detected and encoded by an InceptionV3 model pretrained on ImageNet1K (natural images). While it is widely known that this makes the FID less effective for applications involving medical images, the extent to which the metric fails to capture meaningful differences in image characteristics is not well understood. Here, we use Monte Carlo dropout to compute the predictive variance in the FID as well as a supplemental estimate of the predictive variance in the feature embedding model's latent representations. We show that the magnitudes of the predictive variances considered exhibit varying degrees of correlation with the extent to which test inputs (the ImageNet1K validation set augmented at various strengths, and other external datasets) are out-of-distribution relative to the embedding model's training data, providing some insight into their effectiveness as indicators of the trustworthiness of the FID.
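The Monte Carlo dropout procedure described in the abstract can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation. `ToyEmbedder` is a hypothetical random two-layer network standing in for the pretrained InceptionV3. Dropout is kept active at inference with a fresh mask per input on every pass. The per-image embedding variance is averaged over feature dimensions, which is only one plausible summary. The helper names `frechet_distance` and `mc_dropout_fid` are also illustrative. The distance itself is the standard FID formula, ||mu_r - mu_t||^2 + Tr(Sigma_r + Sigma_t - 2 (Sigma_r Sigma_t)^(1/2)), evaluated on each pass's stochastic embeddings.

```python
import numpy as np


class ToyEmbedder:
    """Hypothetical stand-in for a pretrained feature embedding model.

    A fixed, randomly initialised two-layer ReLU network with dropout on its
    hidden layer; it is NOT InceptionV3 and is used only for illustration.
    """

    def __init__(self, in_dim, hidden_dim, out_dim, p_drop=0.2, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=1.0 / np.sqrt(in_dim), size=(in_dim, hidden_dim))
        self.w2 = rng.normal(scale=1.0 / np.sqrt(hidden_dim), size=(hidden_dim, out_dim))
        self.p_drop = p_drop

    def __call__(self, x, rng):
        h = np.maximum(x @ self.w1, 0.0)  # ReLU hidden layer
        # MC dropout: dropout stays active at inference time. A fresh mask is
        # drawn for every input, and kept units are rescaled by 1 / (1 - p).
        keep = rng.random(h.shape) >= self.p_drop
        return (h * keep / (1.0 - self.p_drop)) @ self.w2


def _psd_sqrt(a):
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def frechet_distance(feats_a, feats_b):
    """FID formula applied to two (n_samples, dim) embedding arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of square roots of the eigenvalues
    # of the symmetric PSD matrix cov_a^(1/2) cov_b cov_a^(1/2).
    root_a = _psd_sqrt(cov_a)
    eig = np.linalg.eigvalsh(root_a @ cov_b @ root_a)
    tr_sqrt = np.sqrt(np.clip(eig, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


def mc_dropout_fid(model, reference, test, n_passes=20, seed=0):
    """Run n_passes stochastic forward passes and summarise their spread."""
    rng = np.random.default_rng(seed)
    fids, test_embeds = [], []
    for _ in range(n_passes):
        emb_ref, emb_test = model(reference, rng), model(test, rng)
        fids.append(frechet_distance(emb_ref, emb_test))
        test_embeds.append(emb_test)
    fids = np.asarray(fids)
    test_embeds = np.stack(test_embeds)  # shape (n_passes, n_images, dim)
    # Per-image predictive variance across passes, averaged over feature
    # dimensions (an assumed summary; other aggregations are possible).
    per_image_var = test_embeds.var(axis=0).mean(axis=1)
    return fids.mean(), fids.var(ddof=1), per_image_var


# Usage on synthetic data: the "test" set is mean-shifted from the reference.
data_rng = np.random.default_rng(1)
model = ToyEmbedder(in_dim=16, hidden_dim=64, out_dim=8)
reference = data_rng.normal(size=(500, 16))
shifted = data_rng.normal(loc=0.5, size=(500, 16))
fid_mean, fid_var, emb_var = mc_dropout_fid(model, reference, shifted)
print(f"FID {fid_mean:.3f} (std {np.sqrt(fid_var):.3f}), "
      f"mean embedding variance {emb_var.mean():.4f}")
```

Each stochastic pass yields one FID value, so the spread across passes (`fid_var`) plays the role of the FID's predictive variance. `emb_var` gives a per-image uncertainty that could be compared against how strongly each input is shifted away from the embedding model's training data. In a real pipeline the stand-in would be replaced by the pretrained InceptionV3 with dropout enabled at inference; how and where the authors apply dropout in that network is not specified here.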
Related papers
- A Meaningful Perturbation Metric for Evaluating Explainability Methods (2025-04-09T11:46:41Z)
We introduce a novel approach, which harnesses image generation models to perform targeted perturbation.
Specifically, we focus on inpainting only the high-relevance pixels of an input image to modify the model's predictions while preserving image fidelity.
This is in contrast to existing approaches, which often produce out-of-distribution modifications, leading to unreliable results.
- Quantifying the uncertainty of model-based synthetic image quality metrics (2025-04-04T17:41:58Z)
Uncertainty quantification (UQ) is used to provide a measure of the trustworthiness of the feature embedding model and an FID-like metric called the Fréchet Autoencoder Distance (FAED).
We express uncertainty as the predictive variance of the embeddings as well as the standard deviation of the computed FAED values.
We find that their magnitude correlates with the extent to which the inputs are out-of-distribution to the model's training data, providing some validation of its ability to assess the trustworthiness of the FAED.
- A Pragmatic Note on Evaluating Generative Models with Fréchet Inception Distance for Retinal Image Synthesis (2025-02-24T13:54:57Z)
Fréchet Inception Distance (FID), computed with an ImageNet-pretrained Inception-v3 network, is widely used as a state-of-the-art evaluation metric for generative models.
In this paper, we examine cases from retinal imaging modalities, including color fundus photography and optical coherence tomography, where FID and its related metrics misalign with task-specific evaluation goals.
- Epistemic Uncertainty for Generated Image Detection (2024-12-08T11:32:25Z)
We introduce a novel framework for AI-generated image detection through epistemic uncertainty, aiming to address critical security concerns in the era of generative models.
Our key insight stems from the observation that distributional discrepancies between training and testing data manifest distinctively in the epistemic uncertainty space of machine learning models.
- Fréchet Radiomic Distance (FRD): A Versatile Metric for Comparing Medical Imaging Datasets (2024-12-02T13:49:14Z)
We introduce a new perceptual metric tailored for medical images, FRD (Fréchet Radiomic Distance).
We show that FRD is superior to other image distribution metrics for a range of medical imaging applications.
FRD offers additional benefits such as stability and computational efficiency at low sample sizes.
- Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models (2024-11-28T13:04:45Z)
Deepfake techniques for facial synthesis and editing pose serious risks for generative models.
In this paper, we investigate how detection performance varies across model backbones, types, and datasets.
We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
- Distributional Drift Detection in Medical Imaging with Sketching and Fine-Tuned Transformer (2024-08-15T23:46:37Z)
This paper presents an accurate and sensitive approach to detect distributional drift in CT-scan medical images.
We developed a robust baseline library model for real-time anomaly detection, allowing for efficient comparison of incoming images.
We fine-tuned a pre-trained Vision Transformer model to extract relevant features, using mammography as a case study.
- X-Fake: Juggling Utility Evaluation and Explanation of Simulated SAR Images (2024-07-28T09:27:53Z)
The distribution inconsistency between real and simulated data is the main obstacle that influences the utility of simulated SAR images.
We propose a novel trustworthy utility evaluation framework with a counterfactual explanation for simulated SAR images for the first time, denoted as X-Fake.
The proposed framework is validated on four simulated SAR image datasets obtained from electromagnetic models and generative artificial intelligence approaches.
- Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics (2024-05-29T06:09:34Z)
We propose integrating deep features from pre-trained visual models with a statistical analysis model to achieve opinion-unaware BIQA (OU-BIQA).
Our proposed model exhibits superior consistency with human visual perception compared to state-of-the-art BIQA models.
- Image Statistics Predict the Sensitivity of Perceptual Quality Metrics (2023-03-17T10:38:27Z)
It remains unclear how this link is expressed in mathematical terms from image probability.
Here, we evaluate image probabilities using a generative model for natural images.
We analyse how probability-related factors can be combined to predict the sensitivity of state-of-the-art subjective image quality metrics.
- Visual Recognition with Deep Learning from Biased Image Datasets (2021-09-06T10:56:58Z)
We show how biasing models can be applied to remedy problems in the context of visual recognition.
Based on the (approximate) knowledge of the biasing mechanisms at work, our approach consists in reweighting the observations.
We propose to use a low dimensional image representation, shared across the image databases.
- Improved Slice-wise Tumour Detection in Brain MRIs by Computing Dissimilarities between Latent Representations (2020-07-24T14:02:09Z)
Anomaly detection for Magnetic Resonance Images (MRIs) can be solved with unsupervised methods.
We have proposed a slice-wise semi-supervised method for tumour detection based on the computation of a dissimilarity function in the latent space of a Variational AutoEncoder.
We show that by training the models on higher resolution images and by improving the quality of the reconstructions, we obtain results which are comparable with different baselines.