A Domain-specific Perceptual Metric via Contrastive Self-supervised
Representation: Applications on Natural and Medical Images
- URL: http://arxiv.org/abs/2212.01577v1
- Date: Sat, 3 Dec 2022 08:55:47 GMT
- Title: A Domain-specific Perceptual Metric via Contrastive Self-supervised
Representation: Applications on Natural and Medical Images
- Authors: Hongwei Bran Li, Chinmay Prabhakar, Suprosanna Shit, Johannes
Paetzold, Tamaz Amiranashvili, Jianguo Zhang, Daniel Rueckert, Juan Eugenio
Iglesias, Benedikt Wiestler and Bjoern Menze
- Abstract summary: Quantifying the perceptual similarity of two images is a long-standing problem in low-level computer vision.
Recent contrastive self-supervised representation (CSR) may come to the rescue.
- Score: 8.769705957207576
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Quantifying the perceptual similarity of two images is a long-standing
problem in low-level computer vision. The natural image domain commonly relies
on supervised learning, e.g., a pre-trained VGG, to obtain a latent
representation. However, due to domain shift, pre-trained models from the
natural image domain might not apply to other image domains, such as medical
imaging. Notably, in medical imaging, evaluating the perceptual similarity is
exclusively performed by specialists trained extensively in diverse medical
fields. Thus, medical imaging remains devoid of task-specific, objective
perceptual measures. This work answers the question: Is it necessary to rely on
supervised learning to obtain an effective representation that could measure
perceptual similarity, or is self-supervision sufficient? To understand whether
recent contrastive self-supervised representation (CSR) may come to the rescue,
we start with natural images and systematically evaluate CSR as a metric across
numerous contemporary architectures and tasks and compare them with existing
methods. We find that in the natural image domain, CSR behaves on par with the
supervised one on several perceptual tests as a metric, and in the medical
domain, CSR better quantifies perceptual similarity concerning the experts'
ratings. We also demonstrate that CSR can significantly improve image quality
in two image synthesis tasks. Finally, our extensive results suggest that
perceptuality is an emergent property of CSR, which can be adapted to many
image domains without requiring annotations.
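The abstract's central idea, that distances computed in a deep feature space can serve as a perceptual metric, can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: the encoder (a supervised VGG or a CSR backbone) is replaced with stand-in NumPy feature maps, and `csr_perceptual_distance` is a hypothetical helper computing an LPIPS-style distance — channel-normalized features compared layer by layer, averaged over spatial positions.

```python
import numpy as np

def csr_perceptual_distance(feats_a, feats_b):
    """LPIPS-style perceptual distance between two images, given per-layer
    feature maps of shape (C, H, W) from any encoder (e.g. a supervised
    VGG or a contrastive self-supervised backbone).

    Each layer's channel vectors are unit-normalized at every spatial
    location, the squared difference is averaged over channels and
    positions, and per-layer scores are summed. Lower = more similar.
    """
    total = 0.0
    for fa, fb in zip(feats_a, feats_b):
        # Unit-normalize along the channel axis at each spatial location.
        fa = fa / (np.linalg.norm(fa, axis=0, keepdims=True) + 1e-10)
        fb = fb / (np.linalg.norm(fb, axis=0, keepdims=True) + 1e-10)
        # Mean squared difference over channels and spatial positions.
        total += np.mean((fa - fb) ** 2)
    return total

# Stand-in feature maps; in practice these come from the trained encoder.
rng = np.random.default_rng(0)
f1 = [rng.normal(size=(64, 16, 16)), rng.normal(size=(128, 8, 8))]
f2 = [f + 0.1 * rng.normal(size=f.shape) for f in f1]  # slight perturbation

print(csr_perceptual_distance(f1, f1))  # identical inputs -> 0.0
print(csr_perceptual_distance(f1, f2))  # small positive distance
```

The point of the paper is that the encoder producing these feature maps need not be trained with labels: a contrastively pre-trained backbone yields distances that track human (and expert) similarity judgments comparably well.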
Related papers
- Multimodal Causal-Driven Representation Learning for Generalizable Medical Image Segmentation [56.52520416420957]
We propose Multimodal Causal-Driven Representation Learning (MCDRL) to tackle domain generalization in medical image segmentation.
MCDRL consistently outperforms competing methods, yielding superior segmentation accuracy and exhibiting robust generalizability.
arXiv Detail & Related papers (2025-08-07T03:41:41Z)
- A Picture is Worth a Thousand Prompts? Efficacy of Iterative Human-Driven Prompt Refinement in Image Regeneration Tasks [1.8563642867160601]
The creation of AI-generated images often involves refining the input prompt iteratively to achieve desired visual outcomes.
This study focuses on the relatively underexplored concept of image regeneration using AI.
We present a structured user study evaluating how iterative prompt refinement affects the similarity of regenerated images relative to their targets.
arXiv Detail & Related papers (2025-04-29T01:21:16Z)
- FakeScope: Large Multimodal Expert Model for Transparent AI-Generated Image Forensics [66.14786900470158]
We propose FakeScope, an expert multimodal model (LMM) tailored for AI-generated image forensics.
FakeScope identifies AI-synthetic images with high accuracy and provides rich, interpretable, and query-driven forensic insights.
FakeScope achieves state-of-the-art performance in both closed-ended and open-ended forensic scenarios.
arXiv Detail & Related papers (2025-03-31T16:12:48Z)
- RaD: A Metric for Medical Image Distribution Comparison in Out-of-Domain Detection and Other Applications [11.259711708037639]
Radiomic Feature Distance (RaD) is a new perceptual metric tailored for medical images.
We show that RaD is superior to other metrics for out-of-domain (OOD) detection in a variety of experiments.
RaD also offers additional benefits such as interpretability, as well as stability and computational efficiency at low sample sizes.
arXiv Detail & Related papers (2024-12-02T13:49:14Z)
- When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
arXiv Detail & Related papers (2024-10-14T17:59:58Z)
- Autoregressive Sequence Modeling for 3D Medical Image Representation [48.706230961589924]
We introduce a pioneering method for learning 3D medical image representations through an autoregressive sequence pre-training framework.
Our approach organizes various 3D medical images based on spatial, contrast, and semantic correlations, treating them as interconnected visual tokens within a token sequence.
arXiv Detail & Related papers (2024-09-13T10:19:10Z)
- Understanding differences in applying DETR to natural and medical images [16.200340490559338]
Transformer-based detectors have shown success in computer vision tasks with natural images.
Medical imaging data presents unique challenges such as extremely large image sizes, fewer and smaller regions of interest, and object classes which can be differentiated only through subtle differences.
This study evaluates the applicability of these transformer-based design choices when applied to a screening mammography dataset.
arXiv Detail & Related papers (2024-05-27T22:06:42Z)
- A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, which impacts several cognitive functions for humans.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
- Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks.
In the pre-training stage, the proposed auto-encoder pays more attention to reconstructing the informative patches according to the gradient metrics.
Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
arXiv Detail & Related papers (2022-09-19T09:43:19Z)
- Local Spatiotemporal Representation Learning for Longitudinally-consistent Neuroimage Analysis [7.568469725821069]
This paper presents a local and multi-scale spatiotemporal representation learning method for image-to-image architectures trained on longitudinal images.
During finetuning, it proposes a surprisingly simple self-supervised segmentation consistency regularization to exploit intrasubject correlation.
These improvements are demonstrated across both longitudinal neurodegenerative adult and developing infant brain MRI and yield both higher performance and longitudinal consistency.
arXiv Detail & Related papers (2022-06-09T05:17:00Z)
- A Principled Design of Image Representation: Towards Forensic Tasks [75.40968680537544]
We investigate the forensic-oriented image representation as a distinct problem, from the perspectives of theory, implementation, and application.
At the theoretical level, we propose a new representation framework for forensics, called Dense Invariant Representation (DIR), which is characterized by stable description with mathematical guarantees.
We demonstrate the above arguments on the dense-domain pattern detection and matching experiments, providing comparison results with state-of-the-art descriptors.
arXiv Detail & Related papers (2022-03-02T07:46:52Z)
- Lesion-based Contrastive Learning for Diabetic Retinopathy Grading from Fundus Images [2.498907460918493]
We propose a self-supervised framework, namely lesion-based contrastive learning for automated diabetic retinopathy grading.
Our proposed framework performs outstandingly on DR grading in terms of both linear evaluation and transfer capacity evaluation.
arXiv Detail & Related papers (2021-07-17T16:30:30Z)
- Contrastive Learning of Medical Visual Representations from Paired Images and Text [38.91117443316013]
We propose ConVIRT, an unsupervised strategy to learn medical visual representations by exploiting naturally occurring descriptive paired text.
Our new method of pretraining medical image encoders with the paired text data via a bidirectional contrastive objective between the two modalities is domain-agnostic, and requires no additional expert input.
arXiv Detail & Related papers (2020-10-02T02:10:18Z)
- Unsupervised Bidirectional Cross-Modality Adaptation via Deeply Synergistic Image and Feature Alignment for Medical Image Segmentation [73.84166499988443]
We present a novel unsupervised domain adaptation framework, named Synergistic Image and Feature Alignment (SIFA).
Our proposed SIFA conducts synergistic alignment of domains from both image and feature perspectives.
Experimental results on two different tasks demonstrate that our SIFA method is effective in improving segmentation performance on unlabeled target images.
arXiv Detail & Related papers (2020-02-06T13:49:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.