A Domain-specific Perceptual Metric via Contrastive Self-supervised
Representation: Applications on Natural and Medical Images
- URL: http://arxiv.org/abs/2212.01577v1
- Date: Sat, 3 Dec 2022 08:55:47 GMT
- Title: A Domain-specific Perceptual Metric via Contrastive Self-supervised
Representation: Applications on Natural and Medical Images
- Authors: Hongwei Bran Li, Chinmay Prabhakar, Suprosanna Shit, Johannes
Paetzold, Tamaz Amiranashvili, Jianguo Zhang, Daniel Rueckert, Juan Eugenio
Iglesias, Benedikt Wiestler and Bjoern Menze
- Abstract summary: Quantifying the perceptual similarity of two images is a long-standing problem in low-level computer vision.
Recent contrastive self-supervised representation (CSR) may come to the rescue.
- Score: 8.769705957207576
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Quantifying the perceptual similarity of two images is a long-standing
problem in low-level computer vision. The natural image domain commonly relies
on supervised learning, e.g., a pre-trained VGG, to obtain a latent
representation. However, due to domain shift, pre-trained models from the
natural image domain might not apply to other image domains, such as medical
imaging. Notably, in medical imaging, evaluating the perceptual similarity is
exclusively performed by specialists trained extensively in diverse medical
fields. Thus, medical imaging remains devoid of task-specific, objective
perceptual measures. This work answers the question: Is it necessary to rely on
supervised learning to obtain an effective representation that could measure
perceptual similarity, or is self-supervision sufficient? To understand whether
recent contrastive self-supervised representation (CSR) may come to the rescue,
we start with natural images and systematically evaluate CSR as a metric across
numerous contemporary architectures and tasks and compare them with existing
methods. We find that in the natural image domain, CSR behaves on par with the
supervised one on several perceptual tests as a metric, and in the medical
domain, CSR better quantifies perceptual similarity concerning the experts'
ratings. We also demonstrate that CSR can significantly improve image quality
in two image synthesis tasks. Finally, our extensive results suggest that
perceptuality is an emergent property of CSR, which can be adapted to many
image domains without requiring annotations.
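The abstract's central idea, that distances computed in a deep feature space can serve as a perceptual metric, can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: the encoder (a supervised VGG or a CSR backbone) is replaced with stand-in NumPy feature maps, and `csr_perceptual_distance` is a hypothetical helper computing an LPIPS-style distance — channel-normalized features compared layer by layer, averaged over spatial positions.

```python
import numpy as np

def csr_perceptual_distance(feats_a, feats_b):
    """LPIPS-style perceptual distance between two images, given per-layer
    feature maps of shape (C, H, W) from any encoder (e.g. a supervised
    VGG or a contrastive self-supervised backbone).

    Each layer's channel vectors are unit-normalized at every spatial
    location, the squared difference is averaged over channels and
    positions, and per-layer scores are summed. Lower = more similar.
    """
    total = 0.0
    for fa, fb in zip(feats_a, feats_b):
        # Unit-normalize along the channel axis at each spatial location.
        fa = fa / (np.linalg.norm(fa, axis=0, keepdims=True) + 1e-10)
        fb = fb / (np.linalg.norm(fb, axis=0, keepdims=True) + 1e-10)
        # Mean squared difference over channels and spatial positions.
        total += np.mean((fa - fb) ** 2)
    return total

# Stand-in feature maps; in practice these come from the trained encoder.
rng = np.random.default_rng(0)
f1 = [rng.normal(size=(64, 16, 16)), rng.normal(size=(128, 8, 8))]
f2 = [f + 0.1 * rng.normal(size=f.shape) for f in f1]  # slight perturbation

print(csr_perceptual_distance(f1, f1))  # identical inputs -> 0.0
print(csr_perceptual_distance(f1, f2))  # small positive distance
```

The point of the paper is that the encoder producing these feature maps need not be trained with labels: a contrastively pre-trained backbone yields distances that track human (and expert) similarity judgments comparably well.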
Related papers
- Multimodal Causal-Driven Representation Learning for Generalizable Medical Image Segmentation [56.52520416420957]
We propose Multimodal Causal-Driven Representation Learning (MCDRL) to tackle domain generalization in medical image segmentation.
MCDRL consistently outperforms competing methods, yielding superior segmentation accuracy and exhibiting robust generalizability.
arXiv Detail & Related papers (2025-08-07T03:41:41Z)
- A Picture is Worth a Thousand Prompts? Efficacy of Iterative Human-Driven Prompt Refinement in Image Regeneration Tasks [1.8563642867160601]
The creation of AI-generated images often involves refining the input prompt iteratively to achieve desired visual outcomes.
This study focuses on the relatively underexplored concept of image regeneration using AI.
We present a structured user study evaluating how iterative prompt refinement affects the similarity of regenerated images relative to their targets.
arXiv Detail & Related papers (2025-04-29T01:21:16Z)
- FakeScope: Large Multimodal Expert Model for Transparent AI-Generated Image Forensics [66.14786900470158]
We propose FakeScope, an expert multimodal model (LMM) tailored for AI-generated image forensics.
FakeScope identifies AI-synthetic images with high accuracy and provides rich, interpretable, and query-driven forensic insights.
FakeScope achieves state-of-the-art performance in both closed-ended and open-ended forensic scenarios.
arXiv Detail & Related papers (2025-03-31T16:12:48Z)
- RaD: A Metric for Medical Image Distribution Comparison in Out-of-Domain Detection and Other Applications [11.259711708037639]
Radiomic Feature Distance (RaD) is a new perceptual metric tailored for medical images.
We show that RaD is superior to other metrics for out-of-domain (OOD) detection in a variety of experiments.
RaD also offers additional benefits such as interpretability, as well as stability and computational efficiency at low sample sizes.
arXiv Detail & Related papers (2024-12-02T13:49:14Z)
- When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
arXiv Detail & Related papers (2024-10-14T17:59:58Z)
- Autoregressive Sequence Modeling for 3D Medical Image Representation [48.706230961589924]
We introduce a pioneering method for learning 3D medical image representations through an autoregressive sequence pre-training framework.
Our approach organizes various 3D medical images based on spatial, contrast, and semantic correlations, treating them as interconnected visual tokens within a token sequence.
arXiv Detail & Related papers (2024-09-13T10:19:10Z)
- Understanding differences in applying DETR to natural and medical images [16.200340490559338]
Transformer-based detectors have shown success in computer vision tasks with natural images.
Medical imaging data presents unique challenges such as extremely large image sizes, fewer and smaller regions of interest, and object classes which can be differentiated only through subtle differences.
This study evaluates the applicability of these transformer-based design choices when applied to a screening mammography dataset.
arXiv Detail & Related papers (2024-05-27T22:06:42Z)
- A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, which impacts several cognitive functions for humans.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
- Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks.
In the pre-training stage, the proposed auto-encoder pays more attention to reconstructing the informative patches according to the gradient metrics.
Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
arXiv Detail & Related papers (2022-09-19T09:43:19Z)
- Local Spatiotemporal Representation Learning for Longitudinally-consistent Neuroimage Analysis [7.568469725821069]
This paper presents a local and multi-scale spatiotemporal representation learning method for image-to-image architectures trained on longitudinal images.
During finetuning, it proposes a surprisingly simple self-supervised segmentation consistency regularization to exploit intrasubject correlation.
These improvements are demonstrated across both longitudinal neurodegenerative adult and developing infant brain MRI and yield both higher performance and longitudinal consistency.
arXiv Detail & Related papers (2022-06-09T05:17:00Z)
- A Principled Design of Image Representation: Towards Forensic Tasks [75.40968680537544]
We investigate the forensic-oriented image representation as a distinct problem, from the perspectives of theory, implementation, and application.
At the theoretical level, we propose a new representation framework for forensics, called Dense Invariant Representation (DIR), which is characterized by stable description with mathematical guarantees.
We demonstrate the above arguments on the dense-domain pattern detection and matching experiments, providing comparison results with state-of-the-art descriptors.
arXiv Detail & Related papers (2022-03-02T07:46:52Z)
- Lesion-based Contrastive Learning for Diabetic Retinopathy Grading from Fundus Images [2.498907460918493]
We propose a self-supervised framework, namely lesion-based contrastive learning for automated diabetic retinopathy grading.
Our proposed framework performs outstandingly on DR grading in terms of both linear evaluation and transfer capacity evaluation.
arXiv Detail & Related papers (2021-07-17T16:30:30Z)
- Contrastive Learning of Medical Visual Representations from Paired Images and Text [38.91117443316013]
We propose ConVIRT, an unsupervised strategy to learn medical visual representations by exploiting naturally occurring descriptive paired text.
Our new method of pretraining medical image encoders with the paired text data via a bidirectional contrastive objective between the two modalities is domain-agnostic, and requires no additional expert input.
arXiv Detail & Related papers (2020-10-02T02:10:18Z)
- Unsupervised Bidirectional Cross-Modality Adaptation via Deeply Synergistic Image and Feature Alignment for Medical Image Segmentation [73.84166499988443]
We present a novel unsupervised domain adaptation framework, named Synergistic Image and Feature Alignment (SIFA).
Our proposed SIFA conducts synergistic alignment of domains from both image and feature perspectives.
Experimental results on two different tasks demonstrate that our SIFA method is effective in improving segmentation performance on unlabeled target images.
arXiv Detail & Related papers (2020-02-06T13:49:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.